Abstract
Recent computational theories of interoception suggest that perception of bodily states rests upon an expected reliability- or precision-weighted integration of afferent signals and prior beliefs. The computational psychiatry framework further suggests that aberrant precision-weighting may lead to misestimation of bodily states, potentially hindering effective visceral regulation and promoting psychopathology. In a previous study, we fit a Bayesian computational model of perception to behavior on a heartbeat tapping task to test whether aberrant precision-weighting was associated with misestimation of bodily states. We found that, during an interoceptive perturbation designed to amplify afferent signal precision (inspiratory breath-holding), healthy individuals increased the precision-weighting assigned to ascending cardiac signals (relative to resting conditions), while individuals with symptoms of anxiety, depression, substance use disorders, and/or eating disorders did not. A second study also replicated the pattern observed in healthy participants. In this pre-registered study, we aimed to replicate our prior findings in a new transdiagnostic patient sample (N=285) similar to the one in the original study. These new results successfully replicated those found in our previous study, indicating that, transdiagnostically, patients were unable to adjust beliefs about the reliability of interoceptive signals – preventing the ability to accurately perceive changes in their bodily state. Follow-up analyses combining samples from the previous and current study (N=719) also afforded the power to identify group differences within narrower diagnostic groups and to examine predictive accuracy when logistic regression models were trained on one sample and tested on the other. Given the increased confidence in the generalizability of these effects, future studies should examine the utility of interceptive precision measures in predicting treatment outcomes or identify whether these computational mechanisms might represent novel therapeutic targets for improving visceral regulation.
Introduction
Interoception – the process by which the nervous system senses, interprets, and integrates the internal state of the body – is thought to play an important role in a number of psychiatric disorders. Evidence of altered interoceptive processing has been observed in depression, anxiety, eating, and substance use disorders, among others (reviewed in (1)). This incudes, for example, negative associations between depression and heartbeat counting accuracy (2–5), heightened interoceptive sensations in panic disorder (reviewed in (6)), blunted neural responses during interoceptive processing in substance users (7), and stronger effects of expectation on interoception in eating disorders (8). However, debates about the limitations of current interoception measures have grown in recent years (9–14), and the pathophysiological mechanisms underlying psychiatric disorders remain unclear.
In an effort to address these methodological and inferential limitations, we previously applied a computational modeling approach designed to minimize certain limitations of previous approaches and to allow inferences regarding the algorithmic processing mechanisms spanning depression, anxiety, eating, and substance use disorders (15). Our model assumed that the brain updates beliefs about the state of the body by integrating afferent interoceptive signals with prior beliefs in a manner weighted by their expected reliability – or precision – corresponding to approximate Bayesian inference. As the reliability of prior beliefs and afferent signals can change over time and context, adaptive interoceptive processing requires continual tuning of these precision weightings – a process that could be affected in psychopathology. Fitting our model to behavioral data in a cardiac interoception (heartbeat tapping) task revealed that healthy individuals successfully adapted the precision assigned to both prior expectations and afferent signals during different task contexts. Most importantly in the present context, they assigned low precision to the sensory signal at rest, while this precision assignment increased during a breath-hold perturbation designed to amplify the strength of the afferent signal. In a follow-up study with a new sample of healthy individuals, we replicated this effect using the same task and model (16). This increase in precision assignment during an interoceptive perturbation was not unexpected, as only roughly 35% of individuals appear to accurately perceive their own heartbeats at rest (6). In contrast, when visceral states are perturbed, cardiac perception becomes more accurate, particularly under conditions of heightened cardiorespiratory arousal (17–19).
In contrast to healthy participants, the original study found evidence that a transdiagnostic psychiatric sample did not update their signal precision estimates during the breath-hold manipulation, suggesting rigidity in these estimates and an insensitivity to changes in the afferent signal. However, this finding remained to be replicated. In the present pre-registered study (https://osf.io/9znsg), we attempted to replicate this inflexible pattern of precision estimation in a new transdiagnostic sample with a similar composition to the previous study and who performed the same task under identical conditions.
Methods
Participants
Data were collected as part of the larger Tulsa 1000 project, which includes an exploratory and confirmatory dataset comprised of 1050 individuals (20). Our prior study used the exploratory dataset (N = 434), while this study used the confirmatory dataset. The available subset of the confirmatory sample (N = 479) in the Tulsa 1000 study included 94 healthy comparisons (HCs; 37 male; mean age: 32.35, SD = 11.16), 10 individuals with anxiety disorders alone (iANX; 1 male; mean age: 37.18, SD = 6.37), 57 with major depression alone (iDEP; 20 male; mean age: 36.00, SD = 10.12), 127 with co-morbid anxiety/depression (iDEP+ANX; 26 male; mean age: 31.99, SD = 9.83), 155 with substance use disorders (iSUDs; i.e., with or without co-morbid affective disorders; 60 male; mean age: 34.07, SD = 8.45), and 36 with eating disorders (iEDs; i.e., with or without co-morbid affective disorders; 3 male; mean age: 27.75, SD = 8.46). Please note that acronyms here include “i” to indicate “individuals with” so as not to identify individuals by their diagnoses in subsequent use.
To arrive at this sample, individuals aged 18-55 years were screened based on dimensional psychopathology scores. Inclusion used the following measures and criteria: Patient Health Questionnaire (PHQ-9; (21)) ≥ 10, Overall Anxiety Severity and Impairment Scale (OASIS; (22)) ≥ 8, Drug Abuse Screening Test (DAST-10; (23)) score > 2, and/or Eating Disorder Screen (SCOFF; (24)) score ≥ 2 (for screening measure scores on the full sample, see Table 1). Participants were grouped based on DSM-IV or DSM-5 diagnosis using the Mini International Neuropsychiatric Inventory 6 or 7 (MINI; (25)). HCs did not show elevated symptoms or possess any psychiatric diagnoses.
Participants were excluded if they (i) tested positive for drugs of abuse, (ii) met criteria for psychotic, bipolar, or obsessive-compulsive disorders, or reported (iii) history of moderate-to-severe traumatic brain injury, neurological disorders, or severe or unstable medical conditions, (iv) active suicidal intent or plan, or (v) change in psychotropic medication status within 6 weeks. Full inclusion/exclusion criteria are described in (20). The study was approved by the Western Institutional Review Board. All participants provided written informed consent prior to completion of the study protocol, in accordance with the Declaration of Helsinki, and were compensated for participation. ClinicalTrials.gov identifier: #NCT02450240.
The clinical and demographic information reported in Table 1 includes the 479 participants with available data from the confirmatory dataset, excluding only those with invalid task or electrocardiogram (EKG) data. We note here, however, that some of these participants did not pass additional quality control checks and were subsequently removed, as described further below.
Heartbeat perception task
As part of the T1000 project, participants completed several assessments, self-report measures, and behavioral tasks (detailed in (20)). The heartbeat tapping task relevant to the present study was based on a previously developed task (26, 27). As in our previous study, the task was repeated under three conditions (each for a period of 60 seconds) designed to influence prior confidence and afferent signal precision. In the “guessing” condition, participants were simply instructed to close their eyes and press down on a key when they felt their heartbeat, to try to mirror their heartbeat as closely as possible, and told that they should take their best guess even if they weren’t sure. In the “no guessing” condition, all instructions were identical except that they were told to only press the key when they actually felt their heartbeat, and if they did not feel their heartbeat that they should not press the key. This can be seen as altering prior beliefs associated with confidence thresholds (i.e., higher confidence was required to choose to press the key). In the “breath-hold” (perturbation) condition, participants were again instructed not to guess, but were also asked to first empty their lungs of all air and then inhale as deeply as possible and hold it for as long as they could tolerate (up to the length of the one-minute trial). This manipulation attempted to putatively increase the strength of the afferent cardiac signal by increasing physiological arousal. Conceptually, therefore, one can view the no-guessing instruction as a baseline condition, where adding the breath-hold increases signal precision and adding the guessing instruction instead increases prior bias toward detecting heartbeats.
An identical “tone tapping” condition was used as a control, where participants were instructed to press the key every time they heard a 1000Hz auditory tone, each of which was presented for 100ms (78 tones, randomly jittered by +/- 10% and presented in a pattern following a sine curve with a frequency of 13 cycles/minute, mimicking the range of respiratory sinus arrhythmia during a normal breathing range of 13 breaths per minute). This was completed between the first (guessing) and second (no-guessing) heartbeat tapping conditions. As this tone could be treated as a maximally precise signal, any variations in precision estimates under the model are better explained as due to motor response noise. These “tone precision” estimates could therefore be used as a covariate in analyses to better isolate individual differences not attributable to motor stochasticity in the timing of key presses.
Directly after completing each task condition, individuals were asked the following using a visual analogue scale:
“How intensely did you feel your heartbeat?” or “How intensely did you hear the tone?”
“How accurate was your performance?”
“How difficult was the previous task?”
Each scale had anchors of “not at all” and “extremely” on the two ends. Numerical scores could range from 0 to 100.
Physiological measurements
The objective timing of participants’ heartbeats was verified throughout the task using cardiac signals from a lead II EKG and pulse waveform signals from a photoplethysmography (PPG) device attached to the ear lobe (via a BIOPAC MP150 acquisition unit). Note, however, that PPG was not attached during the heartbeat tapping task as people can sometimes cutaneously feel their heartbeat through the compression of a PPG clip (28). Response times were collected using a task implemented in PsychoPy, with data collection synchronized via a parallel port interface.
EKG and response data were scored using MATLAB code developed in-house. Each participant’s pulse transit time (PTT) was estimated as the median delay between R wave and the corresponding inflection in the PPG signal.
Computational model
A detailed description of the computational modeling approach can be found in our prior report (15). All steps described there were carried out identically in the current study. Briefly, each task time series was divided into intervals corresponding to the periods of time directly before and after each heartbeat. Potentially perceivable heartbeats were based on the timing of the peak of the EKG R-wave + 200 milliseconds (ms), based on previous estimates for the PTT (29). The length of each interval in the timeseries (i.e., before vs. after a heartbeat) was based on dividing the time period between every two successive heartbeats (+ 200 ms) in half and then treating the after-beat intervals as the time periods in which the systole (heart muscle contraction) signal was present and in which a key press should be chosen if it was felt. Before-beat intervals were treated as the diastole (heart muscle relaxation) period, in which tapping should not occur. Each interval was treated as a “trial” in which either a tap or no tap could be chosen and in which either a systole or diastole signal was present.
To model perception, we used a hidden Markov model with two time points per “trial”, corresponding, respectively, to a formal trial start state and the presentation of a systole or diastole signal. Table 2 describes each element of the model in detail (see our prior report for additional motivations and justification of model choices (15)). Heartbeat perception (state inference) depended on a bias in prior beliefs that a heartbeat would be felt (pHB) and the precision of the interoceptive sensory signal (IP).
As in our original study, we also assessed evidence for a learning model (see Table 2) through Bayesian model comparison with a perception-only model in which prior beliefs did not change over time. The learning model assumed prior beliefs, p(st+1|st), could update after each new observation, and that learning could happen at different rates for each individual based on an inverse scalar parameter bO (i.e., higher values indicate slower learning).
Thus, the final parameters estimated for each participant included the IP, pHB, and bO. The parameter values that maximized the likelihood of each participant’s responses were identified using variational Laplace (30), implemented within the spm_nlsi_Newton.m parameter estimation routine available within the freely available SPM12 software package (Wellcome Trust Centre for Neuroimaging, London, UK, http://www.fil.ion.ucl.ac.uk/spm). The same prior means (IP = .5, pHB = .5, and bO = 1) and variances (.5) for estimation were set as in our initial study.
After estimation, the “raw” IP parameter values (IPraw) were transformed to capture two distinct constructs of interest. As described in Table 2, because IPraw values both above and below .5 indicate higher precision (i.e., values below approaching 0 indicate reliable anticipatory key presses, whereas values approaching 1 indicate reliable key presses after each systole), our ultimate measure of precision was recalculated by centering IPraw on 0 and taking its absolute value as follows: This means that IP has a minimum value of 0 and a maximum value of 0.5. The IPraw values were then instead used to assess individual differences in the tendency to key press in an anticipatory or reactive (AvR) fashion: Higher AvR values (> 0.5) thus indicated a stronger tendency to reactively key press in response to a systole as opposed to key pressing in an anticipatory fashion (< 0.5).
Quality Control and Final Sample Sizes
Prior to performing our analyses, several participants were removed due to quality control checks. Specifically, 172 individuals were removed due to “cheating” (i.e., review of their video recording revealed that they took their pulse while performing the task), and 22 individuals were removed due to being outliers when performing the tone task (using Iterative Grubb’s with p < 0.01) – assumed to reflect inappropriate engagement during the task (e.g., being inattentive, tapping rapidly without listening to the tones, etc.). This resulted in 285 participants, including 59 HCs, 4 iANX, 26 iDEP, 88 iDEP+ANX, 84 iSUDs, and 24 iEDs. Due to the small sample size in the anxiety group, these 4 participants were excluded from disorder-specific analyses.
Statistical analysis
Primary replication analyses
Analyses replicated those carried out in our prior study. We first used JZS Bayes factor (BF) analyses (repeated-measures ANOVAs) with default prior scales in R (31) to compare evidence for models with transdiagnostic vs. diagnosis-specific effects. These models included group, condition, and their interaction, and either coded group as two levels (HCs vs. all patients) or 5 levels (HCs, iDEP, iDEP+ANX, iEDs, and iSUDs). This model comparison was done to assess whether the results were better interpreted as transdiagnostic or diagnosis-specific. Based on our previous study, we expected that BF analyses would support a model assuming a single transdiagnostic group. Linear mixed effects models (LMEs) were then conducted (using the lme4 package in R (32)) to confirm significant group differences (i.e., between HCs and either the transdiagnostic group or the four patient groups) for each parameter, and if they differ between conditions (i.e., guessing, no-guessing, breath-hold). To confirm the specificity of significant effects, we also ran models that accounted for individual differences in age, sex, body mass index (BMI), median pulse transit time (PTT), number of heartbeats (and its interaction with group and condition), and medication status (i.e., one analysis per model parameter). Due to missing values in PTT (n = 89) and BMI (n = 1), the group-level averages were imputed to maximize sample size while not influencing the effect of either predictor in the models. One additional participant was missing BMI data from the time of task completion but had data in the months following. A regression line was fit to the data points available and the best fit value at the time of task completion was imputed for this individual. As in the original study, we also included IP estimates for the auditory control condition as a covariate to account for variability in behavior attributable to motor stochasticity (e.g., differences in reaction times). Categorical variables (i.e., group, condition, sex, and medication status) were sum-coded in all models.
Secondary analyses
As in our prior study, for comparison to traditional interoception measures we performed identical analyses predicting a commonly-used counting accuracy measure of performance (33). These analyses were also performed for several self-report and physiological variables in each condition, including heart rate and self-reported confidence in performance, task difficulty, and perceived heartbeat intensity. We further tested for relationships between model parameters and these measures, as well as with age, sex, BMI, and PTT. Based on our prior results, we expected positive relationships in the no-guessing and breath-hold conditions between both model parameters and self-reported heartbeat intensity, and a positive relationship between pHB and self-reported confidence. In the no-guessing condition, we expected a negative correlation between pHB and self-reported difficulty. We also expected weak and strong positive relationships between counting accuracy and IP and pHB, respectively. Separate ANOVAs and chi-squared analyses also tested for significant differences between groups in other individual difference variables, including: age, sex, PHQ, OASIS, DAST, BMI, PTT, heartrate, number of key presses, and self-reported beliefs about felt heartbeat intensity, task difficulty, and confidence in task performance.
Correlations between model parameters were examined both within and between conditions. Based on our prior report, we expected to see weak positive correlations between both IP in the no-guessing and breath-hold conditions, weak positive correlations between pHB in the guessing and no-guessing conditions, and moderate positive correlations between pHB in the no-guessing and breath-hold conditions. We also expected to find weak positive correlations between IP and pHB in the no-guessing and breath-hold conditions, and weak negative relationships between AvR and both IP and pHB across all conditions.
We also ran correlational analyses with continuous scores on some of the clinical measures (PHQ, OASIS, DAST) gathered, excluding HCs, to assess whether model parameters provided additional information about symptom severity. Based on our prior exploratory results, we predicted the following relationships in the current dataset: In the no-guessing condition, IP would be negatively associated with both depression (PHQ) and anxiety (OASIS) severity, and positively associated with substance use severity (DAST); pHB would also be positively associated with substance use severity (DAST). In the breath-hold condition, pHB would be positively associated with substance use severity (DAST), and higher anxiety (OASIS) would be associated with a more anticipatory response pattern (AvR).
The previous dataset included common self-report measures of interoception, including the Multidimensional Assessment of Interoceptive Awareness (MAIA (34)) the Toronto Alexithymia Scale (TAS-20 (35)), and the Anxiety Sensitivity Index (ASI (36)), which were correlated with model parameters for some subscales in exploratory analyses (see Supplementary Figure S5 in our prior study). We also attempted to replicate these exploratory results.
Sample comparisons and combined sample analyses
We compared the initial exploratory sample with this confirmatory sample on demographic and clinical characteristics as well as model parameter values. In addition to the preregistered replication analyses above, we then combined these two samples (N = 719) to maximize power for examining potential differences between individuals within the smaller diagnostic categories. First, we tested whether model parameters could predict the presence of specific substance use or affective disorders relative to healthy comparisons using separate logistic regressions. Each regression included both parameters (separated by condition) as predictors of diagnostic status (coding 0 = healthy comparison and 1 = those with the specific disorder, excluding all other participants). We also performed these tests predicting disorders without comorbidities when the sample size allowed it. Following this, we performed analogous regressions within groups comparing those with a specific disorder (coded as 1) to those without it (coded as 0) excluding HCs. This allowed us to determine if the model parameters could differentiate disorders from one another. All logistic regressions for a given disorder were performed separately in those with vs. without substance use disorders (SUDs).
Finally, we sought to evaluate if a model including interoceptive precision parameters that was trained on the exploratory sample could successfully classify participants in the confirmatory sample by diagnostic status. To this end, we performed prediction classification analyses (logistic regressions) including the IP parameters in each of the three HB tapping conditions. We first tested whether HCs could be separated from all patients, irrespective of diagnosis. We then removed HCs and tested whether we could correctly classify specific diagnoses. Training and class prediction within logistic regressions was carried out using the statsmodels package in Python. Frequency balancing of the training dataset was performed using the skylearn package. These analyses allowed us to evaluate predictive accuracy based on area-under-the-curve (AUC) scores and confusion matrices in the test dataset.
Results
Participant characteristics
Complete information on sample size, demographics, and symptom screening measures is provided in Table 3. Initial t-tests indicated that the iED were younger than all other groups (p < .05) and that iDEP+ANX were younger than iDEP and iSUD (p < .042). Additionally, the iED had significantly lower BMI than all other groups (p < .032). No other groups differed significantly. A chi-squared analysis also showed significant differences in the proportion of males to females between groups (chi-squared = 18.00, df = 5, p = .003) specifically between HCs and iED (p = .006), HCs and iDEP+ANX (p = .007), and iDEP and iDEP+ANX (p = .046). While not included in the table below, some participants in our clinical groups were taking psychotropic medications at the time of data collection (25% iANX, 42% iDEP, 27% iDEP+ANX, 50% iED, and 49% iSUD). Therefore, in our analyses of model parameters, we also confirm our results after controlling for these other factors.
Sample Comparison
Bayesian t-tests comparing clinical and demographic variables between the previous and current samples (by group) are reported in Supplemental Table S1. There were no significant differences (BFs < 2.01) apart from BMI in the iEDs, which was higher in the confirmatory sample than the exploratory sample (BF = 3.87). Figure 1 below shows the diagnostic breakdowns of the current and previous samples.
Interoceptive Perturbation Validation
Task-related self-report variables by group are shown in Table 4 split by condition (Panels A-D). As expected, across all participants self-reported heartbeat intensity, confidence in task performance, and task difficulty differed significantly between the three heartbeat tapping conditions in separate LMEs (intensity: F(2,567) = 58.99, p < .001; confidence: F(2,568) = 10.40, p < .001; difficulty: F(2,568) = 17.49, p < .001), reflecting a greater perceived intensity of heartbeat sensations and greater confidence during the breath-hold perturbation than in the other 2 conditions, as well as lower difficulty in the breath-hold condition than in the no-guessing condition (and greater difficulty in the no-guessing than guessing condition; p < .001 for all post-hoc comparisons). An LME analysis of heart rate revealed a significant difference in the number of heartbeats between conditions (F(2,568) = 4.11, p = .017), reflecting a faster heart rate in the breath-hold condition than in the no-guessing condition (p = .005). This effect did not differ by group.
Bayesian Model Comparison
Similar to our prior report, when comparing models (based on (37, 38)), there was more evidence for the “perception only” model than for the model that included learning prior beliefs for the no-guessing and breath-hold conditions (protected exceedance probability = 1), whereas there was not clear evidence favoring a single model for the guessing condition (protected exceedance probabilities = 0.42 vs. 0.58, slightly favoring the learning model). The learning model was favored in the tone condition (protected exceedance probability = 1). No group differences were observed when comparing model fits between groups.
For consistency/comparability, we use the “perception only” model parameters to compare conditions in our analyses below (as in our previous study), as this model best explained heartbeat tapping behavior overall. The accuracy of this model – defined as the percentage of choices to key press that matched the highest probability action in the model (e.g., a key press occurring when the highest probability percept in the model was a heartbeat) – was 77% across all conditions; by condition, model accuracy was: guessing condition = 67% (SD = 12%); tone condition = 68% (SD = 12%); no-guessing condition = 87% (SD = 13%); breath-hold condition = 85% (SD = 12%).
Relationship between parameters
Parameter values for each group and condition are listed in Table 5 (for associated plots, see Supplementary Figure S1). Across conditions, all parameters were normally distributed (skew < 2). Figure 2 shows inter-correlations for each parameter across conditions. As can be seen there, no significant correlations between IP across task conditions were found. Correlations between pHB estimates across conditions were moderate to strong, most notably between the no-guessing and breath-hold conditions (which also includes the no-guessing instruction). The tendency to tap in an anticipatory vs. reactionary manner (AvR) showed a positive relationship between the guessing and breath-hold conditions.
Supplementary Figure S2 shows correlations between parameters across conditions. Most notably, significant correlations were present between IP in the no-guessing condition and both pHB in the no-guessing (r = .25, p < .001) and breath-hold conditions (r = .26, p < .001).
Parameter face validity
Figure 3 shows the correlations, including some significant relationships (p < .05, uncorrected), across all participants between model parameters in each condition and several task-relevant variables. IP showed positive relationships with self-reported heartbeat intensity ratings in the no-guessing and breath hold conditions. Additionally, pHB was lower in those self-reporting greater difficulty in the no-guessing condition, and higher in those self-reporting higher confidence and higher heartbeat intensity in the breath-hold and guessing conditions. For IP in all heartbeat conditions, and for pHB in the breath-hold condition, parameters also related to the traditional counting accuracy measure as expected (33). Unlike in the original sample, these results showed that pHB was (weakly) negatively correlated with counting accuracy in the guessing condition. The sex difference in IP in the breath hold condition reported in our previous report did not replicate in this sample (t(280) = -1.09, p = .276).
Group differences
Comparison to previous sample
Figure 4 compares the values for IP and pHB in each condition within the previous and current datasets, separately for HCs and the combined patient group. For an analogous plot of AvR, see Supplemental Figure S3. All parameters in the tone condition are shown in Supplemental Figure S4. As can be seen there, values for each parameter were comparable. Before performing pre-registered analyses below, we tested for differences between our previous and current samples. Bayes factor analyses provided evidence against the presence of differences between samples in all cases (IP: BFs between .09 and .33; pHB: BFs between .10 and .69). Separate LMs for each condition that included transdiagnostic group, sample, and a group by sample interaction also showed a strong group effect only in the breath-hold condition for IP (F(1,705) = 26.94, p < .001) and no other significant effects (F < 2.56, p > .110 in all cases).
Interoceptive Precision
In Bayesian analyses predicting IP across the three HB tapping conditions in the current dataset (including group, condition, and their interaction as possible predictors), the transdiagnostic model (HCs vs. all patients) was favored over the diagnosis-specific model (BF = 43.23). Thus, we focus on transdiagnostic LMEs below (which allowed inclusion of the four individuals with anxiety alone). Analogous LMEs for diagnosis-specific groups, and associated Bayes factor analyses, are reported in Supplementary Materials. See Figure 4 and Figure 5 for plots of IP values by group and condition.
The initial LME predicting IP (without covariates) revealed a marginal effect of condition (F(1,560) = 2.54, p = .080) and a marginal group by condition interaction (F(2,560) = 2.46, p = .086), but no main effect of group (F(1,280) = 2.03, p = .155). However, in the model accounting for planned covariates (age, sex, precision in the tone condition, BMI, median PTT, medication status, number of heartbeats, and interactions between number of heartbeats and group and condition separately), results showed a marginal effect of group (F(1,348) = 3.46, p = .064) and gained a significant group by condition interaction (F(2,569) = 3.11, p = .045). Post-hoc comparisons indicated that these results were explained by the following: 1) IP significantly increased in HCs in the breath-hold condition when compared to the two resting conditions (breath-hold: estimated marginal mean [EMM] = .058; guessing: EMM = .038; t(570) = 2.31, p = .021; no-guessing: EMM = .036, t(583) = 2.58, p = .010); and 2) IP during breath-hold was significantly greater in HCs than in the transdiagnostic patient sample (HCs: EMM = .058; patients: EMM = .040; t(806) = -2.56, p = .011). There were no significant differences between groups in the other conditions (p > .698 in both cases). This LME also revealed a significant negative associations with number of heartbeats (F(1,348) = 6.90, p = .009) and precision in the tone condition (F(1,274) = 5.72, p = .017). All other effects were nonsignificant (Fs between .04 and 2.81, ps between .095 and .838).
Prior expectations
In Bayesian analyses predicting pHB across the three HB tapping conditions in the current dataset (including group, condition, and their interaction as predictors), a transdiagnostic model (HCs vs. all patients) had greater evidence when compared to a diagnosis-specific model (BF > 100). We therefore focus on transdiagnostic LMEs below (which allowed inclusion of the four individuals with anxiety alone). Analogous LMEs for diagnosis-specific groups are reported in Supplementary Materials. See Figure 4 and Figure 6 for plots of pHB values by group and condition.
An initial LME predicting pHB (without covariates) revealed a main effect of condition (F(2,559) = 206.18, p < .001), but no main effect of group (F(1,281) = 0.33, p = .563) or a group by condition interaction (F(2,559) = 1.06, p = .349). Post-hoc comparisons indicated greater pHB in the guessing condition than in the other two conditions (guessing: EMM = .401; no-guessing: EMM = .176; t(519) = -17.44, p < .001; breath-hold: EMM = .182; t(530) = -16.88, p < .001). An analogous model including planned covariates confirmed the condition effect (F(2,596) = 20.74, p < .001) with the same pattern of results as previously mentioned. This model also revealed a marginally negative association with age (F(1,274) = 3.86, p = .050).There were no other significant effects (Fs between 0.08 and 3.37, ps between .067 and .772).
Anticipating vs. reacting
All results for AvR are provided in Supplementary Materials. As in our prior study, there was no clear evidence for group differences. For plots of AvR by group and condition, see Supplementary Figure S5.
Group by condition interactions in secondary measures
Supplementary Figure S6 plots the values for all secondary measures in each condition within the previous and current datasets, separately for HCs and the combined patient group. As with model parameters, values for each measure were comparable. Bayes factor analyses provided evidence against the presence of differences between samples in all cases (BFs < 3), with the exception of self-reported confidence in the breath-hold condition for the transdiagnostic patient group, which was lower in the confirmatory sample (BF > 100).
In the current dataset, a Bayes factor analysis predicting heartbeat counting accuracy (including group, condition, and their interaction as predictors) found greater evidence for a model assuming transdiagnostic vs. diagnosis-specific effects (BF > 100).
An initial LME including only the effects of group, condition, and their interaction found evidence for a main effect of condition (F(2,562) = 19.34, p < .001) and a marginal effect of group (F(1,280) = 3.21, p = .074) on traditional heartbeat counting accuracy. Post-hoc contrasts revealed that accuracy scores were higher in the guessing condition than the two other HB conditions (guessing: EMM = .649; no-guessing: EMM = .274; t(559) = -5.15, p < .001; breath-hold: EMM = .336, t(560) = -4.29, p < .001). When accounting for covariates, there were also effects of group (F(1,334) = 6.68, p = .010) and condition (F(2,638) = 6.09, p = .002), but no interaction (F(2,568) = 0.52, p = .597). Post-hoc contrasts indicated that, again, accuracy was greater in the guessing condition than the other two conditions (guessing: EMM = .681; no-guessing: EMM = .316; t(559) = -5.32, p < .001; breath-hold: EMM = .367; t(567) = -4.60, p < .001) and that HCs had significantly greater accuracy than the transdiagnostic patient group across conditions (healthy: EMM = .524; patient: EMM = .385, t(273) = -2.31, p = .022). There was also a positive association with number of heartbeats (F(1,334) = 10.01, p = .002), a negative association with PTT (F(1,280) = 67.97, p < .001) and age (F(1,273) = 5.20, p = .023), and an effect of medication status (F(1,274) = 4.07, p = .045) such that those taking medication had better counting accuracy than participants who were not (t(274) = -2.02, p = .045). There were also significant interactions between number of heartbeats and condition (F(2,644) = 8.39, p < .001) and number of heartbeats and group (F(1,336) = 4.62, p = .032). Further post-hoc comparisons indicated that counting accuracy improved as the number of heartbeats increased in the guessing and no-guessing conditions compared to the breath-hold condition (breath-hold: EMM = -.002; guessing: EMM = .0130, t(667) = -3.74, p < .001; no-guessing: EMM = .011, t(641) = -3.25, p = .004) and that accuracy was higher in the patient groups as number of heartbeats increased (while this was not true for HCs; healthy: EMM = .002; patient: EMM = .012, t(336) = 2.15, p = .032).
Results of analogous LMEs with diagnosis-specific groupings are reported in the Supplementary Materials. To assess potential group differences in the effect of task condition on self-reported experience and physiology, we also carried out analogous LMEs assessing confidence, intensity, and difficulty, as well as heart rate. These results are reported in Supplementary Materials. No group by condition interactions in these variables were observed mirroring our IP results.
Associations with symptom severity measures and interoceptive awareness scales
Given the heterogeneous nature of our clinical sample, we ran subsequent exploratory correlational analyses with continuous scores on some of the clinical measures gathered, excluding HCs, to assess whether model parameters might provide additional information about symptom severity in specific domains related to anxiety, depression, substance use, interoception, and emotional awareness. No significant relationships were found.
As in the previous paper, we examined correlations between model parameters and measures often thought to relate to interoception: the Multidimensional Assessment of Interoceptive Awareness (MAIA; (34)), the Toronto Alexithymia Scale (TAS-20; (39)), and the Anxiety Sensitivity Index (ASI; (36)). The results of these analyses are reported in Supplementary Materials.
Combined sample analyses
After completing the pre-registered analyses above, we then combined the previous and current samples, which provided increased power to assess the possibility of narrower diagnostic differences. A specific diagnostic breakdown of the those with SUDs, and those with affective disorders without SUDs, in each sample is shown in Figure 6. In this combined sample, we tested if both model parameters (jointly) in each condition HB condition could predict the presence of specific substance use or affective disorders relative to HCs.
Substance use disorders
Comparison between specific disorders in iSUDs is shown in Figure 7. In models comparing specific SUDs to HCs (as shown in Supplemental Table S2) where sample size allowed, IP in the breath-hold condition could differentiate each SUD from HCs – including stimulant use disorder without comorbidities (Wald z = -4.18 to -2.42, p < .001 to .028). Prior beliefs (pHB) in the no-guessing condition differentiated those with cannabis use, opioid use, and sedative use (Wald z = 2.01 to 2.48, p = .013 to .044) from HCs, while in the breath-hold condition they could differentiate those with stimulant use, opioid use, and alcohol use (Wald z = 2.00 to 2.41, p = .016 to .046) from HCs. Still within iSUDs, IP in the breath-hold condition was also a significant differentiator for all affective disorders from HCs except for PTSD (Wald z = -3.52 to -2.16, p < .001 to .030). In the no-guessing condition, pHB uniquely differentiated generalized anxiety disorder (GAD) from HCs (Wald z = 2.75, p = .006). In all significant cases, iSUDs with specific diagnoses had lower interoceptive precision and higher prior bias estimates than HCs. When comparing each specific disorder to others in the substance use group (Supplemental Table S3), model parameters were unable to differentiate any disorders, with the exception of pHB in the no-guessing condition, which was able to differentiate GAD from other disorders (Wald z = 2.21, p = .027) due to significantly higher prior bias estimates in those with GAD.
Affective disorders without SUDs
Comparison between specific disorders in individuals with affective disorders without SUDs (iDEP, iANX, and iDEP+ANX; iADs) is shown in Figure 8. As detailed in Supplemental Tables S4 and S5, we found that IP estimates in the breath-hold condition differentiated HCs from most diagnoses: MDD, GAD, panic, and PTSD (Wald z = -3.78 to -2.38, p < .001 to .017) where iADs had lower precision estimates than HCs. In the no-guessing condition, IP also differentiated those with panic disorder from HCs (Wald z = -2.02, p = .044), again, due to significantly lower estimates in those with panic disorder. When comparing affective disorders to each other (excluding HCs), we found that 1) IP in the no-guessing condition differentiated GAD from other disorders (Wald z = 2.31, p = .021), and 2) IP in the breath-hold condition differentiated social anxiety from other disorders (Wald z = 2.18, p = .029). In both cases, precision estimates were higher than other disorders (i.e., closer to HCs). Prior bias, pHB, did not differentiate any specific AD diagnoses from HCs.
Eating disorders
When comparing participants with specific EDs (anorexia nervosa vs. bulimia nervosa) to HCs, and to each other (Supplemental Tables S6 and S7), parameter estimates did not differentiate groups (while noting the small sample sizes in these specific groups).
Predictive Categorization
Because of the transdiagnostic focus of our previous results, we first tested the predictive categorization ability of a model (with IP parameters in all three HB conditions as predictors) that was trained on the exploratory sample to see how well it could classify HCs vs. all patients together (as shown in Figure 9). This model had a predictive accuracy of .65 with an AUC of .57, indicating low discriminability with IP. Results of testing predictive models on the classification of each patient group (excluding iANX) separately are also shown in Figure 9. Accuracies were between .51 and .62 indicating, again, low discriminability with IP. However, all AUCs remained above chance (.55 - .58).
Discussion
Replication of Primary Findings
In this report, we replicated the computational modeling results of our prior study (15) examining behavior on a heartbeat tapping task in a transdiagnostic psychiatric patient sample, including individuals with anxiety, depression, substance use, and/or eating disorders. Most centrally, we sought to replicate the finding that, across clinical groups, patients did not assign greater precision (IP) to interoceptive signals during an inspiratory breath-hold manipulation designed to increase the magnitude of these signals. This contrasted with healthy participants, who did show these increases – suggesting greater interoceptive awareness when signals change (also observed in a second study; (16)). As expected, the healthy and patient groups in this new sample showed this same pattern. Thus, the group differences found in our previous report successfully replicated.
Bayesian analyses provided evidence against specific clinical group differences and instead favored a model assuming the difference in IP was transdiagnostic. As also observed previously, while prior biases (pHB) were sensitive to task instructions, there was no evidence of differences in these biases (or effects of instructions) between healthy comparisons and the transdiagnostic patient sample. This further confirmed that the observed interoceptive abnormality was due to blunted signal precision estimates and not biased (or rigid) prior beliefs.
Replication of Secondary Findings
As further validation of the model parameters, IP estimates again showed positive relationships with self-report ratings of heartbeat intensity, while pHB was lower in those who reported greater task difficulty. Prior bias estimates also showed strong associations with counting accuracy, consistent with previous concerns that the latter metric reflects beliefs about heart rate as opposed to detection accuracy (9, 14, 41). Notably, unlike in our previous report, we did also observe greater counting accuracy in HCs within the confirmatory sample.
When combining samples, which maximized the power to detect potential differences between narrower diagnostic categories, we also found a few notable results. Namely, logistic regressions within iSUDs found that IP in the breath-hold condition differentiated HCs from each substance use and affective disorder (with the exception of PTSD); but did not differentiate between disorders. This suggested the same transdiagnostic interpretation of IP results described above. Unlike the broader results above, these regressions found that pHB in the no-guessing condition also differentiated HCs from some disorders in iSUDs: cannabis use disorder, opioid use disorder, sedative use disorder, and GAD (i.e., greater values in each disorder than HCs). These findings may be consistent with previous models of drug craving in iSUDs that suggest over-weighting of prior expectations regarding physiological states and subsequent down-weighting of interoceptive signals (42, 43). These pHB estimates also distinguished iSUDs with GAD from those with other disorders (higher values in GAD), suggesting some potential diagnostic specificity not detected previously.
In affective disorders without SUDs (iADs), IP – again, in the breath-hold condition – differentiated each affective disorder from HCs except social anxiety; it also differentiated those with social anxiety from all other affective disorders. As these findings highlight social anxiety in particular, future research should test if precision-weighting in social anxiety is reliably comparable to HCs. Unlike in iSUDs, pHB did not differentiate any disorders from HCs or from each other.
Finally, after training a model on the exploratory sample and testing its predictive accuracy when classifying participants by diagnostic status in the confirmatory sample, we found above chance, but generally low discriminatory power, with the best predictive accuracy in differentiating patients transdiagnostically. Thus, while potentially insightful regarding mechanistic differences in subpopulations within HCs and patient groups, these findings may be of limited predictive clinical utility.
Comparative Insights and Clinical Implications of Replicated Findings
The present replication findings align with a recent report (44) that examined cardiac interoception (via heartbeat counting and heartbeat detection tasks) in individuals with various psychiatric conditions including affective disorders, personality disorders, and psychosis spectrum disorders. Both studies affirm the transdiagnostic importance of cardiac interoception to psychopathology and find abnormalities in interoceptive processing when compared to healthy or non-clinical participants. Additionally, both studies highlight the impact of these interoceptive deficits on affective symptoms like anxiety and depression. However, several unique aspects distinguish the current study. First, we utilized an interoceptive perturbation technique, involving breath-holding, to reveal specific abnormalities in interoceptive processing. Second, the Bayesian computational model allowed for a nuanced understanding of deficits in the precision-weighting of afferent cardiac signals. Third, our study benefits from a larger overall sample size. Lastly, the pre-registered approach confirms and extends our original findings, suggesting that these deficits are likely to be consistently observed across varying psychiatric samples, such as those with mood/anxiety, substance use, and eating disorders. Collectively, these studies contribute to a growing body of evidence supporting the role of abnormal interoceptive processing in psychiatric disorders, as also demonstrated by recent studies in generalized anxiety disorder (45, 46) and eating disorders (47–49). Given these converging findings, there is increased impetus for the translation of this knowledge into clinical practice. Future work might explore the utility of Bayesian modeling as a tool for longitudinally assessing interoceptive dysfunctions across various psychiatric disorders. Furthermore, interoceptive training interventions could be developed to target precision-weighting of cardiac signals, potentially offering a novel therapeutic approach for enhancing visceral regulation in affected individuals (see (50–54) for examples of some candidate approaches impacting the cardiac domain). These interventions may be especially pertinent for patient populations demonstrating significant interoceptive deficits, such as those with anxiety, substance use, and eating disorders, and could provide an initial framework for developing targeted therapeutic strategies.
Strengths, Limitations, and Conclusion
This study had most of the same strengths and limitations as our prior report. Strengths included the large sample size, the novel computational modeling approach, and the use of specific task conditions to selectively probe changes in distinct computational mechanisms and address concerns associated with other interoception measurement approaches. The task itself is also less time-consuming for participants than other common tasks, with each task condition lasting only 60 seconds. However, it is also possible that longer testing durations (e.g., a greater number of 1-minute trials per condition) could improve parameter estimates during perceptual modelling. Another limitation is that, because our modelling approach required distinct heartbeat states for comparison to behavior, we had to make choices about perceptual windows based on EKG signals. As mentioned in our prior report, other choices for defining these windows are possible.
For replication purposes, we used the same model as in our prior report; however, additional free parameters and model structures could be considered in future work. For example, alternative models (e.g., the Hierarchical Gaussian Filter; HGF) that also capture internal perceptual uncertainty could be considered (55). Lastly, the small sample sizes of the anxiety and eating disorder groups limited their involvement in group-comparison analyses and the confidence with which the results can be interpreted. Future work could focus on these anxiety and eating disorders to test if the transdiagnostic effects suggested here generalize to these groups.
Conclusion
In summary, this study replicated several effects found in a prior study comparing healthy individuals to a transdiagnostic patient sample using computational modeling of behavior on a heartbeat perception task. These results confirm a transdiagnostic failure to adapt beliefs about the precision of interoceptive signals, which could represent a vulnerability and/or maintenance factor for psychopathology broadly. Future research should investigate underlying neural and peripheral physiological mechanisms and evaluate whether this newly identified difference may be effectively targeted by current or novel clinical interventions.
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
Software Note
All model simulations were implemented using standard routines (spm_MDP_VB_X.m) that are available as Matlab code in the latest version of SPM academic software: http://www.fil.ion.ucl.ac.uk/spm/. The specific code used for our model can be found in supplementary materials.
Funding
Primary funding for project design, data collection, and initial analyses was provided by the Laureate Institute for Brain Research. Effort for specific authors was also supported by the National Institute of Mental Health, K23MH112949 (SSK), the National Institute on Drug Abuse, R01DA050677 (JLS) and the National Institute of General Medical Sciences Center Grant, P20GM121312 (RS, MPP, SSK, JLS).
Conflict of Interest
None of the authors have any conflicts of interest to disclose.
Supplementary Files
Supplementary materials. This file includes results of supplementary analyses as well as supplementary figures, as referred to in the main text.
Supplementary code. This file contains the MATLAB code used for modelling task behavior.
Study data. This file includes all data used in the analyses reported in the manuscript.
Supplemental Materials
Supplementary Figures
Supplementary Analyses
Interoceptive Precision
In an initial LME predicting IP using diagnosis-specific groupings, now excluding the 4 iANX, there were no significant group or condition effects (Fs < 1.11, ps > .331). Accounting for age, sex, precision in the tone condition, BMI, median PTT, medication status, number of heartbeats, and interactions between number of heartbeats and group and condition separately, there were negative associations with tone precision (F(1,265) = 5.20, p = .023) and number of heartbeats (F(1,367) = 4.66, p = .031).
In transdiagnostic JZS Bayes Factor comparisons including all models including group, condition, and/or ID predicting IP, the winning model contained only group (where HCs had higher values) but with limited evidence relative to a model that only contained ID (BF = 2.40). When including covariates, the winning model included only tone precision and ID (where tone precision was positively associated with IP estimates), but the data provided less evidence for this model than a null model containing only ID (BF = .62). Analogous comparisons separating participants by specific diagnoses but removing covariates found that the winning model contained condition only and was not favored over the model with only ID (BF = .25). Finally, including covariates led to a winning model identical to that for the transdiagnostic grouping; the best model contained age, tone precision, and ID, yet the ID-only model was still favored (BF = .36).
Prior Expectations
In an LME with diagnosis-specific groups without covariates, there was a significant effect of condition (F(2,549) = 220.14, p < .001), but not group (F(4,277) = 0.93, p = .445), or a group by condition interaction (F(8,548) = 0.90, p = .519). Accounting for differences in age, sex, precision in the tone condition, BMI, median PTT, medication status, number of heartbeats, and interactions between number of heartbeats and group and condition separately confirmed the condition effect (F(2,582) = 19.85, p < .001) and revealed a marginal negative association with age (F(1,264) = 3.42, p = .065) and number of heartbeats (F(1,437) = 3.85, p = .050). In both models, post-hoc comparisons revealed that the condition effect was, again, driven by significantly greater pHB values in the guessing condition than in the other two conditions (ps < .001).
Bayes Factor comparisons of transdiagnostic models predicting pHB that contained combinations of group, condition, and/or ID found that the winning model contained condition (where estimates were higher in the guessing condition than the other two conditions) and ID and was preferred over a model containing only ID (BF > 100). When including covariates, the winning model contained condition (indicating that, again, estimates were higher in the guessing condition than the other two conditions) and ID as well as a group by heart rate interaction (reflecting the fact that pHB estimates were negatively associated with heart rate in patients but not in HCs) and was favored over the model containing only ID (BF > 100). Subsequently separating participants into diagnosis-specific groupings and removing covariates found that the best model again contained condition (with higher estimates in the guessing condition) and ID – as with the transdiagnostic grouping – and evidence strongly favored this model over the model containing only ID (BF > 100). After including covariates, evidence best supported a model that included condition (where estimates were higher in the guessing condition), number of heartbeats (where pHB was negative associated with heart rate), and ID and was favored over the model that included only ID (BF > 100).
Anticipating vs Reacting
In Bayesian analyses predicting AvR across the three HB tapping conditions (including group, condition, and their interaction), the models that grouped participants transdiagnostically and by diagnostic groups were given equivalent evidence (BF = 1.29). Thus, for simplicity, we focus first on the transdiagnostic LMEs as with the other parameters, but include diagnosis-specific models for completeness.
An initial LME that grouped participants transdiagnostically (and included iANX), there were no significant effects of group, condition, or group by condition interaction (Fs < 1.53, ps > .218 for each). Including covariates, there were, again, no significant effects of any predictor (Fs < 1.79, ps > .182 in all cases). Subsequently, in an LME including diagnosis-specific groupings and no covariates, there were no significant effects (Fs < 0.82, ps > .586). However, an analogous LME including covariates found an effect of group (F(4,373) = 2.54, p = .039) and a group by heart rate interaction (F(4,378) = 2.55, p = .039), though post-hoc contrasts indicated no significant differences between groups (ps > .336) and only a marginal difference between iDEP+ANX and iSUDs by heart rate (t(417) = 2.71, p = .054) such that iSUDs reacted more as heart rate increased while iDEP+ANX reacted less.
Finally, in Bayes Factor comparisons of models including combinations of group, condition, and ID predicting AvR, we found that the winning model in the transdiagnostic grouping included an effect of group (where HCs had higher estimates than patients) and was favored over the model containing just ID (BF = 2.16). After including covariates, the most evidence was afforded to a model containing only sex, but the model containing ID alone was favored (BF = .27). An analogous comparison of models without covariates while grouping participants by diagnosis found the best evidence for a model including only condition but this model was not favored over a model containing only ID (BF = .266). Including covariates led to a winning model containing only age but was not favored over the model containing ID alone (BF = .22).
Other task measures
After performing LMEs predicting the traditional counting accuracy metric with the transdiagnostic grouping, we then separated the patient groups by diagnosis. In an initial LME without covariates, there was a main effect of condition (F(2,550) = 13.72, p < .001), such that accuracy was greater in the guessing condition (EMM = .621) than the two other conditions (no-guessing: EMM = .282, t(545) = -4.81, p < .001; breath-hold: EMM = .325, t(552) = -4.22, p < .001). When including covariates, there was a main effect of group (F(4,352) = 11.21, p < .001) and condition (F(2,621) = 6.24, p = .002), but no group by condition interaction (F(8,550) = 0.95, p = .473). Post-hoc comparisons found that iDEP+ANX (EMM = .328) were significantly different from HCs (EMM = .505, t(263) = 2.65, p = .009). No other group differences were present. There were also significant differences between the guessing condition (EMM = .629) and the two other conditions such that accuracy was highest in the guessing condition (no-guessing: EMM = .300, t(544) = 5.08, p < .001; breath-hold: EMM = .342, t(551) = 4.46, p < .001). There were additional negative associations with PTT (F(1,270) = 69.77, p < .001) and age (F(1,263) =, p = .032), as well as interactions between number of heartbeats and condition (F(2,625) = 8.29, p < .001) and number of heartbeats and group (F(4,356) = 10.22, p < .001). Post-hoc comparisons showed that this interaction was driven by a negative relationship between number of heartbeats in the breath-hold condition and counting accuracy (breath-hold: ET = - .004) and positive relationships in the guessing (ET = .01, t(648) = 3.63, p < .001) and no-guessing conditions (ET = .008, t(622) = 3.35, p = .003). However, when separating the patient groups, these comparisons indicated that the number of heartbeats by group interaction was driven by significant differences in iDEP+ANX (EMM = .023) and iDEP (EMM = -.004, t(385) = -4.02, p < .001), iSUD (EMM = .007, t(388) = 4.61, p < .001), iED (EMM = -.006, t(354) = 3.78, p = .002), and HC (EMM = .002, t(352) = 4.23, p < .001).
In analogous LMEs, we investigated potential group differences in the self-report measures of heartbeat intensity, confidence, and difficulty, as well as heartrate. These models were first examined only including condition, group, and their interaction as predictors and then were performed again including the same covariates as in our previous analyses. These models, as before, excluded the tone condition.
Self-reported heartbeat intensity
Bayesian analyses strongly favored the transdiagnostic model over the diagnosis-specific model (BF > 100). Grouping participants transdiagnostically (HCs vs. all patients, including iANX), an initial LME predicting self-reported intensity indicated significant effects of condition (F(2,565) = 45.04, p < .001). Post-hoc contrasts revealed that each condition was different from the others (p < .03 in all cases) such that intensity ratings were highest in the breath-hold condition and lowest in the no-guessing condition. A subsequent LME containing covariates again revealed a significant effect of condition (F(2,601) = 9.13, p < .001), as well as a positive effect of number of heartbeats (F(1,488) = 7.09, p = .008) and a marginal effect of sex(F(1,276) = 3.58, p = .060). Post-hoc contrasts confirmed that intensity was greater in the breath-hold condition (EMM = 41.1) than in the guessing condition (EMM = 28.8, t(583) = 6.51, p < .001) which was greater than intensity ratings in the no-guessing condition (EMM = 25.0; t(566) = 2.04, p = .042) and that male participants gave marginally higher intensity ratings (EMM = 34.1) than female participants (EMM = 29.1, t(276) = 1.89, p = .060).Separating groups by diagnosis (removing the 4 iANX), we found further evidence for a main effect of condition (F(2,552) = 39.28, p < .001) such that intensity ratings in the breath-hold condition (EMM = 38.2) were significantly higher than those in the other two conditions (guessing: EMM = 25.2; t(552) = 7.32, p < .001; no-guessing: EMM = 23.9, t(552) = 8.01, p < .001). When including covariates, there was an effect of condition (F(2,585) = 7.29, p < .001) and a marginal positive association with number of heartbeats (F(1,577) = 3.75, p = .053). Post-hoc contrasts revealed that these results were driven by higher intensity ratings in the breath-hold condition than the other two conditions (p < .001 in both cases).
Self-reported confidence
Bayes factor analyses found greater evidence for a model that grouped participants transdiagnostically than by separate diagnosis-specific groups (BF > 100). Grouping participants transdiagnostically, we found a significant effect of condition (F(2,566) = 8.84, p < .001), such that confidence ratings in the breath-hold condition (EMM = 45.1) were significantly higher than the other two HB conditions (guessing: EMM = 37.0, t(566) = 3.78, p < .001; no-guessing: EMM = 37.6; t(566) = 3.49, p < .001). In a subsequent LME that included covariates, there was a marginal effect of condition (F(2,612) = 2.85, p = .059) and an effect of number of heartbeats (F(1,440) = 5.88, p = .016). Confidence ratings were greater in the breath-hold condition (EMM = 45.2) than the other two conditions (guessing: EMM = 37.7; t(581) = 3.45, p < .001; no-guessing: EMM = 38.6; t(598) = 3.03, p = .003) and were associated with a greater number of heartbeats. When separating the patient groups by diagnosis and removing iANX, there were main effects of condition (F(2,552) = 5.23, p = .006) and group (F(4,276) = 3.36, p = .010). Post-hoc contrasts revealed that, again, ratings were higher in the breath-hold condition than the others, that iDEP (EMM = 32.8) and iED (EMM = 28.6) were less confident than iSUD (EMM = 42.1) and HCs (EMM = 41.7), and that iED were less confident that iDEP+ANX (EMM = 39.0; p < .05 in all cases). When including covariates, there were no significant effects (Fs < 2.08, ps > .126).
Self-reported difficulty
In Bayesian analyses, the model including the transdiagnostic grouping was favored over the model that separated the patient groups (BF > 100). Grouping participants transdiagnostically without covariates, condition had a main effect on self-reported difficulty (F(2,566) = 14.39, p < .001), such that ratings were higher in the no-guessing condition (EMM = 65.3) than the other two HB conditions (guessing: EMM = 57.4, t(566) = 3.57; breath-hold: EMM = 53.6; t(566) = 5.25, p < .001 for both). When including covariates, there was only a marginal effect of condition (F(2,605) = 2.64, p = .072) such that the no-guessing condition was rated as more difficult than the other two conditions (ps < .001). When separating the groups by diagnosis, there was again an effect of condition (F(2,552) = 10.33, p < .001), with higher ratings in the no-guessing condition (EMM = 64.2; guessing: EMM = 56.9, t(552) = 3.53; breath-hold: EMM = 55.4; t(552) = 4.25, p < .001 in both cases), but no significant predictors were found after adding in covariates.
Heart rate
Bayes factor analyses favored the transdiagnostic model over one that separated patients by diagnosis (BF > 100). First, in an LME without covariates and grouping participants transdiagnostically, there was a main effect of condition (F(2,566) = 4.80, p = .008) such that heart rate was significantly higher in the breath-hold condition (EMM = 69.6) than the no-guessing condition (EMM = 66.6, t(566) = 3.09, p = .002). When including covariates (except number of heartbeats), there was again an effect of condition (F(2,566) = 4.80, p = .009) as well as a negative effect of PTT (F(1,277) = 5.93, p = .016). Post-hoc contrasts confirmed that heart rate was higher in the breath-hold condition than the no-guessing condition (p = .002). A similar pattern of results emerged when separating the patient groups by diagnosis. In a simple LME, there was an effect of condition (F(2,552) = 5.46, p = .005) with higher heart rates in the breath-hold condition (EMM = 69.5) than the no-guessing condition (EMM = 66.5, t(552) = 3.30, p = .001). After including covariates, condition (F(2,552) = 5.46, p = .005) and PTT (F(1,270) = 6.19, p = .013) predicted heart rate. Post-hoc contrasts revealed significantly greater heart rates in the breath-hold condition (EMM = 69.4) than the no-guessing condition (EMM = 66.4, t(552) = 3.30, p = .001) and PTT was again negatively associated with heart rate.
Associations with interoceptive awareness scales
Sensory precision (IP) in the no-guessing condition was positively associated with two MAIA subscales: Attention Regulation (r = .18, p = .01) and Trust (r = .13, p = .05). No relationships were found for pHB and any self-report measures; however, multiple relationships were found between AvR values and various interoceptive awareness scales. In the no-guessing condition, AvR values were negatively associated with Self-Regulation (r = -.15, p = .03). In the guessing condition, AvR values were negatively associated with MAIA Emotional Awareness (r = -.18, p = .01), Body-Listening (r = -.14, p = .04), and Not-Distracting (r = -.23, p < .001). Finally, AvR in the breath-hold condition was positively associated with MAIA Not-Worrying (r = .15, p = .02). However, none of these findings replicate the results from our prior study.
Sample comparisons and combined sample analyses
As reported in Table S1 below, there was no evidence for differences in any measure between samples within each clinical group, with the exception of BMI, which was greater in iEDs in the confirmatory sample.
Supplementary Tables
Acknowledgements
Other contributors to the Tulsa 1000 project include the following: Robin L. Aupperle, Justin S. Feinstein, Jonathan B. Savitz, and Teresa A. Victor. Teresa A. Victor is the lead author for this group and can be reached at TVictor{at}laureateinstitute.org.