Abstract
Chronic pain conditions present in various forms, yet all feature symptomatic impairments in physical, mental, and social domains. Rather than assessing symptoms as manifestations of illness, we used them to develop a chronic pain classification system. A cohort of real-world treatment-seeking patients completed a multidimensional patient-reported registry as part of a routine initial evaluation in a multidisciplinary academic pain clinic. We applied hierarchical clustering on a training subset of 11448 patients using nine pain-agnostic symptoms. We then validated a three-cluster solution reflecting a graded scale of severity across all symptoms and eight independent pain-specific measures in additional subsets of 3817 and 1273 patients. Negative affect-related factors were key determinants of cluster assignment. The smallest subset included follow-up assessments that were predicted based on baseline cluster assignment. Findings provide a cost-effective classification system that promises to improve clinical care and alleviate suffering by providing putative markers for personalized diagnosis and prognosis.
Introduction
Chronic pain is a global epidemic reflecting a health care crisis for the person suffering from it, their family, and society as a whole (1–3). More than 100 million individuals are impacted by various chronic pain conditions in the US alone, with medical expenses and lost productivity costing more than $635 billion annually and projected to become much worse (4–6). Primary chronic pain conditions present in various shapes and forms, commonly classified by anatomical location of experienced pain, from low-back pain and headaches to pelvic or bladder pain, including wide spread non-specific or overlapping pain (7). However, common to all conditions is a global functional impairment that is manifested in the experience of multiple physical, “mental”, and social health symptoms, reflective of the biopsychosocial model of shared etiological factors across chronic pain conditions (7–10). While various studies aimed to uncover and classify sub-groups of chronic pain (11–19), little is known whether a combination of domain-general symptoms agnostic to pain can be used to classify one’s chronic pain condition, and subsequently serve as potential markers for clinical diagnosis and prognosis (20, 21). A symptom-based approach may also reveal potentially modifiable factors as targets for therapeutic interventions. Therefore, we suggest a reversal of the paradigm – instead of assessing patient-reported symptoms as features of the a-priori determined pain condition, we examined whether such symptoms may serve to classify current and predict future pain condition. If confirmed, our approach could be used to support personalized and efficient treatment of individuals with chronic pain.
We implemented unsupervised machine learning, specifically agglomerative hierarchical clustering analysis (22–24), on multidimensional patient-reported symptoms that assess physical, mental, and social health status factors, to identify idiosyncratic groups, or clusters of patients with chronic pain. Patients reflected a real-world clinical population with a heterogeneous mix of pain conditions seeking treatment at a tertiary academic pain clinic. As part of their routine initial evaluation, they completed multidimensional patient-reported assessments using Stanford’s CHOIR registry-based learning health system (Figure 1A) (25, 26). We used nine symptoms for clustering based on the National Institute of Health’s (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS), which was designed and validated for precise and efficient measurement of health-related symptoms in patients with a wide variety of chronic health conditions (27). These symptoms were agnostic to nine pain-specific measures that we subsequently used to validate the diagnostic-like nature of the data-driven clusters independently.
Mechanistically, we aimed to uncover whether the multivariate pattern of symptoms and pain-specific measures characterizing each identified cluster reflects a general graded scale of severity or a differential pattern. Furthermore, given the centrality and comorbidity of mental health related factors with chronic pain, predominantly negative affect-related symptoms such as anxiety, depression, and anger (9, 28, 29), we expected these symptoms to be key drivers for the determination of cluster assignment, thus highlighting them as targets for treatment. We based cluster discovery on a training dataset of 11448 patients and subsequently validated it in two additional datasets of 3817 and 1273 patients. The later dataset included follow-up assessments allowing us to examine if cluster assignment at baseline would be predictive of pain-related measures at follow-up, thus providing potential prognostic-like validation of the identified clusters. Finally, we examined the dynamics across assigned clusters between time-points.
Materials and Methods
General data acquisition procedures and dataset definition
Data were collected using Stanford University’s CHOIR (http://choir.stanford.edu), a registry-based, learning health care system that administers an electronic survey assessing self-reported demographic information, medical history, and multiple domains of health status in real-world clinical settings (Figure 1A) (30). Patients presenting for consultation at Stanford Pain Management Center locations throughout the San Francisco Bay Area and broader Norther California region, USA, with the main site located in Redwood City, complete the survey as part of their routine clinical care. While intended for completion at home using personal computers or hand-held devices, patients may complete the survey before their appointment at clinic check-in using a tablet computer. Survey completion is encouraged, yet optional, and based on patients’ willingness and ability to collaborate. Patients may therefore choose not to respond to certain items or assessments. These procedures were approved by the Stanford University School of Medicine Institutional Review Board (IRB). Informed consent was waived by the IRB, as CHOIR data were collected for clinical care and quality improvement purposes.
Data analyzed were from a retrospective review of all collected surveys since CHOIR’s inception in October 2013 through August 2019. Our initial data extraction included 24389 records, from which we removed records based on the following criteria: non-completed or test records (6002), missing data in any of the nine measures used for clustering (as detailed below; 1651), duplicated records (136), and age below 18 years (62). From the resulting 16538 surveys belonging to 16538 different patients, we extracted a longitudinal dataset of 1273 patients with a follow-up survey between 3-12 months later, and again with a minimal requirement of having complete data for the nine assessments used for clustering at both time points. We chose this time frame since 3 months is considered the minimal threshold for diagnosing primary chronic pain (7). The upper threshold of 12 months allowed to keep a substantially large proportion of the dataset for cluster discovery validation. The resulting 15265 patients were randomly split based on a 75%:25% allocation into a training dataset of 11448 patients used for cluster discovery and an additional validation dataset of 3817 patients.
Measures
Demographic characteristics
Demographic characteristics included age, sex, ethnicity, race, marital status, and years of education.
Clustering symptoms
The nine symptoms assessing health-related functionality and used as the basis for the clustering procedures were from the National Institute of Health’s (NIH’s) Patient-Reported Outcomes Measurement Information System (PROMIS) (27, 31–35). We divided these nine symptoms into three domains: (1) the physical domain (fatigue, sleep disturbance, and sleep impairment), (2) the mental or negative affect domain (depression, anxiety, and anger), and (3) the social domain (social isolation, emotional support, and satisfaction with social roles and activities). Response items are contextualized to the frequency of the experienced symptom in the past seven days (e.g., “in the past seven days how often did you feel tired?”, “in the past seven days I felt worthless”), and responses were marked on a 1-5 scale (1 = never, 5 = always). Each assessment was completed using computerized adaptive testing (CAT), based on item response theory-derived metrics. CAT reduces the time needed to complete each assessment because patients respond only to a subset of items for each PROMIS item bank, until resulting measurements meet preset criteria of below three standard errors (27, 36). Most patients complete 4-5 items per assessment, and take about 15 minutes to complete these nine symptoms.
Ultimately, a standardized T score for each PROMIS symptom is generated for each patient. A score of 50 reflects the mean of the US general population, with a standard deviation (SD) of 10. Higher scores reflect more of the measures’ symptom. We further extracted data of a PROMIS-based global health measure, specifically the Global Health Mental subscale that consisted of four items assessing general mental health, quality of life, satisfaction with social activities, and emotional problems (37). While for most measures such as fatigue or depression, higher T scores indicated a worse condition, for emotional support, satisfaction with social roles, global health mental, and physical function (see below), higher T scores reflected a better condition. Further details regarding measure development and validation are available at http://www.healthmeasures.net.
Pain-specific measures
Pain-specific measures were used independently of the clustering process to validate the diagnostic-like nature of the data-driven generated clusters in terms of pain-related constructs. A composite score of pain intensity was calculated by averaging three self-reported pain intensity measures. These measures used a common and validated (38) 11-point numeric rating scale of 0-10 (0=no pain, 10=pain as bad as you can imagine) for worst and average pain in the last seven days, and current pain. The number of body segments in which chronic pain is experienced were self-reported by patients, who were asked to mark locations of pain on a reliable and valid CHOIR body map scheme that included 36 anterior and 38 posterior symmetrical body segments for a maximum total of 74 segments (Figure 1A) (39). This measure was used to reflect the extent of pain throughout the body. A group of physicians recoded these 74 body segments into 11 body regions (Table S1, Figure S1) subsequently used to examine specific locations in which patients experienced pain. Pain duration was calculated as the number of months from onset of chronic pain that was self-reported by patients.
Additional measures assessed using PROMIS instrumentation included pain interference with daily life activities, pain behavior, and physical function (27, 33). In September 2016 and moving forward, physical function was assessed using two separate measures in CHOIR, reflecting physical function of the upper extremity, and lower mobility (40). Across the entire dataset, most patients had the two separate measures (61.32%). We analyzed each of the three physical function measures separately to be able to differentiate between them.
The last pain-specific measure was pain catastrophizing, reflecting maladaptive cognitions such as rumination, magnification, and helplessness, in response to actual or anticipated pain. Pain catastrophizing has been associated with poor outcomes, maintenance, and worsening of chronic pain illness (41–43). We used the Pain Catastrophizing Scale (PCS) which previously demonstrated sound psychometric properties (44–46) to measures the frequency with which a patient engages in catastrophic thought patterns, and consists of 13 self-reported items on a 0-4 scale (0=not at all, 10=all the time).
Statistical analysis
Programming and analyses were conducted using a combination of R version 4.0.0 (47), RStudio version 1.2.5042 (48), and IBM® SPSS® version 26. Relevant open source R codes are available at TBA.
Cluster discovery
Hierarchical clustering is a well-established unsupervised machine learning technique that aims to discover groups or clusters of observations within a dataset without needing to a-priori determine the specific characteristics of each cluster (22–24). Observations within the same cluster are expected to have similar characteristics, while different clusters are expected to have dissimilar characteristics. A cluster-tree diagram or dendrogram is a mathematical and pictorial representation of a cluster solution.
We implemented an AHCA on the training dataset and using the nine clustering symptoms, to assign each patient to a cluster. AHCA implements an iterative process in which the two most similar observations (i.e., patients, or groups of patients) are fused to form a superordinate cluster until all observations belong to one single cluster. Two parameters important for this process are a distance metric that determines how similar observations are to each other, and a linkage method to fuse similar observations. The agglomerative coefficient can then assess how tightly packed each cluster is within a cluster solution. We used the Euclidian distance metric combined with the Ward linkage method as it optimized the agglomerative coefficient compared to four other linkage methods (Table S5).
We subsequently used the gap statistic to determine the optimal number of clusters, k (50). The gap statistic compares the within-cluster sum of squares of a certain k-clusters solution to the expected within-cluster sum of squares under a null distribution with no clusters. An ideal solution will have a small within cluster sum of squares, and therefore a large gap statistic. We calculated the gap statistic for k between 1 and 10. The smallest value of k that is within one standard deviation of the value of k that maximizes the gap statistic should be chosen as the optimal number of clusters.
We next aimed to determine the relative importance of each clustering symptom to the clustering process, i.e. to the separability between clusters. We computed the cluster centroid (22, 24), which is the average value of each clustering symptom for all of the observations in that cluster, and then calculated the total Euclidian distance between all cluster centroids. The average amount each clustering symptom contributed to the distance between each clusters’ centroid, divided by the total sum of all clustering symptoms’ contribution to the total Euclidean distance between all cluster centroids, provides a percent contribution to the overall separability between clusters.
Cluster characterization, reliability, and validity
Univariate analysis of variance (ANOVA) and subsequent t-tests were used to examine differences in clustering symptoms and in pain-specific measures between the identified clusters, with Bonferroni correction applied to account for multiple comparisons. Chi2 tests were used to examine the differential distribution of demographic factors and of body regions between the clusters and determine whether specific body regions were associated with any of the identified clusters. This sequence of tests was conducted initially on the training dataset to assess the clusters’ diagnostic-like potential, and subsequently on the validation and the baseline of the longitudinal datasets, to assess the reliability of cluster assignment and validate the cluster’s characteristics in other sets of patients. A nearest centroid classifier (22, 24) was generated to assign or label a cluster to a “new” patient, based on the shortest Euclidian distance between the values of the clustering symptoms of that patient and each clusters’ centroids.
Predictive validation and cluster dynamics over time
Univariate analyses as described above were used to examine differences in clustering symptoms and in pain-specific measures between clusters as assigned at baseline, using data from the follow-up. To control for time-related effects, we added the number of days between the two assessments as covariate. This provided prognostic-like validation of the clusters.
Next, the nearest centroid classifier was implemented on the follow-up dataset to assess patient movement across clusters between the baseline and follow-up time points. Also, we used a bootstrap procedure (23, 24) to assess whether patients’ movement across clusters over time was due to potential error in measurement of the clustering symptoms, or potentially to the clusters’ ability to portray real improved or worsening of their condition. Since the PROMIS CAT engine uses a criterion of below three standard errors to calculate the final T score (27, 36), we randomly jittered the original T score for each patient and for each of the clustering symptoms at baseline, within ± three standard errors. The nearest centroid classifier was implemented on each patient’s simulated data to assign a cluster. We then assessed movement across clusters between the simulated dataset and the follow-up dataset, and calculated the number and subsequently the percent of patients moving across clusters. This procedure repeated 1000 times to generate a bootstrapped distribution of the percent of patients moving across clusters within measurement error. This distribution allowed us to calculate the probability of the actual percent of patients moving across clusters between the baseline and follow-up time points being attributed to measurement error.
Results
Demographic characteristics
Demographic characteristics of study participants are described in Table 1 (see also Table S1).
Cluster discovery, characterization, reliability, and validity
The dendrogram reflecting results of the AHCA as implemented on the training dataset is shown in Figure 1B. Based on the gap statistic, we grouped patients into an optimal number of three clusters (Figure 1C). In line with our expectation, the negative affect-related clustering symptoms of depression, anxiety, and anger were the most important factors driving the clustering process, ranked 1st, 2nd, and 4th, respectively, and contributing 42.4% to the overall separability between clusters (Figure 1D).
We labeled the clusters Cluster1, Cluster2, and Cluster3 to reflect the graded scale of severity that characterized all clustering symptoms (Table 2; Figure2A-I), as well as all pain-specific measures (Table 2; Figure 2J-R). These results provided initial validation of these clusters such that: Cluster1 reflects the least severe condition, Cluster3 the worst, and Cluster2 in between, with substantial effect sizes across all comparisons (Table 2). Since PROMIS instruments are normed to the general US population, we could inform that Cluster1 was on average 0.60 standard deviations (SD) better than the norm in the clustering symptoms, but 0.46SD worse than the norm in the subset of PROMIS-based pain-specific measures. Cluster2 was 0.36SD and 1.05SD, and Cluster3 was 1.20SD and 1.54SD, all worse than the norm in the clustering symptoms and pain-specific measures.
Although the pattern of severity also manifested in the number of self-reported body segments in pain, with Cluster3 indicative of potential widespread and/or overlapping chronic pain conditions, we found no significant associations between specific body regions (Table S2, Figure S1) and any of the clusters (Chi2=3.25, p=0.99; Figure2S). Cluster1 was only descriptively associated with more pain in the Front of Head (13.89%) compared to Cluster2 (9.47%) and Cluster3 (8.07%; Chi2=1.76, p=0.41). Similarly, none of the demographic characteristics were significantly associated with any specific cluster (Table 1). We replicated the same pattern of results across clustering symptoms and pain-specific measures in the validation (Table S3, Figure S2) as well as the longitudinal datasets (Table S4, Figure S3), except for pain duration, for which we found no differences between the three clusters (p-values>0.12). This supported the reliability and validity of the identified clusters. However, two critical questions arose that we addressed in the following two sections.
Can we obtain a similar clustering solution using only pain intensity?
The clustering solution identified three clusters portraying a graded scale of severity across all clustering symptoms and, importantly, all pain-specific measures. To address whether one variable could be used to obtain a similar clustering solution, we applied AHCA on the training dataset using a common measure for assessing the severity of pain, namely pain intensity. The dendrogram reflecting the results of the AHCA is shown in Figure 3A. The gap statistic indicated an optimal number of one cluster (Figure 3B). We nevertheless selected the non-optimal three-cluster solution to compare it with the clustering symptoms-based solution directly. To visualize this comparison, we applied Principal Components Analysis (PCA) (23, 24) on the nine-dimensional clustering symptoms (see Figure S4A for scree plot and Figure S4B for clustering symptoms’ contribution to the first three principal components). To evaluate the separability between the clusters of the two solutions, we plotted the entire training data-set using the first three principal components and colored the data points based on the three clusters of the clustering symptoms solution (Figure 3C) and of the pain intensity solution (Figure 3D). The separability between clusters is clearly seen in the clustering symptoms’ solution, while a substantial overlap is seen in the pain intensity solution, indicating that using pain intensity alone cannot capture a similar solution.
Does the clustering solution reflect a latent mental health-related construct?
As we initially expected, the negative affect- or mental health-related clustering symptoms were the most important factors driving the clustering process, and contributing most to the first principal component (42.2% of the explained variance; Figure S4B). This may suggest that the underlying structure of the dataset and subsequent clustering solution reflected a mere latent mental health construct. Therefore, by using a measure of mental health we might obtain similar results. To examine this alternative hypothesis, we calculated the Pearson correlation coefficient between the first principal components and the PROMIS Global Health Mental subscale. To note, this measure was available for n=10835 of the training dataset. The first principal component explained 55.42% of the variance in the data structure (Figure S4A), and had a correlation of r=-0.78 with the PROMIS Global Health Mental subscale (Figure 3E). The correlation with the second and third principal components, which explained 12.28% and 8.83% of the variance in the data structure (Figure S4) were r=0.11 and r=-0.02, respectively. As expected, this reconfirms mental health as a key construct in the data’s underlying structure, but not the only.
To further illustrate this point, we split the range of possible PROMIS Global Health Mental scores into tertiles and labeled them in order of severity (1, 2, and 3) to match the clustering symptoms’ cluster solution labeling. We then quantified the level of congruence between these two sets of labels by counting how many patients were assigned by each of the solutions to the same cluster label and how many were mismatched between the clusters (Figure 3F). The level of congruence was 76.73%, 50.44%, and 71.77% for Cluster1, Cluster2, and Cluster3, respectively, and with an overall 62.26% congruence. Together, it is clear that mental health is a primary component in the data’s underlying structure. Still, the proposed clustering solution reflects more than a mere latent mental health-related construct, particularly at the intermediate Cluster2.
Predictive validation and cluster dynamics over time
After controlling for the time between the two assessments (3-12months), we were able to demonstrate substantial differences between clusters as identified at baseline in all clustering symptoms (Table 3; Figure 4A-I), and pain-specific measures at follow-up (Table 3; Figure 4J-Q). Cluster1 continued to reflect the least severe condition, Cluster3 the worst, and Cluster2 in between, and again with substantial effect sizes across all comparisons (Table 3). These results validate the prognostic-like nature of the clusters and suggest that the graded scale of severity remains consistent at follow-up at the group-level. Nevertheless, cluster identification at follow-up demonstrates that while most patients (n=879, 69.05%) remained within their same cluster between the two time-points, there were movements across clusters (Figure 5A): 180 patients (14.14%) had an improvement in their condition and moved from Cluster3 to Cluster2 (n=69, 5.42%) or to Cluster1 (n=6, 0.47%), and from Cluster2 to Cluster1 (n=105, 8.25%); and 214 patients (16.81%) had a worsening in their condition and moved from Cluster1 to Cluster2 (n=115, 9.03%) or to Cluster3 (n=4, 0.31%), and from Cluster2 to Cluster3 (n=95, 7.46%). We compared the total movement of patients across clusters between time-points (n=394, 30.95%) to a bootstrapped distribution of patients moving across clusters within potential measurement error (M=5.81%±0.54SD; Figure 5B), which indicated a significant amount of movement (t(df=999)=1477.18, p<0.0001; Figure 5C). This suggests that the changes across clusters are meaningful, potentially indicating an interaction between treatment effects and regression to the mean (51). Importantly, cluster assignment is not a static condition; rather, various factors might impact the long-term dynamics across clusters, offering a window of opportunity for personalized interventions.
Discussion
In our study we offer a novel biopsychosocial-inspired approach to classify patients with chronic pain, resulting in the identification of three robust idiosyncratic groups of patients, and generating putative markers that can classify current and predict future severity of chronic pain in a graded manner, regardless of their formal diagnosis or their underlying etiology. We applied a data-driven clustering algorithm on multidimensional self-reported symptom assessments that are agnostic to pain. These assessments were collected through CHOIR, Stanford’s registry-based, learning health care system (30), and belonging to more than 16 thousand real-world patients seeking treatment at a tertiary academic pain clinic. These findings can be instrumental in supporting treatment selection and pain management in a personalized healthcare platform, especially in the current forward-triage approach to healthcare in which a clinician might not be able to physically examine a patient (52). Moreover, findings inspire further research into the biological and behavioral mechanisms that characterize the identified clusters.
The three identified groups reflected a graded scale of severity. They were therefore labeled Cluster1, Cluster2, and Cluster3, with higher numbers indicative of a more severe condition, as shown in all assessments, including those used for clustering as well as those specific for pain, except for pain duration since onset of chronic pain. No apparent demographic factors significantly differed between clusters. The overall group characteristics initially discovered on a subset of more than 11 thousand patients reliably reproduced in two additional subsets consisting of about five thousand patients. The sample size utilized in the development and validation of these clusters (>16K) is consdireably larger compared to even the largest samples (∼5K) used in previous efforts (11, 12, 14). Moreover, one of our subsets comprising 1273 patients included follow-up assessments, the severity of which were predicted based on the baseline cluster assignment. Examining the dynamics across clusters between baseline and follow-up assessments indicated that cluster assignment is not a static condition, suggesting that various factors might impact improvement or worsening of the pain condition. Thus, beyond the diagnostic- and prognostic-like nature of these symptom-based putative markers, future clinical and research efforts can examine whether and to what extent they will indicate response to various treatments (20, 21).
One of the biggest challenges for chronic pain healthcare is identifying safe and effective treatments tailored to the patient’s particular needs. The symptom-based classification system proposed here addresses this challenge by reversing the common paradigm – it uses the endpoint, as in patient-reported symptoms, to classify the condition of pain and aiming in subsequent research to provide the underlying mechanistic basis. Similar evidence-based approaches have called for chronic pain classification systems that focus on the fine-grained multidimensional and mechanistic substrates of chronic pain conditions (53, 54). However, having potentially too many dimensions for classification, and the costs and burdensome nature of medical tests required for such classification processes may make translating them into clinically interpretable and applicable tools challenging (55). Subsequently, in most clinical settings, as in research, chronic pain continues to be diagnosed predominantly by the relevant anatomical location of pain (56). In contrasts, our computational approach is easily interpretable and substantially cost-effective since it relies on a minimal set of self-reported assessments that can be completed using an electronic device from almost any place, in about 15 minutes, and with hardly any need for assistance from staff. Moreover, unlike the dominant diagnostic system, we find no associations between location of experienced pain and identified cluster. Nevertheless, the number of body regions in pain increased with severity, indicating that patients with widespread and/or overlapping chronic pain conditions suffer more than those with a more localized pain condition. In line with previous research (12, 57), findings thus evidence a diminished reliance on specific anatomical locations of experienced pain when assessing and classifying the severity of impairment in primary pain conditions, and potentially when considering treatment avenues.
As expected, negative affect-related symptoms emerged as key factors driving the clustering process. Researchers have previously demonstrated similar negative affect-related metrics to be central in clustering patients with chronic pain (11, 12). As we further confirmed, a global measure of mental health was a key construct in the data’s underlying structure. This reverberates with the crucial role of mental health in chronic pain (9, 28). Currently, there is an ongoing paradigm shift in psychology and psychiatry, calling for the classification of psychopathology as a hierarchy of continuous dimensions rather than describing it through discrete diagnostic categories (58). On top of the hierarchy is a global factor termed the “p factor”, generally ranging from low to high psychopathological severity, and cutting through all psychopathological disorders to account for their nonspecific and overlapping manifestation of symptoms (59, 60). The resemblance to chronic pain is astounding. The empirical findings presented here suggest that pain as a field should consider establishing a similar hierarchy of continuous global transdiagnostic dimensions to improve the ability to address the challenges of chronic pain. Moreover, this echoes our contemporary perspective on the need for more synergistic interactions between the research and clinical fields of pain and mental health, specifically regarding the centrality of affective components to these fields (28).
Our findings build on existing notions of a general graded scale of severity of chronic pain illness (57, 61, 62). However, our approach extends previous efforts in terms of the combination of scale, scope, computational approach, and especially in that we use multidimensional domain-general symptoms that are agnostic to pain. This is advantageous for two main reasons. First, it may highlight potentially modifiable targets for intervention. As we anticipated, negative affect-related factors, namely depression, anxiety, and anger, were key drivers in cluster assignment at the group level. Fortunately, there is a flourishing of treatment strategies aimed to reduce negative affect-related symptomatology (63–68). Moreover, findings indicate that the distribution of symptoms severity and of patients across clusters is to a certain extent blended (e.g., Figure 2A-R, Figure 3C). This suggests that a healthcare clinician may consider the particular pattern of symptoms at the individual patient level regarding the assigned cluster and utilize this information to guide and support clinical decision-making contextually. For example, we may envision a patient assigned to the lower severity Cluster1, but has relatively high levels of sleep dysfunction that a clinician could address with specific sleep-related treatments (69).
A second advantage of the domain-general symptoms approach is that it may be implemented and potentially generalizable to other chronic illnesses requiring symptom management beyond the specific pathophysiology of their disease, like cancer, immune disorders, and cardiovascular diseases among many others. Notably, the graded classification of severity resonates with other illnesses that are characterized by a staged progression of disease, like cancer (70), heart (71) and kidney diseases (72), diabetes (73), and more. Here, however, we did not use objective and etiological based metrics, and future integration of genetic, metabolic, inflammatory, and/or anatomical and functional neuroimaging metrics can substantially improve our understanding of the biological mechanisms underlying the identified symptom-based clusters and potentially lead to improved (bio)marker properties (20, 21).
The US National Pain Strategy has drawn attention to a sub-group of chronic pain patients – those with persistent high-impact chronic pain. These patients suffer from the most severe and debilitating illness, substantially restricting and interfering with daily life activities, and requiring increased healthcare expenditure (3, 74, 75). The prevalence of high-impact chronic pain is estimated to be between 5-15% of the adult US population (10-30 million people) (3, 5, 75). Compared to lower but still clinically significant chronic pain, high-impact chronic pain was associated with unfavorable health outcomes, limitations in daily activity, negative coping strategies, elevated distress, increased health care costs, and higher usage and dosage of opioid medication (62, 76). With the potential collateral personal, societal, and financial impact of long-term opioid medication (77), it is particularly crucial to better identify and understand people suffering from and at increased risk of high-impact chronic pain. Within our proposed classification system, Cluster3 may be reflective of such a group of patients: an overall most severe condition characterizes it, manifested at the group level by highest levels of pain interference, widespread and/or overlapping chronic pain conditions, low levels of physical function, fatigue, depression, and basically in every measure that we assessed. Early identification of these patients is essential for the provision of more comprehensive and costly pain assessments (e.g., psychological, medical, etcetera) that better inform treatment approaches (e.g., physical or psychosocial therapy, medical interventions, etcetera).
There are notable limitations to our study. While our cohort is uniquely large, it is restricted to the San Francisco Bay Area and the outlining Northern California region, and potentially also to patients who can afford specialized medical treatment in a tertiary academic clinic. Future efforts will need to generalize our findings to other locations with different demographic, socio-cultural, and economic characteristics. In this regard, there are known demographic disparities related to pain healthcare (78, 79) that were not captured by the identified clusters. This may be attributed to the particular characteristics of the cohort (Table S3), for example being primarily White (53.87%) and with above college level of education (61.92%). However, there are some descriptive trends worth noting. Across datasets (Table S3), Cluster3 was generally characterized by younger age, lower education level, more females, more patients identifying as Hispanic or Latino, less of them identifying as White, and less reporting being married. It is essential to highlight that we can only address most of these factors through a systematic change in healthcare. Additionally, in terms of cohort characteristics, we had no data on formal diagnoses within our cohort. Although findings indicate no association between the identified clusters and anatomical pain location, which is commonly used to support formal definitions of chronic pain conditions, this should still be tested and confirmed. Notably, previous CHOIR studies were able to indicate a multitude of formal diagnoses (25, 26), including neuropathic, thoracolumbar, orofacial, visceral, and various musculoskeletal pain conditions, as well as fibromyalgia and complex regional pain syndrome, amongst many others, and these can be assumed to be part of the current cohort. Thus, unlike previous clustering efforts that were restricted to specific chronic pain conditions (12, 13, 15, 17, 19), our findings seem to be generalizable at least to varying types of chronic pain conditions.
That the assessments used for clustering are based on NIH’s PROMIS system has its limitations since although they were validated for their psychometric properties (27, 31–36), they are still based on self-reported assessments and thus prone to potential biases and demand characteristics. Other studies using various clustering approaches have incorporated more objective measurements, with a better characterization of their underlying mechanistic substrates. Most have used various multimodal pain sensory testing that map on to various nociceptive pathways (12, 13, 15, 16, 18, 19). While incorporating objective measures with better understanding of their underlying pathophysiology is a clear next step for this research, using the PROMIS system offers substantial advantages. PROMIS-based T scores are normed to the general US population and thus easily comparative across cohorts. PROMIS is also an inexpensive and easily administered system, using short forms or computerized adaptive testing to reduce time and patient burdens, and is already in wide usage in many settings, even beyond chronic pain, thus allowing others to take a similar approach as ours, or to engage with our freely available cluster-classifier (TBA) for additional utilization in clinical and research settings. Moreover, previous findings show associations between various PROMIS measures and potential biomarkers in both pain and non-pain clinical contexts (80–82). Finally, that the clusters differ in pain-specific measure that are non-PROMIS based, such as pain intensity and pain catastrophizing, solidifies the validity and generalizability beyond PROMIS-based measures.
In conclusion, our symptom-based approach and findings offer significant diagnostic- and prognostic-like utility for a cost-effective, graded severity classification system of patients with chronic pain, potentially generalizable to other chronic illnesses. Our study’s exploratory nature requires further research to reconfirm and generalize the identified clusters in different chronic pain cohorts, as well as experimental and mechanistic studies to uncover their etiological basis. Nevertheless, this system promises to support clinical decision-making, impacting the day-to-day functioning of patients with chronic pain, and encourages investigations into new treatment opportunities oriented towards a precision- and evidence-based approach to relieve the burdens of people suffering from chronic illness, and improve their quality of life. It thus reflects a synergy between theory-driven scientific research, clinical care, and technological advancement that aims to facilitate personalized healthcare by closing on the bedside-to-bench-to-bedside loop.
Funding
This study was supported by The Redlich Pain Research Endowment (SCM) and The Feldman Family Foundation Pain Research Fund (SCM and GG). The following National Institutes of Health grants provided further support: R01 NS11865 (SCM), R01 NS109450 (SCM), K24 DA029262 (SCM), P01 AT006651 (SCM), K23 NS104211 (KAW), L30 NS108301 (KAW), and K23 DA047473 (MSZ). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author contributions
Original conceptualization: GG; Contribution to conceptualizations: GG, EMC, KAW, MZ, MCK, SCM; Methodology: GG, EMC, MCK; Software and visualization: EMC, GG; Analyses: GG, EMC; Resources and Supervision: SCM; Writing—original draft: GG; Writing—review & editing: GG, EMC, KAW, MZ, MCK, SCM; All authors discussed the analyses, the results, and their interpretation, and approved the final manuscript.
Competing interests
Authors declare that they have no competing interests.
Data and materials availability
All data needed to evaluate the conclusions in the paper are present in the paper. Additional data related to this paper may be requested from the authors.
Supplementary Materials
Acknowledgements
References
- 1.↵
- 2.
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- 7.↵
- 8.
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.
- 33.↵
- 34.
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.
- 43.↵
- 44.↵
- 45.
- 46.↵
- 47.↵
- 48.↵
- 49.
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.
- 65.
- 66.
- 67.
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.↵
- 80.↵
- 81.
- 82.↵
- 83.↵