ABSTRACT
Background A primary goal of precision medicine is to identify patient subgroups and infer their underlying disease processes, with the aim of designing targeted interventions. However, few methods automatically identify both patient subgroups and their co-occurring characteristics simultaneously, measure their significance, and visualize the results. Such methods could enhance the interpretability of patient subgroups, and inform the design of classification and predictive models.
Objectives To analyze patient subgroups in hospital readmitted patients using a three-step modeling approach. (1) Visual analytical modeling to automatically identify patient subgroups and their co-occurring comorbidities, and determine their statistical significance and clinical interpretability. (2) Classification modeling to classify patients into subgroups and measure its accuracy. (3) Prediction modeling to predict a patient’s risk of readmission and compare its accuracy with and without patient subgroup information.
Methods We extracted 2013-2014 Medicare data related to hospital readmission in three conditions: chronic obstructive pulmonary disease (COPD), congestive heart failure (CHF), and total hip/knee arthroplasty (THA/TKA). For each condition, we extracted cases defined as patients readmitted within 30 days of hospital discharge, and controls defined as patients not readmitted within 90 days of discharge, matched by age, gender, race, and Medicaid eligibility (n[COPD]=29,016, n[CHF]=51,550, n[THA/TKA]=16,498). These data were analyzed using: (1) bipartite networks to identify patient subgroups based on frequently co-occurring high-risk comorbidities; (2) multinomial logistic regression to classify patients into subgroups; and (3) hierarchical logistic regression to predict the risk of hospital readmission using subgroup membership, compared to standard logistic regression without subgroup membership.
Results In each condition, the visual analytical model identified patient subgroups that were statistically significant (Q=0.17, 0.17, 0.31; P<.001, <.001, <.05), were significantly replicated (RI=0.92, 0.94, 0.89; P<.001, <.001, <.01), and were clinically meaningful to clinicians. (2) In each condition, the classification model had high accuracy in classifying patients into subgroups (mean accuracy=99.60%, 99.34%, 99.86%). (3) In two conditions (COPD, THA/TKA), the hierarchical prediction model had a small but statistically significant improvement in discriminating between the readmitted and not readmitted patients as measured by net reclassification improvement (NRI=.059, .11), but not as measured by the C-statistic or integrated discrimination improvement (IDI).
Conclusions While the visual analytical models identified statistically and clinically significant patient subgroups, the results pinpoint the need to analyze subgroups at different levels of granularity for improving the interpretability of intra- and inter-cluster associations. The high accuracy of the classification models reflects the strong separation of the patient subgroups despite the size and density of the datasets. Finally, the small improvement in predictive accuracy suggests that comorbidities alone were not strong predictors for hospital readmission, and the need for more sophisticated subgroup modeling methods. Such advances could improve the interpretability and predictive accuracy of patient subgroup models for reducing the risk of hospital readmission and beyond.
INTRODUCTION
Background
A wide range of studies [1-9] on topics ranging from molecular to environmental determinants of health have shown that most humans tend to share a subset of characteristics (e.g., comorbidities, symptoms, genetic variants), forming distinct patient subgroups. A primary goal of precision medicine is to identify such patient subgroups and infer their underlying disease processes to design interventions targeted to those processes [2, 10]. For example, recent studies in complex diseases such as breast cancer [3, 4], asthma [5-7] and COVID-19 [11] have revealed patient subgroups, each with different underlying mechanisms precipitating the disease, and therefore each requiring different interventions.
A critical requirement for designing such interventions is the clinical interpretability of patient subgroups. Such interpretability requires clinicians to understand (a) how characteristics (e.g., comorbidities, symptoms, genetic variants) frequently and significantly co-occur across patients, and (b) the risk for adverse outcomes (e.g., mortality, hospital readmission) of patient subgroups that have those co-occurrences. An integration of the co-occurrence of characteristics, with the risk of outcomes in patient subgroups, is critical to infer the disease processes underlying each patient subgroup, and to design precision interventions targeted to those patient subgroups. However, few methods automatically identify both patient subgroups and their co-occurring characteristics simultaneously, which is important for measuring the risk for adverse outcomes and inferring their mechanisms. Such integrated methods could enhance the interpretability of patient subgroups by clinicians for designing interventions, and for informing the design of classification and predictive models that provide clinical decision support.
To address this need, we used a visual analytical method to identify and analyze patient subgroups in hospital readmitted patients. While we have previously demonstrated [12] the use of visual analytics to identify patient subgroups and their characteristics in hospital readmission, here we explore how the approach generalizes across three hospital readmission conditions and its use in classification and predictive modeling. This was done through an analytical framework for Modeling and Interpreting Patient Subgroups (MIPS) which used a three-step modeling approach: (1) Visual analytical modeling through bipartite networks to automatically identify patient subgroups and their co-occurring characteristics, and determine their statistical significance and clinical interpretability. (2) Classification modeling through multinomial logistic regression to classify patients into subgroups. (3) Prediction modeling through logistic regression with and without subgroup information to predict the risk of hospital readmission. Application of the MIPS analytical framework to three datasets helped pinpoint methodological and data limitations in our approach, which provided implications for improving the interpretability of patient subgroups in large and dense datasets, and for the design of clinical decision support systems to prevent adverse outcomes such as hospital readmissions.
Current Approaches for Identifying Patient Subgroups
A patient subgroup is a subset of patients drawn from a population (e.g., older adults) that share one or more characteristics (e.g., renal failure and diabetes). Patients have been divided into subgroups by using (a) investigator-selected variables such as race for developing hierarchical regression models [13], or assigning patients to different arms of a clinical trial, (b) existing classification systems such as the Medicare Severity-Diagnosis Related Group (MS-DRG) [14] to assign patients into a disease category for purposes of billing, and (c) computational methods such as classification [15-17] and clustering [5, 18] to discover patient subgroups from data (also referred to as subtypes or phenotypes depending on the condition and variables analyzed).
One of the simplest computational methods to identify patient subgroups is by enumerating conjunctions (identify all pairs, all triples, etc.) of variables such as comorbidities [19] that co-occur across patients and then examining the most prevalent subgroups. While such approaches are intuitive, they can lead to a combinatorial explosion (e.g., enumerating combinations of the 31 Elixhauser comorbidities would lead to 231 or 2147483648 combinations), and most combinations do not incorporate the full range of comorbidities (e.g., the most frequent pair of comorbidities ignores other comorbidities that might exist in the profile of patients with that pair). Other approaches use unipartite clustering methods [17, 18] (e.g., clustering patients or comorbidities, but not both together) such as k-means and hierarchical clustering; and dimensionality-reduction methods such as principal component analysis (PCA) [17] to identify principal components to define a reduced dimensionality plane on which patients or comorbidities are projected, which are then clustered using unipartite methods such as k-means (together referred to as spectral clustering).
However, because these methods are unipartite, there is no agreed-upon method to identify the patient subgroup defined by a cluster of characteristics, or vice-versa, which substantially reduces the interpretability of the results. Furthermore, such methods have well-known limitations, including (a) requiring a user-defined input for a similarity measure (e.g., Jaccard distance) to calculate the similarity between pairs of patients based on their profiles, or pairs of characteristics based on how their co-occurrence across patients, (b) requiring a user-defined input for the expected number of clusters, and (c) the absence of a quantitative measure to measure the quality of the clustering, critical for measuring the statistical significance of the clustering.
More recent bipartite network analytical methods [20] have attempted to address these limitations by automatically identifying biclusters [18, 21, 22] (e.g., clustering of patients and comorbidities simultaneously). A network consists of nodes and edges; nodes represent one or more types of entities (e.g., patients or comorbidities), and edges between the nodes represent a specific relationship between the entities. Figure 1A shows a unipartite network, where nodes are of the same type (often used to analyze co-occurrence of comorbidities [23]). In contrast, Figure 1B shows a bipartite network where nodes are of two types, and edges exist only between different types such as between patients (circles) and comorbidities (triangles). This approach uses bipartite modularity maximization [20, 24-26], a graph-theoretic approach to (a) quantitatively output the number, size, and statistical significance [18, 27] of biclusters (consisting of a patient subgroup and its most frequently co-occurring comorbidities), and (b) visualize those biclusters using layout algorithms [28, 29] to enable their clinical interpretation [11, 12, 30-36]. As shown in Fig. 1C, a bipartite visualization could enable clinicians to inspect the bicluster associations, infer potential mechanisms in each patient subgroup, and design targeted interventions. Our prior use of bipartite networks have enabled three types of discoveries related to subgroups: (1) novel subtypes (e.g., in asthma [33]); (2) frequency of known subtypes in a new condition (e.g., in COVID-19 [11]), and (3) risk of subtypes for adverse outcomes (e.g., in hip fracture hospital readmission [12]). Furthermore, the above subgroups could be used to train classifiers for classifying a new patient into a subgroup, and to build predictive models that leverage such patient subgroups to predict an outcome in a new patient.
Leveraging Patient Subgroups in Predictive Modeling
Patient subgroups are leveraged in predictive modeling using two common approaches [37] that trade-off simplicity with accuracy: (1) Hierarchical Modeling adds subgroup information (e.g., a subgroup membership variable specifying to which subgroup a patient belongs, predicted by a classifier) to a Standard Model without subgroup membership information to improve accuracy. However, while this approach is simple, it potentially trade-offs accuracy as the model’s parameters (e.g., slope and intercept of a regression model) are fixed for all patients, regardless of subgroup membership. (2) Subgroup-Specific Modeling develops multiple models, one for each subgroup, allowing each model to have different model parameters, potentially improving accuracy. However, this improved accuracy trade-offs simplicity as the evaluation requires several additional steps: build multiple predictive models, predict the outcomes for each patient using the appropriate model (predicted by a classifier), aggregate the accuracy of predictions across all patients, and compare it to the predictive accuracy of all patients generated from the Standard Model. Given this complexity, we used the simpler Hierarchical Modeling approach as a preliminary step for leveraging patient subgroups.
The Need for Automatic Identification of Patient Subgroups in Hospital Readmission
An estimated one in five elderly patients (over 2.3 million Americans) is readmitted to a hospital within 30-days after being discharged [38]. While many readmissions are unavoidable, an estimated 75% of readmissions are unplanned and mostly preventable [39], imposing a significant burden in terms of mortality, morbidity, and resource consumption. Across all conditions, unplanned readmissions cost almost $17 billion annually in the US [39], making them an ineffective use of costly resources, and therefore closely scrutinized as a marker for the poor quality of care by organizations such as the Centers for Medicare & Medicaid Services (CMS) [40].
To address this epidemic of hospital readmission, CMS sponsored the development of models to predict the patient-specific risk of readmission in specific index conditions such as chronic obstructive pulmonary disease (COPD) [41], congestive heart failure (CHF) [42], and hip/knee arthroplasty (THA/TKA) [43]. These models have two characteristics that are pertinent to the current study:
1. Inclusion of Comorbidities as Independent Variables
The independent variables (predictors) in the CMS models were prior comorbidities (as recorded in Medicare claims data), and demographics (age, gender, and race). The use of comorbidities was based on extensive literature showing the critical role comorbidities play in increasing the risk for adverse outcomes in older adults [38]. For example, almost two-thirds of older adults have two or more comorbid conditions, resulting in a heightened risk for adverse health outcomes such as hospital readmission and mortality [44]. Furthermore, multiple comorbidities often do not act independently, but rather interact with each other, resulting in processes that can precipitate readmission [45]. For example, due to the systemic nature of renal disease, a hip fracture patient with congestive heart failure and renal failure is at a higher risk of renal failure exacerbation, precipitating a hospital readmission, compared to one who only had renal failure [12]. To enable a head-to-head comparison with the CMS predictive models, we used the same independent variables for our predictive models.
2. Exclusion of Patient Subgroups
None of the CMS models used information related to patient subgroups. Therefore, while such models provide the risk of readmission for an individual patient, they do not leverage the existence of patient subgroups known to be present among patients with hospital readmission [12]. Such patient subgroups could be used in hierarchical regression models to potentially achieve higher predictive accuracy. Furthermore, while the primary focus of the CMS models was on predicting the risk of readmission of a patient, they provide little clinical guidance for the design of clinical interventions to address that risk. In contrast, if a patient belongs to a previously-identified patient subgroup with a comorbidity profile (often referred to as a phenotype), such information could be leveraged to classify patients into the best-fitting phenotype, and then to use that classification as a starting point to design clinical interventions targeted to the patient.
Here we demonstrate the development and use of an analytical framework for Modeling and Interpreting Patient Subgroups (MIPS) by using a three-step modeling approach: (1) bipartite networks to automatically identify subgroups of readmitted patients and their frequently co-occurring comorbidities, (2) classifiers to classify patients into a best-fitting subgroup, and (3) hierarchical predictive models which leverage the subgroup information to predict each patient’s risk of readmission. This analytical framework was tested across three index conditions where readmission frequently occurs.
METHOD
Overview of Method
Fig. 2 provides a conceptual description of the data inputs and outputs from the three-step modeling in MIPS. As shown, the visual analytical model identifies the patient subgroups, and visualizes them through a network. The classification model predicts subgroup membership for cases and controls, and uses it to measure the risk of readmission within each subgroup based on its proportion of cases. This risk information is juxtaposed with the visualization to enable clinicians interpret the readmitted patient subgroups. Finally, the predictive model uses the subgroup membership assignment of cases and controls to predict the readmission risk of a patient.
Data Description
Study population
We analyzed patients hospitalized for chronic obstructive pulmonary disease (COPD), congestive heart failure (CHF), and total hip/knee arthroplasty (THA/TKA). We selected these three index conditions because: (a) hospitalizations for each of these conditions are highly prevalent in older adults [38]; (b) hospitals report very high variations in their readmission rates [38]; and (c) there exist well-tested readmission prediction models for each of these conditions that did not consider patient subgroups [41-43, 46, 47].
For each index condition, we used the same inclusion and exclusion criteria used to develop the CMS models, but with the most recent years (2013-2014) provided by Medicare when we started the project. We used 100% of the 30-day readmitted patients in 2013 and 2014 Medicare claims data, from which we extracted all patients that were admitted to an acute care hospital on or after July 2013-August 2014 with a principal diagnosis of the index condition, were 66 years of age or older, and were enrolled in both Medicare parts A and B fee-for-service plans in the 6 months before admission. Furthermore, we excluded patients who were transferred from other facilities, died during the hospitalization, or transferred to another acute care hospital. Similar to the CMS models, we selected the first admission for patients with multiple admissions during the study period, and did not use Medicare Part D (related to prescription medications).
Next, we extracted 100% controls who were not readmitted for at least 90 days since discharge. CMS uses this 90-day window of no re-admittance to ensure that the controls are substantially free of complications that result in readmission during this period [48, 49]. A small percentage (0.8%) of Medicare patients had “unknown race” for the Race attribute, so we grouped “unknown race” and “other race” and ensured that there was an equal number of them in the cases and control datasets. The low rate of missing data on race had too low a risk for bias to warrant a sensitivity analysis. Appendix-1 shows the detailed inclusion and exclusion criteria used to extract cases and controls for COPD, CHF, and THA/TKA, and the respective numbers of patients extracted at each step, in addition to the International Classification of Diseases, Ninth Version codes (ICD-9) codes for each of the three index conditions selected for analysis. Each modeling method used appropriate subsets of the above data described in the sections below.
Variables
The dependent variable (outcome) was whether a patient with an index admission (COPD, CHF, THA/TKA) had an unplanned readmission to an acute-care hospital within 30 days of discharge, as was recorded in the MEDPAR file (inpatient claims) in the Medicare database.
The independent variables included comorbidities, and patient demographics (age, gender, race). Comorbidities common in older adults were derived from three established comorbidity indices: Charlson Comorbidity Index (CCI) [50], Elixhauser Comorbidity Index (ECI) [51], and the Center for Medicare and Medicare Services Condition Categories (CMS-CC) used in the CMS readmission models [52] (the variables in the CMS models varied across the index conditions). As these indices had overlapping comorbidities, we derived a union of them, which was verified by the clinician stakeholders. They recommended that we also include the following additional variables as they were pertinent to each index condition: COPD (history of sleep apnea, mechanical ventilation); CHF (history of coronary artery bypass graft surgery); THA/TKA (congenital deformity of the hip joint, post-traumatic osteoarthritis). For each patient in our cohort, we extracted the above comorbidities and variables from the physicians, outpatient, and inpatient Medicare claims data in the 6 months before (to guard against miscoding), and on the day of the index admission.
Analytical and Evaluation Approach
Overview of the MIPS Framework
Table 1 provides a summary of the inputs and outputs of the three-step modeling approach in the MIPS framework, which was applied across the three index conditions.
Visual Analytical Modeling
The data used to build the visual analytical model consisted of 100% cases, and an equal number of 1:1 matched controls extracted by randomly selecting a control without replacement to match each case based on age, gender, race/ethnicity, and Medicaid eligibility [53]. The resulting dataset was divided randomly into a training (50%) and replication (50%) dataset (we use the term replication to avoid confusion with the term validation typically used in classification and prediction models). We used a bipartite network to model the cases (30-day readmitted patients) and significant comorbidities in each index condition using the following steps:
A. Model Training
The training of the bicluster network model consisted of the following two steps:
I. Feature Selection
Given the large number of patients and comorbidities in the dataset, we used feature selection to identify comorbidities with the strongest signal and therefore interpretability for readmission using the following steps: (1) excluded comorbidities with prevalence less than 1% (as is commonly done in studies to reduce noise [54]); (2) selected significant comorbidities in the training dataset based on a 2-way interaction test using odds ratio (OR) with directionality, and correcting for multiple testing using Bonferroni, and (3) tested the surviving comorbidities for replication in the replication dataset, and selected those that were significant in both datasets. Appendix-2 shows the number of comorbidities, and variables that were included in the analysis for each of the three index conditions. The above feature selection generated a single set of significant and replicated comorbidities used for the following bipartite network analysis.
II. Biclustering
We used bipartite networks on the training dataset to analyze heterogeneity in readmission using the following steps. (1) Removed all cases that did not have any comorbidities (as the modularity maximization algorithm will trivially put disconnected nodes into a separate cluster). (2) Represented the cases (30-day readmitted patients in the training dataset) and their significant and replicated comorbidities (selected in Step A) as a bipartite network. As shown in Fig. 1, the nodes represented cases or comorbidities, and edges represented which case had which comorbidity. (3) Used a bipartite modularity maximization algorithm to identify the number of biclusters, their boundaries, and degree of biclusteredness using modularity. Modularity is defined as the fraction of edges falling within a cluster, minus the expected fraction of such edges in a network of the same size with randomly assigned edges [20]. Modularity ranges from -0.5 to +1, with values >0 indicating biclustering that is higher than can be expected by chance. We used the bipartite version of modularity [55, 56] to find biclusters in the network. (4) Measured the significance of the bicluster modularity by comparing it to a distribution of the same quantity generated from 1000 random permutations of the network, by preserving the network size (number of nodes) and the network density (number of edges).
B. Model Replication
Repeated the above biclustering steps 1-4 to identify subgroups in the replication dataset, and compared the comorbidity co-occurrence in the training dataset, to that in the replication dataset using the Rand index (RI) [57]. RI measures the proportion of comorbidity pairs that co-occurred and did not co-occur in a cluster in the training and replication datasets (where 0=no inter-network cluster similarity, and 1=total inter-network cluster similarity). The significance of RI was measured by comparing it to a distribution of the same quantity generated from 1000 random permutations of the training and replication networks. All tests of statistical significance in Steps A and B were 2-sided.
C. Model Interpretation
The model interpretation consisted of the following steps:
I. Visualization
We used the following steps to visualize the network generated from the training dataset. (1) Used Fruchterman-Reingold (FR) [58], a force-directed algorithm to lay out the bipartite network. This layout algorithm pulls together nodes that are strongly connected, and pushes apart nodes that are not. This results in nodes with a similar pattern of connections to be placed close to each other in Euclidean space, and those that are dissimilar are pushed apart. (2) As the FR algorithm often cannot entirely separate clusters in large and dense networks, the network layout needs to be visually enhanced before it is interpretable by clinician stakeholders. Therefore, we used the ExplodeLayout algorithm [28, 29] to separate the biclusters to reduce their visual overlap. This algorithm preserves the distances of nodes within a bicluster, but increases the distance of nodes between clusters to improve interpretability. (3) Juxtaposed the risk of readmission with the network visualization (in response to a request from the clinical stakeholders). This was done by (a) displaying comorbidity labels with their univariable ORs for readmission (measured in Step A) ranked by their odds ratios (ORs) for each subgroup, and (b) measuring the readmission risk for each patient subgroup based on the full case-control population (explained in more detail in the section on predictive modeling), and juxtaposing it with the respective subgroup.
II. Clinical Interpretation
We used the following steps to solicit clinical interpretations of the above bipartite network. (1) Recruited a pulmonologist specializing in COPD and hospital readmission to interpret the COPD results, and a geriatrician with expertise in treating older adults in CHF and THA/TKA to interpret the respective results. (2) Requested each clinician stakeholder to interpret the patient subgroups, their mechanisms, and potential interventions to reduce the risk for readmission.
Classification Modeling
As shown in the bipartite network example in Fig. 1, the biclusters identified through the modularity maximization algorithm contain patient subgroups and their most frequently co-occurring comorbidities with respect to other patients in the network. However, there are often many edges between biclusters, revealing that many patients within a bicluster have comorbidities that exist in other biclusters. As is true for most partitioning cluster methods, including modularity, membership of a new patient to each bicluster is therefore probabilistic. The classification of a patient into a cluster is therefore not defined by the inclusion or exclusion of comorbidities (e.g., hypertension and diabetes), but rather by the probability of being in a patient subgroup. Patients are therefore similar or different, not just in a handful of carefully-selected comorbidities while ignoring others, but based on all of their recorded comorbidities. This overall profile of patients reflects the reality of comorbid conditions.
To model the above complexity, we used multinomial logistic regression [17] to develop classification models in each index condition. This approach has the advantage of generating probabilities (“soft labels”) for a patient to belong to each patient subgroup. The models were trained, internally validated, and then applied to generate information for the other two modeling methods, as described below:
A. Model Training
The data used to build the classification model consisted of the training dataset and subgroup membership from the visual analytical model. We trained a multinomial logistic regression model using the above data, with independent variables that included comorbidities identified through feature selection done for the visual analytical modeling. Accuracy of the trained model was measured by calculating the percentage of times the model correctly classifed the cases into the subgroups, using the highest predicted probability across the subgroups (“hard labels”).
B. Model Internal Validation
To internally validate the classifier, we randomly split the above data into training (75%) and testing (25%) datasets, 1000 times. For each iteration, we trained a model using the training dataset, and measured its accuracy on the testing dataset. This was done by predicting the subgroup membership using the highest predicted probability among all the subgroups. The overall predicted accuracy was then estimated by calculating the mean accuracy across the 1000 models.
C. Model Application
Using the 100% cases, in addition to the 100% controls from July 2013-August 2014 (representing the entire Medicare population of each index condtion from those years), we generated the following two types of information for use in the other models. (1) Used the classifier trained in Step A above, to classify 100% cases and 100% controls into a subgroup. This information was used by the subsequent predictive modeling. (2) While the visual analytical model used the 1:1 matched controls for feature selection, this cohort did not represent the entire population. Therefore, to accurately measure the subgroup risk, we used the entire case-control population classified into the subgroups (as described in the above step), and measured the proportion of cases in each subgroup. Furthermore, as requested by the clinicians, we juxtaposed these subgroup risks next to the respective subgroups in the bipartite network visualization, to improve their interpretability.
Predictive Modeling
The data used to build the predictive models consisted of 100% cases and 100% controls, in addition to their subgroup membership generated from the above classification models. These data were randomly spilt into a training (75%) and validation (25%) dataset. The predictive models were trained, internally validated, and compared for predictive accuracy, as described below:
A. Model Training
We used the training dataset to train a Standard Model (binary logistic regression without subgroup membership similar to the CMS models), and a Hierarchical Model (binary logistic regression with subgroup membership), with 30-day unplanned readmission (yes vs. no) as the outcome. Independent variables for both models included comorbidities identified through the feature selection in each index condition (see Appendix-2), and demographics. The Hierarchical Model additionally included subgroup membership.
B. Model Internal Validation
We used the validation dataset to internally validate the models through the following two measures:
I. Discrimination (model’s ability to distinguish readmitted patients from those not readmitted) was measured using the C-statistic, which is identical to the area under the receiver operating characteristic (ROC) curve. Model discrimination was examined using box plots to show the average risk prediction for patients with and without readmission.
II. Calibration (model’s agreement of the predicted probabilities with the observed risk) was measured using calibration-in-the-large, and calibration slope, which was examined through a calibration plot showing the proportion of patients actually admitted, versus deciles of predicted probability of having readmission. Good calibration is when calibration-in-the-large is close to zero, and the calibration slope is close to one. Since the large sample size overpowered the study, we did not measure the calibration based on statistical significance (e.g., P values of the Hosmer-Lemeshow and calibration indices).
C. Model Comparisons
We used the chi-squared test to compare the C-statistic of the Standard Model to that of the Hierarchical Model. We also measured the C-statistic of the Standard Model applied to each subgroup separately. This enabled examination of how the Standard Model performed on patient subgroups to identify, for example, which subgroups underperformed when using the current Standard Model.
Because the above models used the feature selection step to select comorbidities for use as independent variables, they differed from those used in the published CMS models. Therefore, to perform a head-to-head comparison with the published CMS models, we additionally developed a logistic regression model using independent variables that were identical to the published CMS model (CMS Standard Model), which was compared to the same model that included subgroup membership (CMS Hierarchical Model). We used the chi-squared test to compare the C-statistic of the CMS Standard Model to that from the CMS Hierarchical Model, in addition to the following measures of model accuracy:
I. Net Reclassification Improvement (NRI) measured the proportion of patients whose predicted probability of readmission improved with reference to actual readmission status. We used two NRI statistics: (a) categorical NRI, which predicted readmission probabilities divided into 10 sequential categories ranging from 0-1, with improvement requiring a shift between categories; and (b) continuous NRI which is based on the proportions of patients with any improved predicted probability of readmission, regardless of the size of that improvement.
II. Integrated Discrimination Improvement (IDI) measured the difference in the average improvement in predicted risks between the CMS Standard Model and the CMS Hierarchical Model.
RESULTS
Data
Table 2 provides a summary of the number of cases and/or controls used to develop the three models in each condition.
Visual Analytical Modeling
The visual analytical modeling of readmitted patients in all three index conditions produced statistically and clinically significant patient subgroups and their most frequently co-occurring comorbidities, which were significantly replicated. Results from each condition are described below:
COPD
The inclusion and exclusion selection criteria (see Appendix-1) resulted in a training dataset (n=14,508 matched case/control pairs, of which 51 patient pairs with no dropped comorbidities), and a replication dataset (n=14,508 matched case/control pairs, of which 51 patient pairs with no dropped comorbidities), matched by age, sex, race, and Medicaid eligibility (a proxy for economic status). The feature selection method (see Appendix-2) used 45 unique comorbidities identified from a union of the three comorbidity indices, plus 2 condition-specific comorbidities. Of these, 3 were removed because of <1% prevalence. Of the remaining, 30 survived the significance and replication testing with Bonferroni correction. The visual analytical model used these surviving comorbidities (d=30), and cases consisting of CHF readmitted patients with at least one of those comorbidities (n=14,457). As shown in Fig. 3, the bipartite network analysis identified 4 biclusters, each representing a subgroup of readmitted COPD patients and their most frequently co-occurring comorbidities. The biclustering had significant modularity (Q=0.17, z=7.3, P<.001), and significant replication (RI=0.92, z=11.62, P=<.001) of comorbidity co-occurrence. Furthermore, as requested by the clinician stakeholders, we juxtaposed a ranked list of comorbidities based on their ORs for readmission in each bicluster, in addition to the risk for each of the patient subgroups.
The pulmonologist inspected the visualization and noted that the readmission risk of the patient subgroups had a wide range (12.7% to 19.6%) with clinical (face) validity. Furthermore, the co-occurrence of comorbidities in each patient subgroup was clinically meaningful with interpretations for each subgroup. Subgroup-1 had a low disease burden with uncomplicated hypertension leading to the lowest risk (12.7%). This subgroup represented patients with early organ dysfunction and would benefit from using checklists such as regular monitoring of blood pressure in pre-discharge protocols to reduce the risk of readmission. Subgroup-3 had mainly psychosocial comorbidities, which could lead to aspiration precipitating pneumonia leading to an increased risk for readmission (15.9%). This subgroup would benefit from early consultation with specialists (e.g., psychiatrists, therapists, neurologists, and geriatricians) that had expertise in psycho-social comorbidities, with a focus on the early identification of aspiration risks and precautions. Subgroup-2 had diabetes with complications, renal failure and heart failure and therefore had higher disease burden leading to an increased risk for readmission (17.8%) compared to Subgroup-1. This subgroup had metabolic abnormalities with greater end-organ dysfunction and would therefore benefit from case management from advanced practice providers (e.g., nurse practitioners) with rigorous adherence to established guidelines to reduce the risk of readmission. Subgroup-4 had diseases with end-organ damage including gastro-intestinal disorders, and therefore had the highest disease burden and risk for readmission (19.6%). This subgroup would also benefit from case management with rigorous adherence to established guidelines to reduce the risk of readmission. Furthermore, as patients in this subgroup typically experience complications that could impair their ability to make medical decisions, they should be provided with early consultation with a palliative care team to ensure that care interventions align with patients’ preferences and values.
CHF
The inclusion and exclusion selection criteria (see Appendix-1) resulted in a training dataset (n=25,775 matched case/control pairs, of which 103 patient pairs with no dropped comorbidities) and a replication dataset (n=25,775 matched case/control pairs, of which 104 patient pairs with no dropped comorbidities), matched by age, sex, race, and Medicaid eligibility (a proxy for economic status). The feature selection method (see Appendix-2) used 42 unique comorbidities identified from a union of the three comorbidity indices, plus 1 condition-specific comorbidity. Of these, 1 comorbidity was removed because of <1% prevalence. Of those remaining, 37 survived the significance and replication testing with Bonferroni correction. The visual analytical model (Fig. 4) used these surviving comorbidities (d=37), and cases consisting of CHF readmitted patients with at least one of those comorbidities (n=25,672). As shown in Fig. 4, the bipartite network analysis of the CHF cases identified 4 biclusters, each representing a subgroup of readmitted CHF patients and their most frequently co-occurring comorbidities. The analysis revealed that the biclustering had significant modularity (Q=0.17, z=8.69, P<.001), and significant replication (RI=0.94, z=17.66, P<.001) of comorbidity co-occurrence. Furthermore, as requested by the clinicians, we juxtaposed a ranked list of comorbidities based on their ORs for readmission in each bicluster, in addition to the risk for each of the patient subgroups.
The geriatrician inspected the visualization and noted that the readmission risk of the patient subgroups, ranging from 15.1% to 19.9%, was wide with clinical (face) validity. Furthermore, the co-occurrence of comorbidities in each patient subgroup was clinically meaningful. Subgroup-1 had chronic but stable conditions, and therefore had the lowest risk for readmission (15.1%). Subgroup-3 had mainly psychosocial comorbidities, but were not as clinically unstable or fragile compared to subgroups 2 and 4, and therefore had medium risk (16.6%). Subgroup-2 had severe chronic conditions, making them clinically fragile (with potential benefits from early palliative and hospice care referrals), and were therefore at high risk for readmission if non-palliative approaches were used (19.9%). Subgroup-4 had severe acute conditions which were also clinically unstable, associated with substantial disability and care debility, and therefore at high risk for readmission and recurrent intensive care unit (ICU) use (19.9%).
THA/TKA
The inclusion and exclusion selection criteria (see Appendix-1) resulted in a training dataset (n=8,249 matched case/control pairs, of which 1239 patient pairs with no dropped comorbidities) and a replication dataset (n=8,249 matched case/control pairs, of which 1264 patient pairs with no dropped comorbidities), matched by age, sex, race, and Medicaid eligibility (a proxy for economic status). The feature selection (see Appendix-2) used 39 unique comorbidities identified from the three comorbidity indices plus 2 condition-specific comorbidities. Of these, 11 comorbidities were removed because of <1% prevalence. Of the remaining, 11 survived the significance and replication testing with Bonferroni correction. The visual analytical model (Fig. 5) used these surviving comorbidities (d=11), and cases consisting of readmitted patients with at least one of those comorbidities (n=7,010).
As shown in Fig. 5, the bipartite network analysis of the THA/TKA cases identified 7 biclusters, each representing a subgroup of readmitted THA/TKA patients and their most frequently co-occurring comorbidities. The analysis revealed that the biclustering had significant modularity (Q=0.31, z=2.52, P=.011), and significant replication (RI=0.89, z=3.15, P=.002) of comorbidity co-occurrence. Furthermore, as requested by the clinician stakeholders, we juxtaposed a ranked list of comorbidities based on their ORs for readmission in each bicluster, in addition to the risk for each of the patient subgroups.
The geriatrician inspected the network and noted that TKA patients, in general, were healthier compared to THA patients, and therefore the network was difficult to interpret when the two index conditions were merged together. While our analysis was constrained because we were using the conditions as defined by CMS, these results nonetheless suggest that the interpretations did not suffer from a confirmation bias (manufactured interpretations to fit the results). However, he noted that the range of readmission risk had clinical (face) validity.
Furthermore, subgroups 2, 4, and 5 had more severe comorbidities related to lung, heart, and kidney, and therefore had a higher risk for readmission compared to subgroups 1, 6, and 7 that had less severe comorbidities with a lower risk for readmission. In addition, subgroups 2, 5, 6 and 7 would benefit from chronic care case management from advanced practice providers (e.g., nurse practitioners). Finally, subgroups 2 and 5 could benefit from using well-established guidelines for CHF and COPD, subgroup 7 would benefit from mental health care and management of psycho-social comorbidities, and subgroup 6 would benefit from care for obesity and metabolic disease management.
Classification Modeling
The classification model used multinomial logistic regression in each index condition (see Appendix-3 for the model coefficients) to predict the membership of patients using subgroups (identified from the above visual analytical models). The results revealed that in each index condition, the classification model had high accuracy in classifying all the cases in the full dataset (training dataset used in the visual analytical modeling). Similarly, the internal validation results using a 75%-25% split of the above dataset also had high classification accuracy (Table 3 with classification accuracy divided into quantiles). We report both results for each index condition:
COPD
The model correctly predicted subgroup membership for 99.90% of the cases (14443/14457) in the full dataset. Furthermore, as shown in Table 3, the internal validation results revealed that the percentage of COPD cases correctly assigned to a subgroup in the testing dataset, ranged from 99.10% to 100.00%, with a median (Q.50) of 99.60%, and with 95% being in the range from 99.30% to 99.80%.
CHF
The model correctly predicted subgroup membership for 99.20% of the cases (25476/25672) in the full dataset. Furthermore, as shown in Table 3, the internal validation results revealed that the percentage of CHF cases correctly assigned to a subgroup in the testing dataset, ranged from 98.70% to 99.70%, with a median (Q.50) of 99.30%, and with 95% being in the range from 99.00% to 99.60%.
THA/TKA
The model correctly predicted subgroup membership 100.00% of the cases (7010/7010) in the full dataset. Furthermore, as shown in Table 3, the internal validation results revealed that the percentage of CHF cases correctly assigned to a subgroup in the testing dataset, ranged from 99.40% to 100.00%, with a median (Q.50) of 99.90%, and with 95% being in the range from 99.70% to 100.00%.
Application of the Classification Model to Generate Information for Other Models
The above classification model was used to classify 100% cases and 100% controls for use in the prediction model (described below). Furthermore, the proportion of cases and controls classified into each subgroup was used to calculate the risk of readmission for each subgroup (see Appendix 3). As this subgroup risk information was requested by the clinicians to improve interpretability of the visual analytical model, the values were juxtaposed next to the respective subgroups in the bipartite network visualizations (see blue text in Fig. 3-5).
Prediction Modeling
For each of the three index conditions, we developed two binary logistic regression models to predict readmission, with comorbidities in addition to sex, age, and race: (1) Standard Model representing all patients without subgroup membership, similar to the CMS models; and (2) Hierarchical Model with an additional variable that adjusted for subgroup membership.
COPD
The inclusion and exclusion selection criteria (see Appendix-1) resulted in a cohort of 186,041 patients (29,026 cases and 157,015 controls). As shown in Fig. 6A, the Standard Model had a C-statistic of 0.624 (95% CI: 0.617-0.631) which was not significantly (P=.8578) different from the Hierarchical Model that had a C-statistic of 0.625 (95% CI: 0.618-0.632). The calibration plots revealed that both models had a slope close to 1, and an intercept close to 0 (see Appendix-4).
As shown in Fig. 6B, the Standard Model was used to measure the predictive accuracy of patients in each subgroup separately. The results showed that Subgroup-1 had a lower C-statistic compared to Subgroup-3 and Subgroup-4. While the C-statistics in Fig. 6A and Fig. 6B cannot be compared as they are based on models developed from different populations, these results reveal that the current CMS readmission model for CHF might be underperforming for one COPD patient subgroup, pinpointing which one might benefit by a Subgroup-Specific Model.
CHF
The inclusion and exclusion selection criteria (see Appendix-1) resulted in a cohort of 295,761 patients (51,573 cases and 244,188 controls). As shown in Fig. 7A, the Standard Model had a C-statistic of 0.600 (95% CI: 0.595-0.605), which was not significantly different (P=.2864) from the Hierarchical Model that also had a C-statistic of 0.600 (95%CI: 0.595-0.605). The calibration plots revealed that all models had a slope close to 1, and an intercept close to 0 (see Appendix-4).
As shown in Fig. 7B, the Standard Model was used to measure the predictive accuracy of patients in each subgroup separately. The results showed that Subgroup-1 had a lower C-statistic compared to Subgroup-4. While the C-statistics in Fig. 7A and Fig. 7B cannot be compared as they are based on models developed from different populations, but similar to the results in COPD, these results reveal that the current CMS readmission model for CHF might be underperforming for one CHF patient subgroup, pinpointing which one might benefit by a Subgroup-Specific Model.
THA/TKA
The application of the inclusion and exclusion selection criteria (see Appendix-1) resulted in a cohort of 356,772 patients (16,520 cases and 340,252 controls). As shown in Fig. 8A, the Standard Model had a C-statistic of 0.638 (95% CI: 0.629-0.646), which was not significantly different (P=.6817) from the Hierarchical Model that had a C-statistic of 0.638 (95% CI: 0.629-0.647). The calibration plots (see Appendix-4) revealed that both models had a slope close to 1, and an intercept close to 0 (see Appendix-4).
As shown in Fig. 8B, the Standard Model was used to measure the predictive accuracy of patients in each subgroup separately. The results showed that Subgroup-1 had a lower C-statistic compared to Subgroup-4. Again, while the C-statistics in Fig. 8A and Fig. 8B cannot be compared as they are based on models developed from different populations, similar to the results in COPD, these results reveal that the current CMS readmission model for THA/TKA might be underperforming for 4 patient subgroups, pinpointing which ones might benefit by Subgroup-Specific Models.
CMS Standard Model vs. CMS Hierarchical Model
Unlike the CMS published models, the above models used only the comorbidities that survived feature selection. Therefore, to perform a head-to-head comparison with the published CMS models, we also developed a CMS Standard Model (using the same variables from the published CMS model), and compared it to the corresponding CMS Hierarchical Model (with an additional variable for subgroup membership) in each condition. Similar to the models in Fig. 6-8, there were no significant differences in the C-statistics between the two modeling approaches in any condition (see Appendix-4). However, as shown in Table 4, the CMS Hierarchical Model for COPD had significantly higher NRI, but not significantly higher NDI compared to the CMS Standard Model; the CMS Hierarchical Model for CHF had a significantly lower NRI and IDI compared to the CMS Standard Model, and the CMS Hierarchical Model for THA/TKA had a significantly higher NDI and IDI compared to the CMS Standard Model. Furthermore, similar to the results in 6B-8B, when the CMS Standard Model was used to predict readmission separately in subgroups within each index condition, it identified subgroups that underperformed, pinpointing which ones might benefit by a Subgroup-Specific Model (See Appendix-4). In summary, the comparisons between the CMS Standard Models and the respective CMS Hierarchical Models showed that in two conditions (COPD and THA/TKA), there was a small but statistically significant improvement in discriminating between the readmitted and not readmitted patients as measured by NRI, but not as measured by the C-statistic or IDI, and that a subgroup in each index condition might be underperforming when using the CMS Standard Model.
DISCUSSION
Overview
Our overall approach of using the MIPS framework for identifying patient subgroups through visual analytics, and using those subgroups to build classification and prediction models, revealed strengths and limitations for each modeling approach, and for our data source. This examination led to insights for developing future clinical decision support systems, and a methodological framework for improving the clinical interpretability of subgroup modeling results.
Strengths and Limitations of Modeling Methods and Data Source
Visual Analytical Modeling
The results revealed three strengths of the visual analytical modeling: (1) the use of bipartite networks to simultaneously model patients and comorbidities, enabled the automatic identification of patient-comorbidity biclusters, and the integrated analysis of co-occurrence and risk; (2) the use of a bipartite modularity maximization algorithm to identify the biclusters enabled the measurement of the strength of the biclustering, critical for gauging its significance; and (3) the use of a graph representation enabled the results to be visualized through a network. Furthermore, the request from the domain experts to juxtapose the risk of each subgroup with their visualizations appeared to be driven by a need to reduce working memory loads (from having to remember that information spread over different outputs), which could have enhanced their ability to match bicluster patterns with chunks (previously-learned patterns of information) stored in long-term memory. The resulting visualizations enabled them to recognize subtypes based on co-occurring comorbidities in each subgroup, reason about the processes that precipitate readmission based on the risk of each subtype relative to the other subtypes, and propose interventions that were targeted to those subtypes and their risks. Finally, the fact that the geriatrician could not fully interpret the THA/TKA network because it mixed two fairly different conditions, suggests that the clinical interpretations were not the result of a confirmation bias (interpretations leaning towards fitting the results).
However, the results also revealed two limitations: (1) while modularity is estimated using a closed-form equation (formula), no closed-form equation exists to estimate the modularity variance, which is necessary to measure its significance. To estimate modularity variance, we therefore used a permutation test by generating 1000 random permutations of the data, and then compared the modularity generated from the real data to the mean modularity generated from the permuted data. Given the size of our datasets (ranging from 7K-25K patients), this computationally-expensive test took approximately 7 days to complete, despite the use of a dedicated server with multiple cores; and (2) while bicluster modularity was successful in identifying significant and meaningful patient-comorbidity biclusters, the visualizations themselves were extremely dense, and therefore potentially concealed patterns within and between the subgroups. Future research should explore a closed-form equation to estimate modularity variance, with the goal of accelerating the estimation of modularity significance, and more powerful analytical and visualization methods to reveal intra- and inter-cluster associations in large and dense networks.
Classification Modeling
The results revealed two strengths of the classification modeling: (1) the use of a simple multinomial classifier was adequate to predict with high accuracy to which subgroup a patient belonged; (2) because the model produced membership probabilities for each patient for each subgroup, the model captured the dense inter-cluster edges observed in the network visualization; and (3) the coefficients of the trained classifier could be inspected by an analyst making it more transparent (relative to most deep-learning classifiers which tend to be a black box).
However, because we dichotomized the classification probabilities into a single subgroup membership, our approach did not fully leverage the membership probabilities for modeling and visual interpretation. For example, some patients have high classification probabilities (representing strong membership) to a single subgroup (as shown by patients in the outer periphery of the biclusters with edges only within their bicluster), whereas others have equal probabilities to all subgroups (as shown in the inner periphery of the biclusters with edges going to multiple clusters). Future research should explore incorporating the probability of subgroup membership into the design of hierarchical models to improve predictive accuracy, and visualization methods to help clinicians interpret patients with different profiles of membership strength, with the goal of designing patient-specific interventions.
Predictive Modeling
The results revealed two strengths of the predictive modeling: (1) the use of the Standard Model to measure predictive accuracy across the subgroups helped to pinpoint which subgroups tend to have lower predictive accuracy compared to the rest, and therefore which of them could benefit from a more complex but accurate subgroup-specific model; and (2) despite the use of a simple Hierarchical Model with a dichotomized membership label for each patient, the predictive CMS models detected significant differences in the prediction accuracy as measured by NRI in two of the conditions, when compared to the CMS Standard Models. However, the results also revealed that the differences in predictive accuracy as measured by the C-statistic and NDI were small, suggesting that comorbidities on their own were potentially insufficient for accurately predicting readmission. Future research should explore the use of electronic health records, and multiple subgroup-specific models targeted to each subgroup, to improve the predictive accuracy of the models.
Data Source
The Medicare claims data had four key strengths: (1) scale of the datasets which enabled subgroup identification with sufficient statistical power; (2) spread of the data collected from across the US which enabled generalizability of the results; (3) data about older adults which enabled examination of subgroups in an underrepresented segment of the US population; and (4) data used by CMS to build predictive readmission models, which enabled a head-to-head comparison with the hierarchical modeling approach.
However, the data had two critical limitations. (1) As we compared our models with the CMS models, we had to use the same definition for controls (90 days with no readmission) that had been used, which introduced a selection bias that exaggerates the separation between cases and controls. Similarly, by excluding patients who died, this exclusion criterion potentially biased the results towards healthier patients. (2) Administrative data have known limitations such as the lack of comorbidity severity and test results, which could strongly impact the accuracy of predictive models. Future research should consider the use of national-level electronic health record (EHR) data such as those being assembled by the National COVID Cohort Collaborative (N3C) [59], and the TriNetX [60] initiatives, which could overcome the above limitations by providing laboratory values and comorbidity severity, but could also introduce new as yet unknown limitations.
Implications for Clinical Decision-Support Systems that Leverage Patient Subgroups
While the focus of this project was to develop and evaluate the MIPS framework, its application to three index conditions coupled with extensive discussions with clinicians led to insights for designing a future clinical decision support system. Such a system could integrate outputs from all three models. As we have shown, the visual analytical model automatically identified and visualized the patient subgroups, which enabled the clinicians to comprehend the co-occurrence and risk information in the visualization, reason about the processes that lead to readmission in each subgroup, and design targeted interventions. The classification model leveraged the observation that many patients have comorbidities in other biclusters (shown by a large number of edges between biclusters), and accordingly generated a membership probability of a patient belonging to each bicluster, from which the highest was chosen for bicluster membership. Finally, the predictive model predicts the risk for readmission for a patient, by using in the future the most accurate model designed for the bicluster to which the patient belongs.
The outputs from the above models could be integrated into a clinical decision support system to provide recommendations for a specific patient using the following algorithm: (1) use the classifier to generate the membership probability (MP) of a new patient belonging to each subgroup; (2) multiply the MP in each subgroup with the patient’s risk (R) for readmission provided by the predictive model for that subgroup, to generate an importance score [IS = f(MP) X g(R)] for the respective intervention; (3) rank the subgroups and their respective interventions using IS; and (4) use the ranking to display in descending order, the subgroup comorbidity profiles along with their respective potential mechanisms, recommended treatments, and the respective IS. Such model-based information, displayed through a user-friendly interface, could enable a clinician to rapidly scan the ranked list to (a) determine why a specific patient’s profile fits into one or more subgroups, (b) review the potential mechanisms and interventions ranked by their importance, and (c) use the combined information to design a treatment that is customized for the real-world context of the patient. Consequently, such a clinical decision support system could not only provide a quantitative ranking of membership to different subgroups, and the importance score for the associated interventions, but also enable the clinician to understand the rationale underlying those recommendations, making the system interpretable and explainable. Comparative evaluation of such a system to standard care could determine its clinical efficacy.
Implications for Analytical Granularity to Enhance the Interpretability of Patient Subgroups
While the visual analytical model enabled the clinicians to interpret the patient subgroups, they were unable to interpret the associations within and between the subgroups due to the large number of nodes in each bicluster and the dense edges between them. Several network filtering methods [61, 62] have been developed to “thin out” such dense networks such as by dropping or bundling nodes and edges based on user-defined criteria, to improve visual interpretation. However, such filtering could bias the results, or modify the clusters resulting from the reduced data.
An alternate approach that preserves the full dataset leverages the notion of analytical granularity, where the data is progressively analyzed at different levels. For example, we have analyzed COVID-19 patients [11] at the cohort, subgroup, and patient levels, and we are currently using the same approach to examine symptom co-occurrence and risk at each level in Long COVID patients. Our preliminary results suggest that analyzing data at different levels of granularity enables clinicians to progressively interpret patterns such as within and between subgroups, in addition to guiding the systematic development of new algorithms. For example, at the subgroup level, we have designed an algorithm that identifies which patient subgroups have a significantly higher probability for having characteristics that are clustered in another subgroup, providing critical information to clinicians about how to design interventions for such overlapping subgroups; at the patient level, we have identified patients that are outliers to their subgroups based on their pattern of characteristics inside and outside their subgroup. Such patient outliers could be flagged to examine if they need individualized interventions versus those recommended for the rest of their subgroup. Such analytical granularity could therefore inform the design of interventions by clinicians, in addition to the design of decision support systems that provide targeted and interpretable recommendations to physicians, who can then customize them to fit the real-world context of a patient.
Implications of the MIPS Framework for Precision Medicine
While we have demonstrated the application of the MIPS framework across multiple readmission conditions, its architecture has three properties that should enable its generalizability across other medical conditions. First, as shown in Fig. 2, the framework is modular with explicit inputs and outputs, enabling the use of other methods at each of the three modeling steps. For example, the framework could use other biclustering (e.g., Non-negative Matrix Factorization [63]), classification (e.g., deep learning [64]), and prediction methods (e.g., subgroup-specific modeling [17]). Second, the framework is extensible, enabling an elaboration of the methods at each modeling step to improve the analysis and interpretation of subgroups. For example, as discussed above, the analytical granularity at the cohort, subgroup, and patient levels could improve the interpretability of subgroups in large and dense datasets. Third, the framework is integrative as it systematically combines the strengths of machine learning, statistical, and precision medicine approaches. For example, the visual analytical modeling leverages search algorithms to discover co-occurrence in large datasets; the classification and prediction modeling leverages probability theory to measure the risk of co-occurrence patterns; and clinicians leverage medical knowledge and human cognition to interpret patterns of co-occurrence and risk for designing precision-medicine interventions. Such integration of different models and their interpretation operationalizes team-centered informatics [65] designed to facilitate data scientists, biostatisticians, and clinicians in multidisciplinary translational teams [66] to work more effectively across disciplinary boundaries, with the goal of designing interventions for precision medicine. Our current research tests the generality of the MIPS framework in other conditions such as Long COVID and Post-Stroke Depression, with the goal of designing and evaluating precision medicine interventions targeted to patient subgroups.
CONCLUSIONS
Although a primary goal of precision medicine is to identify patient subgroups and to design targeted interventions, few methods automatically identify both patient subgroups and their co-occurring characteristics simultaneously, measure their significance, and visualize the results. Here we demonstrated the use of the MIPS framework, which used a three-step approach to model and interpret patient subgroups. A visual analytical method automatically identified statistically significant and replicated patient subgroups and their frequently co-occurring comorbidities. Next, a multinomial logistic regression classifier had high accuracy in correctly classifying patients into the patient subgroups identified by the visual analytical model. Finally, despite using a simple hierarchical logistic regression model to incorporate subgroup information, the predictive models had a statistically significant improvement in discriminating between the readmitted and not readmitted patients in two of the three readmission conditions, and further analysis pinpointed for which patient subgroups the current CMS model might be underperforming. Finally, by integrating the co-occurrence and risk patterns in a visualization, the MIPS framework enabled clinicians to interpret the patient subgroups, reason about mechanisms precipitating hospital readmission, and design targeted interventions.
However, evaluation of the methods across three readmission index conditions also helped to identify limitations of the models and the data. The visual analytical model was too dense to enable the clinicians to interpret the associations within and between the subgroups, and the absence of a closed-form equation to measure modularity variance required a computationally-expensive process to measure the significance of the biclustering. Furthermore, the small improvement in predictive accuracy suggested that comorbidities on their own were insufficient for predicting hospital readmission.
By leveraging the modular and extensible nature of the MIPs framework, future research should address the above limitations by developing more powerful algorithms which analyze subgroups at different levels of granularity to improve the interpretability of intra-and inter-cluster associations, and the evaluation of subgroup-specific models to predict outcomes. Furthermore, EHR data made available through national-level data initiatives such as N3C and TriNetX now provide access to critical variables including laboratory results and comorbidity severity, which should lead to higher predictive power for predicting adverse outcomes. Finally, extensive discussions with clinicians provided implications for the design of future decision support systems, which could integrate outputs from the three models to provide for a specific patient, predicted subgroup memberships, ranked interventions, along with associated subgroup profiles and mechanisms. Such interpretable and explainable systems could enable clinicians to use patient subgroup information for informing the design of precision medicine interventions, with the goal of reducing adverse outcomes such as unplanned hospital readmissions and beyond.
Data Availability
All data used in this study were available from the Centers of Medicare and Medicaid Services (CMS) after application with a fee, and signing a data use agreement (DUA) to analyze the deidentified data.
AKNOWLEDGEMENTS
We thank Tianlong Chen, Clark Andersen, Yu-Li Lin, and Emmanuel Santillana for performing the analyses on this project. This study was supported in part by the Patient-Centered Outcomes Research Institute (ME-1511-33194), the Clinical and Translational Science Award (UL1 TR001439) from the National Center for Advancing Translational Sciences at the National Institutes of Health, and by the National Library of Medicine (R01 LM012095) at the National Institutes of Health. The content is solely the responsibility of the authors, and does not necessarily represent the official views of the Patient-Centered Outcomes Research Institute, or the National Institutes of Health. The Medicare data were analyzed using a CMS data-use agreement (CMS DUA RSCH-2017-51404).