Abstract
Background Critically ill patients are managed with complex medication regimens that require medication management to optimize safety and efficacy. When performed by a critical care pharmacist (CCP), discrete medication management activities are termed medication interventions. The ability to define CCP workflow and intervention timeliness depends on the ability to predict the medication management needs of individual intensive care unit (ICU) patients. The purpose of this study was to develop prediction models for the number and intensity of medication interventions in critically ill patients.
Methods This was a retrospective, observational cohort study of adult patients admitted to an ICU between June 1, 2020 and June 7, 2023. Models to predict number of pharmacist interventions using both patient and medication related predictor variables collected at either baseline or in the first 24 hours of ICU stay were created. Both regression and supervised machine learning models (Random Forest, Support Vector Machine, XGBoost) were developed. Root mean square derivation (RMSE), mean absolute error (MAE), and symmetric mean absolute percentage error (sMAPE) were calculated.
Results In a cohort of 13,373 patients, the average number of interventions was 4.7 (standard deviation (SD) 7.1) and intervention intensity was 24.0 (40.3). Among the ML models, the Random Forest model had the lowest RMSE (9.26) while Support Vector Machine had the lowest MAE (4.71). All machine learning models performed similarly to the stepwise logistic regression model, and these performed better than a base model combining severity of illness with medication regimen complexity scores.
Conclusions Intervention quantity can be predicted using patient-specific factors. While inter-institutional variation in intervention documentation precludes external validation, our results provide a framework workload modeling at any institution.
Introduction
Medication therapy in the intensive care unit (ICU) is notoriously complex and high-risk requiring nuanced oversight and management.1–6 It has been shown that one in six ICU medication errors require intervention by a critical care pharmacist (CCP).7 Tailored medication interventions in the context of comprehensive medication management (CMM) performed by CCPs reduce preventable adverse drug events (ADEs) by 70%.8–11 However, staffing constraints and shortages combined with unpredictable and sometimes surging patient volumes may result in ICU patients receiving inconsistent CCP care.12, 13 Thus, predicting and prioritizing those patients with the highest need for CCP medication intervention(s) has important ramifications for workload justification and redesign at the administrative level and patient triage at the clinician level.1
The medication regimen complexity-intensive care unit (MRC-ICU) score was designed to quantify the complexity of a critically ill patient’s medication regimen in a way that reflects the cognitive services provided by CCPs. This score has been related to both patient-centered outcomes, including mortality,14 length of stay15, ICU complications (e.g., fluid overload16–18), duration of mechanical ventilation, and drug-drug interactions, and pharmacist workload, including intervention quantity and intensity. 19–25 Moreover, application of machine learning approaches in the context of MRC-ICU have revealed new insights in how medications relate to patient outcomes.26, 27 Finally, this score showed a stronger relationship to medication interventions as compared to traditional severity of illness indicators.28
However, the predictive capability of medication complexity scores like MRC-ICU for CCP interventions that considers mediating factors like patient severity of illness has never been evaluated. The purpose of this study was to develop prediction methods for total medication interventions during a patient’s ICU stay. We hypothesized that higher MRC-ICU scores and higher severity of illness would be predictive for greater CCP medication interventions.
Methods
Study Population
This was a retrospective, observational study that was reviewed by the University of Georgia (UGA) Institutional Review Board (IRB) and determined to be exempt from IRB oversight (Project00001541). All methods were performed in accordance with the ethical standards of the UGA IRB and the Helsinki Declaration of 1975. Patient data were obtained via the Oregon Clinical and Translational Research Institute, which houses Epic→ electronic health record (EHR) data from Oregon Health and Science University (OHSU) Hospital. OHSU is a 576 bed academic medical center with cardiothoracic, trauma/surgery, neurocritical care, and medical intensive care units. Pharmacists are integrated into the medical teams to provide CMM, including presence on multiprofessional rounds. A total of 13,373 patients aged 18 years or older were identified between June 1, 2020 and June 7, 2023. Data from the first ICU admission per each patient were included. Patients were excluded if it was not their index ICU admission, the ICU stay was less than 24 hours, or if the patient was placed on comfort care within the first 24 hours of their ICU stay.
Variables
The primary outcome was total medication interventions logged in Epic→ as i-Vents by a critical care pharmacist (CCP) during the patient’s ICU stay. The EHR was queried for relevant patient demographic information, medication information, and patient outcomes. Baseline characteristics including age, sex, race, ICU type, admission diagnosis, APACHE II score at 24 hours, and SOFA score at 24 hours, 48 hours, and 72 hours were collected. The MRC-ICU score was calculated at 24 hours, 48 hours, and 72 hours of the ICU stay. The MRC-ICU consists of 35 discrete medication categories with each category assigned a weighted value and then summed to create a score for a patient’s regimen at the given time point.29 For a patient prescribed vancomycin, norepinephrine, and docusate, they would be given 3 points for vancomycin, 1 point for norepinephrine, and 1 point for docusate for a total score of 5.
Interventions are logged by CCPs as part of routine care. The general expectation is that CCPs log interventions performed and categorize them using a standard categorization system that consists of 49 distinct categories.20 These categories are provided in Supplemental Table 1.
Additionally, these interventions were categorized into low, moderate, and high intensity intervention categories using a previously validated rating system to calculate intervention intensity.13 The composite score was equal to: (the number of low-intensity interventions) + 5*(the number of moderate-intensity interventions) + 25*(the number of high-intensity interventions). The weights were chosen for the three intensities of intervention to minimize an overlap of scores for different compositions of number of interventions.13
Following a literature review of pharmacist interventions in the ICU, potential predictor variables were identified by investigator consensus to include in each regression model. These variables included the following: baseline characteristics (age, sex, body mass index, ICU type, and admission diagnosis category) and 24-hour variables (sequential organ failure assessment (SOFA) at 24 hours, Acute Physiology and Chronic Health Evaluation II (APACHE II) at 24 hours, MRC-ICU at 24 hours, presence of delirium as indicated by positive confusion assessment method for the ICU (CAM-ICU) score.
Analysis
Due to the hypothesis-generating nature of this study, no attempt was made to estimate the sample size. All eligible patients from the available database were included to maximize statistical power of the predictive models developed. Descriptive statistics were performed for relevant variables. Continuous variables were summarized by the mean and standard deviation, and categorical variables reported count and proportion of the total population. Clinical characteristics between patients with high and low interventions (defined as ≤ 5 (median) or not) were compared using either Student’s t-test or Chi-square test, as appropriate. Univariate analysis was performed on baseline variables to detect potential important variables for predicting intervention quantity and intensity. A two-sided p-value less than 0.05 was used to determine statistical significance for all outcomes. All analyses were performed using R (version 4.1.2).
A histogram for intervention count was plotted (see Supplemental Figure 1). MRC-ICU score was plotted by ICU day (see Supplemental Figure 2 and 3). High and low interventions were differentiated based on median and visual analysis. Mortality and intervention total were plotted by MRC-ICU decile (see Supplemental Figure 4 and 5, respectively). A composite score of severity of illness (as described by SOFA or APACHE II) and medication regimen complexity (as described by MRC-ICU) was plotted against interventions (see Supplemental Figure 6).
A total of six different prediction models were developed to predict the primary outcome (number of interventions during hospital stay) and secondary outcome (intensity score for interventions during hospital stay). These models included traditional regression (negative binomial regression with full predictors, stepwise selected predictors, and a simple model of two key predictors (MRC-ICU and SOFA score only)), and three machine learning models including Random Forest, Support Vector Machine (SVM), and XGBoost. Collinearity assessment was conducted via the variance inflation factor (VIF) or correlation matrix (see Supplemental Table 2).
Multiple imputation with 10 imputations per variable was applied for all missing data. Each model was trained on 10 imputed training datasets, and predictions were made on the corresponding 10 imputed testing datasets. The results from the 10 sub-models were pooled together based on Rubin’s rule. Univariate and multivariate analysis on full variables were performed on the 10 different imputed datasets. For each model, 80% of the observations were used as training data, and 20% were reserved as testing data.
Traditional regression models
Multivariable generalized linear models (GLM) were developed to evaluate the relationship of MRC-ICU on ICU day 1 with total interventions. Both GLM with Poisson distribution and negative binomial distribution were explored (see Supplemental Appendix A). For variable selection, to ensure a narrower and more precise selection, model selection using Bayesian information criterion (BIC) with the additional option k=log(n) in a stepwise algorithm for each imputed dataset was performed. In the case of different variables being selected across the imputed datasets, the frequency of each variable being selected was recorded followed by the Wald test to determine whether each variable should remain in the final model in a stepwise process. A temporal analysis was conducted for changes in SOFA and MRC-ICU score (see Appendix B).
Supervised machine learning models
During the model training for Random Forest, SVM and XGBoost, 5-fold cross validation was used to select hyperparameters. With those optimal hyperparameters, the trained model was fitted again on the whole training set. For Random Forest, number of trees and number of variables randomly sampled as candidates at each split were tuned. For SVM, linear kernel and cost of constraints violation were tuned. For XGBoost, maximum depth of a tree and maximum number of boosting iterations were tuned. Feature importance plots for Random Forest, SVM and XGBoost were plotted. Because ten different models were used on each imputed dataset, ten different feature importance lists were generated for each.
Model evaluation
RMSE, MAE, and sMAPE were calculated for each set of predictions, and the results were pooled together. To mitigate the influence of outliers and skewed error distributions on performance metrics, the 95% quantile Root Mean Squared Error (RMSE), mean absolute error (MAE), and symmetric mean absolute percentage error (sMAPE) were calculated to provide a more robust assessment. Rate ratios (RR=eβ, β is the estimated coefficient from the regression model) for the simple and stepwise model were calculated after variable selection and were reported for predictors of interest along with 95% confidence intervals (CI). The 95% quantile for both RMSE and MAE refer to the values below which 95% of the error values fall (with the remaining 5% as extreme errors), with lower values indicating better performance.
SHAP (SHapley Additive exPlanations) is a method based on game theory for explaining individual predictions from machine learning models.30 SHAP assigns a value to each feature, representing its impact on the prediction when that feature is included in the model compared to its absence, and thus provides a means to understand the contribution of each feature to the model’s prediction for a particular instance. SHAP values were calculated for each individual prediction, and SHAP bee swarm charts were used to visualize the feature impact.31, 32 Given that the feature importance plots and associated predictions were highly uniform across imputed datasets, the bee swarm chart were plotted based on on-fold of imputed dataset with the lowest RMSE for concise interpretation.
Subgroup Analysis and Error Distribution
A subgroup analysis was conducted by categorizing interventions into high, medium, or low groups to evaluate the performance of prediction models across different subgroups. This approach was chosen because despite similar overall performance metrics found for each model, the distribution of predicted values varied significantly. We hypothesized that model performance might vary across intervention groups (e.g., one model might perform well in the low intervention group but poorly in the high intervention group whereas another model might show the opposite performance trend, resulting in similar overall performance metrics despite differing subgroup performance). To explore these differences, we analyzed the error distribution for each model. For models with comparable overall Root Mean Squared Error (RMSE), we conducted detailed subgroup analyses and plotted the error distributions for each model. Additional details are available in Appendix C.
Results
A total of 13,373 patients were included in the analysis. The average age was 59.8 ± 17.4 years with an APACHE II score of 9.1 ± 4.1 at 24 hours and MRC-ICU score of 5.3 ± 3.9 at 24 hours. The overall mortality rate was 10.2% with 55.8% undergoing mechanical ventilation at 24 hours. Population characteristics are summarized in Table 1. The mean number of pharmacist interventions during the ICU stay was 4.7 ± 7.1 (see Table 2). Additionally, Supplemental Table 3 provides a characterization of medication interventions by type and intensity level with discontinuation of clinically unwarranted medications (13.4%) and dosage adjustments (12.1%) being the top two categories.
Following multivariable analysis, it was observed that for every one point increase in the MRC-ICU, SOFA, and APACHE II score, there was approximately 10% more interventions made for the patient (see Table 3). Supplemental Table 4 reports the final regression model for prediction of total interventions with a total of 12 variables, and Supplemental Table 5 provides the model for a simple model of SOFA and MRC-ICU.
A total of 6 models were developed: a base model of MRC-ICU and APACHE II, full and stepwise regression, and three supervised machine learning models (see Table 4). All models performed similarly, while the Support Vector Machine and XGBoost models performed most consistently across all three metrics. The Random Forest model had the lowest RMSE (9.26) while Support Vector Machine had the lowest MAE (4.71). Notably, the best performing models all outperformed a parsimonious model combining severity of illness with medication regimen complexity, although these metrics do seem to give some indication as to projected CMM needs. Figure 1 depicts bee swarm plots with SHApley Additive exPlanations (SHAP) to depict the relative feature importance for the models. Regression models (see Supplemental Tables 6-7) and machine learning models (see Table 3) were also developed for intervention intensity.
An exploratory analysis was conducted to evaluate the role of outliers in intervention quantity, given that reporting of interventions is dependent on a healthcare professional in the context of daily practice. With a 95% quantile that excluded extreme outliers, all sMAPE values were reduced, with Support Vector Machine and XGBoost performing the best at predicting interventions in the normal range and most errors due to more extreme outliers (see Table 4). The full regression, selected regression, and Random Forest performed better with the extreme values in their predictions (see Appendix C).
Discussion
Using a specifically curated large dataset of CMM activities in critically ill patients, a series of machine learning based models were developed and found to successfully predict CMM activities, as quantified by pharmacist intervention quantity and intensity. The use of patient-specific data including traditional severity of illness scores plus CMM-oriented scores in the form of the MRC-ICU and granular pharmacist CMM activity in a large cohort marks this paper to be the first evaluation of its kind and may be useful for institutions looking to predict pharmacist workload. Performance between a stepwise regression and machine learning based models did not differ substantially; however, Random Forest and Support Vector Machine had the best performance when measured by RMSE and MAE, respectively.
The model results were reasonable, and the relationships among the covariates and outcomes were clear. Given the skewness of the intervention distributions, models showed reduced accuracy for the patients at the tail ends of intervention counts. Generally, this study suggests that severity of illness (in the form of SOFA score) and medication regimen complexity are strongly associated with critical care pharmacist interventions; however, more sophisticated models are likely necessary for improving predictive power. The similar performance between supervised machine learning methods and traditional regression has been observed in other medication related evaluations.33, 34 This may be due to the relatively small sample size given the heterogeneity of critically ill patients and medication use in the ICU, but also serves as an important reminder that AI based evaluations should be benchmarked against existing clinical standards and more interpretable models.
Workload optimization is important for patient and healthcare worker outcomes.25, 35 While cognitive overload is well known to reduce the quality of work across a variety of domains, higher patient care burdens are associated with worse patient outcomes such that standards exists for ICU physicians and nurses.11, 25, 35–37 The potential for additional personnel on an ‘as needed’ basis or back-up teams to be accessed for high patient care burden days in an objective and reproducible manner may be an important quality improvement strategy. CMM is a cognitive service performed with the goal to optimize medication therapy; however, a challenge to tracking such services is the lack of physical intervention.2, 38, 39 Reviewing a medication therapy in the context of the patient’s disease course and clinical status and comparing against current guidelines and emerging evidence is decidedly ephemeral compared to the more tangible services of central lines placed or procedures completed.40 Intervention tracking as a metric of CMM has been critiqued because it does not capture the cognitive labor of the process.40 Intervention intensity is intended to striate between lower and higher levels of such cognitive labor, but again, fails to capture times when ultimately no intervention was made, even after some amount of cognitive effort, and importantly times when a CCP made a recommendation that was not ultimately accepted by the ICU team.29 As such, prediction models that track intervention quantity or intensity cannot capture this element. However, they do have the potential to serve as markers of workload or potentially unsafe care processes that require improvement. For example, repeated interventions related to the use of neuromuscular blockers may open a discussion among ICU clinicians/leaders regarding standardized order set development.
Limitations of our evaluation include that it is a retrospective and single center study, precluding causal assessment and external validation. However, high quality, granular datasets with both CMM data, medication data, and patient data are not widely available, and this evaluation provides a novel foundation for future curation of these valuable datasets. While external validation is a gold standard for model building and testing, there is a high rate of practice variation in how institutions record pharmacist CMM activity such that readily comparable datasets are not available. However, steps were taken to increase the generalizability our results, notably through the re-coding of intervention categories into previously published categories supported by a systematic review and a previously used intensity scoring system. This methodology supports future investigations into external validation.
Conclusion
Pharmacist interventions have a direct relationship with patient severity of illness and medication regimen complexity and may be predicted using machine learning approaches. Workload prediction has the potential to improve patient outcomes following the appropriate implementation and evaluations of such models.
Data Availability
All data produced in the present study are available upon reasonable request to the authors
Supplemental Digital Content
Co-linearity via variance inflation factor (VIF) or correlation matrix was assessed for suspected variables of interest (e.g., CRRT, mechanical ventilation, MRC-ICU, SOFA, and APACHE II score). VIF: Values above 5 indicate significant collinearity. 1 < VIF ≤ 5: Indicates moderate multicollinearity. Generally considered acceptable. VIF > 5: Indicates high multicollinearity. This suggests that the predictor variable is highly correlated with other predictors. The VIF for CRRT, Mechanical Ventilation, MRC-ICU, SOFA, APACHE all <2, which means each variable are not highly correlated with other predictors. Correlation Matrix (for continuous variables only) look for pairs of predictors with high correlation coefficients (e.g., above 0.8 or below −0.8). For the correlation matrix of the three continuous scores, their coefficients are all < 0.6, indicating minimal co-linearity.
Appendix A. Model Selection
Multivariable generalized linear models (GLM) were developed to evaluate the relationship of MRC-ICU on ICU day 1 in relation to total interventions. Both GLM with Poisson distribution and negative binomial distribution were explored. Poisson regression is a common approach when the response variable is a count like number of interventions. However, it assumes that the mean and variance of the count data are equal. This is often not the case in real-world data.
Overdispersion occurs when the observed variance is greater than the mean. An alternative approach to handle overdispersion is a negative binomial model. The negative binomial model introduces an additional parameter (the dispersion parameter) to account for the extra variability in the data. By accommodating overdispersion, the negative binomial model provides a better fit for the data, leading to more reliable estimates and inference. Because the variance of the total intervention (93.22) was much greater than its mean (7.40), and the Pearson statistics from full multivariate Poisson regression without missing imputation also indicated overdispersion (Pearson statistics=95076, df=11676, p-value<0.001), the Negative Binomial regression model was selected as a substitute for the Poisson regression. Similarly, because the variance of the intervention intensity score (2774.91) was much greater than its mean (37.68), and the Pearson statistics from full multivariate Poisson regression without missing imputation also indicated overdispersion (Pearson statistics=620060, df=11676, p-value<0.001), the Negative Binomial regression model was selected in substitute of the Poisson regression.
Appendix B. Temporal Analysis
Mixed Model on Daily Effect
The daily or temporal time effect of SOFA score and MRC-ICU score on the daily number of interventions was analyzed. One unit higher in MRC-ICU score was associated with 10.6% increase in number of interventions at baseline. There was no significant effect of SOFA score on number of interventions at baseline. Compared with the previous day, the effect of MRC score on number of interventions is 0.5% lower, but the effect of SOFA score on number of interventions is 1.5% higher.
Appendix C. Subgroup analysis
Acknowledgements
Amoreena Most, PharmD, BCCCP; OHSU Oregon Clinical and Translational Research Institute
Footnotes
Disclaimers: The authors have no conflicts of interest.
Funding: Funding through Agency of Healthcare Research and Quality for Dr. Sikora was provided through R21HS028485 and for Drs. Sikora and Shen through R01HS029009. Dr. Shen’s work was also partially supported by R35GM146612 from the National Institute of General Medical Sciences.