ABSTRACT
Background Despite the well-known impact of delirium on long-term clinical outcomes, identification of delirium in electronic health records (EHR) remains difficult due to inadequate assessment or documentation of delirium. The purpose of this research is to present a classification model that identifies delirium using retrospective EHR data. The classification model would support the additional identification of delirium cases otherwise undocumented during routine practice.
Methods Delirium was confirmed with the Confusion Assessment Method for the Intensive Care Unit (CAM-ICU). Age, sex, Elixhauser comorbidity index, drug exposures, and diagnoses were used as features to train the logistic regression and multi-layer perceptron models. The clinical notes from the EHR were parsed to supplement the features that were not recorded in the structured data. The model performance was evaluated with a 5-fold cross-validation area under the receiver operating characteristic curve (AUC).
Results Seventy-six patients (17 cases and 59 controls) with at least one CAM-ICU evaluation result during ICU stay from January 30, 2018 to February 20, 2018 were included in the model. The multi-layer perceptron (MLP) model achieved the best performance in identifying delirium, with a mean AUC of 0.967 ± 0.019. The mean positive predictive value (PPV), mean negative predictive value (NPV), mean sensitivity, and mean specificity of the MLP model were 0.9, 0.88, 0.56, and 0.95, respectively.
Conclusion A simple classification model showed a mean AUC over 0.95. This model promises to identify delirium cases with EHR data, thereby enabling a sustainable infrastructure to build a retrospective cohort of delirium in the ICU. The cohort would be useful for the evaluation of long-term sequelae of delirium in the ICU.
INTRODUCTION
Delirium is a frequent complication among intensive care unit (ICU) patients, with an incidence ranging between 45% and 87%.[1, 2] Delirium during an ICU stay has both short-term and long-term impacts on patients’ clinical outcomes. For instance, delirium is known to be associated with prolonged hospitalization, short- and long-term cognitive impairment, and increased healthcare costs.[3-5] Delirium has also been associated with increased short- and long-term mortality.[6-8] Nevertheless, according to ICU delirium practice guidelines, there still exists a significant research gap regarding the long-term outcomes of delirium.[3] The establishment of retrospective delirium cohorts would be useful for long-term surveillance. However, the under-coding of delirium diagnoses and the burden of delirium screening in clinical practice inhibit the identification of delirium in the electronic health records (EHR) and the establishment of retrospective cohorts.[9, 10] A number of delirium prediction models have been developed.[11, 12] Some studies developed multivariable models using 4-9 preoperative variables,[13-15] and other recent models used machine learning or deep learning algorithms, including neural networks.[16, 17] However, all of them used pre-operative (or pre-admission) characteristics, since the goal of these models was to predict ahead of time which patients may develop delirium after certain interventions such as hip surgery, in-patient admission, or ICU stay. Therefore, new diagnoses or drug prescriptions during hospitalization periods were not included in these clinical prediction models.
In contrast, the focus of this research was to retrospectively identify ICU patients who experienced delirium during hospitalization using a classification model. Considering that the occurrence of delirium would elicit a change in treatment pattern during hospitalization, the inclusion of variables recorded during hospitalization in the model could potentially increase the accuracy of the classification model. The study population included only patients who had been evaluated for delirium in the ICU using the standard Confusion Assessment Method for the Intensive Care Unit (CAM-ICU).
METHODS
Data
This study was approved by the Columbia University Irving Medical Center (CUIMC) institutional review board, and informed consent was waived. The dataset for model development included patients from either the Surgical or Cardiothoracic ICU with at least one Confusion Assessment Method for the Intensive Care Unit (CAM-ICU) evaluation result during their ICU stay at New York Presbyterian Hospital (NYP) / CUIMC from January 30, 2018 to February 20, 2018. The CAM-ICU data were obtained as a part of a quality improvement project that aimed to improve recognition of delirium in the two ICUs. Raters received training (in the form of videos and a written manual) and performed CAM-ICU assessments on a convenience sample of patients. Interrater reliability was assessed using Gwet’s kappa in a sample of 15 patients and found to be high (0.9295, 95% confidence interval (CI) 0.7689-1.000).[18] A patient with a positive result on at least one CAM-ICU evaluation was counted as having post-operative delirium.
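The case definition above amounts to an "any positive" rule over each patient's CAM-ICU evaluations. A minimal Python sketch of that rule follows; the record layout and the function name `label_delirium` are hypothetical and are not taken from the study code.

```python
from collections import defaultdict


def label_delirium(evaluations):
    """Map each patient to 1 if any CAM-ICU evaluation was positive, else 0.

    `evaluations` is an iterable of (patient_id, is_positive) pairs.
    This layout is an assumption made for illustration only.
    """
    labels = defaultdict(int)
    for patient_id, is_positive in evaluations:
        # A single positive evaluation is enough to count the patient as a case.
        labels[patient_id] = max(labels[patient_id], int(is_positive))
    return dict(labels)


# Toy data: patient "A" has one positive result, patient "B" has none.
toy_evaluations = [("A", False), ("A", True), ("B", False)]
toy_labels = label_delirium(toy_evaluations)
print(toy_labels)
```

Under this rule a patient with several negative evaluations and one positive evaluation is a case, which matches the definition given in the text.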
Model Implementation
Following clinician guidance, we included the following features for model development: patients’ age at the time of admission, sex, Elixhauser comorbidity index, diagnoses (e.g., heart failure, gout, etc.), and drug prescription records. EHR data for these features were extracted from the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) version 5.3 formatted clinical data warehouse of NYP/CUIMC.[19] Different drug forms were regarded as different drug exposures, even if the drug ingredient was the same. For example, oral sodium bicarbonate and intravenous sodium bicarbonate were treated as different drug exposures. The Elixhauser comorbidity index was calculated with the records of diagnoses from six months prior to the admission to the date of admission. Age and Elixhauser comorbidity index were normalized to range from 0 to 1. Drug exposures and diagnoses were one-hot encoded (i.e., 1 denotes the presence of a drug exposure or diagnosis, while 0 denotes no drug exposure or diagnosis) and represented as a vector. A subset of EHR notes (e.g., transfer notes, delirium nurses’ notes, etc.) considered pertinent to ICU medical practice by clinicians was parsed. We selected notes written by the ICU team or consultants that would potentially be making assessments or recommendations for delirium. The MetaMap Lite version 3.6.2rc6 with NegEx algorithm was used in feature extraction from notes.[20, 21] The extracted concepts were normalized using the Human Phenotype Ontology[22] and further mapped to the OMOP CDM standard concepts in order to identify duplicate terms from the structured EHR data. Supplementary Table S1 lists the titles of the notes that were parsed.
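The feature construction described above can be illustrated with a short Python sketch. It assumes min-max scaling for the normalization to the range 0 to 1 (the text does not name the scaling method) and uses hypothetical helper names and toy concept labels; it is not the code from the study repository.

```python
def min_max_scale(values):
    """Rescale numeric values to [0, 1] (assumed min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def encode_presence(patient_concepts, vocabulary):
    """One binary vector per patient: 1 if the concept was recorded, else 0."""
    return [[1 if concept in recorded else 0 for concept in vocabulary]
            for recorded in patient_concepts]


# Toy cohort of three patients. Different drug forms are kept as
# separate features, as described in the text.
ages = [40, 60, 80]
concepts = [
    {"sodium bicarbonate oral", "heart failure"},
    {"sodium bicarbonate IV"},
    set(),
]
vocabulary = sorted(set().union(*concepts))

scaled_ages = min_max_scale(ages)
presence = encode_presence(concepts, vocabulary)

# Final feature vector per patient: scaled age followed by binary indicators.
features = [[age] + row for age, row in zip(scaled_ages, presence)]
for row in features:
    print(row)
```

The Elixhauser comorbidity index would be appended as one more scaled column in the same way as age.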
We used two simple machine learning methods: logistic regression (LR) and multi-layer perceptron (MLP). We evaluated each of the models using two different inputs. The first input included the features from the structured EHR data. The second input included the phenotypes extracted from clinical notes as well as the features from the structured EHR data. The performance of each model was evaluated with 5-fold cross-validation. In each fold, the test set area under the receiver operating characteristic curve (AUC) was calculated, and the mean ± standard error of AUCs was presented. The validation set was not used in the model development. Hyperparameters were chosen using grid search based on the training loss. The number of epochs was set to 20. The learning rate was set to 0.001 for LR models and 0.0001 for MLP models. MLP models have a single hidden layer with 128 hidden units. The Adam optimizer was used in all models.[23] Python version 3.6.9 and TensorFlow version 2.3.1 were used.[24] The code is available in the following GitHub repository: https://github.com/WengLab-InformaticsResearch/delirium. The R package comorbidity was used to calculate the Elixhauser comorbidity index.[25]
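The evaluation procedure can be sketched in plain Python as follows. This is an illustrative sketch, not the repository code: the helper names are hypothetical, the fold assignment is stratified by class (an assumption, since the text does not state how folds were formed), and the AUC is computed with the rank-based (Mann-Whitney) formulation.

```python
import math
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar case/control ratios."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal indices of this class round-robin across the folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def mean_and_standard_error(values):
    """Mean and standard error (sample SD divided by sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, sd / math.sqrt(n)


# Cohort shape from the text: 17 cases and 59 controls.
labels = [1] * 17 + [0] * 59
folds = stratified_folds(labels, k=5)
print([len(fold) for fold in folds])

# Toy check of the AUC helper on four samples.
toy_auc = roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
print(toy_auc)
```

In a full run, each fold in turn would serve as the test set, a model would be fitted on the remaining four folds, `roc_auc` would be applied to the test-fold predictions, and `mean_and_standard_error` would summarize the five resulting AUCs.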
RESULTS
We identified 76 patients who were admitted to NYP/CUIMC ICU and had evaluation results for the CAM-ICU for delirium from January 30, 2018 to February 20, 2018. Table 1 shows the characteristics of patients according to the delirium status. Their mean age was 63.4 ± 15.1 years and 59.2% (n = 45) of patients were male. The mean Elixhauser comorbidity index was 4.1 ± 3.2. The mean hospitalization duration was 28 ± 36 days. The CAM-ICU evaluation was conducted 123 times in 76 patients, an average of 1.6 times per patient. Among the 76 patients, 17 (22.4%) had delirium in at least one of the CAM-ICU evaluations. A total of 1,318 unique features were extracted from the structured data of the 76 patients. These patients had 2,650 EHR notes of the types included in this study (Supplementary Table S1). From these notes, 657 unique concepts were extracted, 499 of which (76.0%) appeared only in notes. The concepts from structured data and clinical notes are listed in Supplementary Table S2.
Figure 1 shows the mean 5-fold cross-validation AUC of all evaluated models. The MLP model showed the highest mean AUC (0.967 ± 0.019), followed by MLP+notes (0.965 ± 0.032), LR+notes (0.962 ± 0.038), and LR (0.957 ± 0.014). Models with note concepts had higher standard deviation than models without notes. Figure 2 shows the receiver operating characteristic curve (ROC) of all evaluated models. When the threshold of the MLP model was 0.81, mean positive predictive value (PPV), mean negative predictive value (NPV), mean sensitivity, and mean specificity were 0.9, 0.88, 0.56, and 0.95, respectively. Figure 3 shows the precision-recall curve of all evaluated models. The precision-recall curve of the MLP model was well above the baseline of 0.22 (proportion of positive cases among evaluated samples).
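For reference, the threshold-based metrics and the precision-recall baseline reported above follow directly from the confusion matrix. The Python sketch below shows the calculation on toy scores; the function name is hypothetical, and treating a score at or above the threshold as positive is an assumption, since the text does not state how ties with the threshold were handled.

```python
def confusion_metrics(labels, scores, threshold):
    """PPV, NPV, sensitivity, and specificity at a given score threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0  # assumed ">=" convention
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Toy predictions, not study data.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.5, 0.85, 0.2, 0.1]
metrics = confusion_metrics(toy_labels, toy_scores, threshold=0.81)
print(metrics)

# Precision-recall baseline: proportion of cases in the cohort (17 of 76).
baseline = 17 / 76
print(round(baseline, 2))
```

A high threshold such as 0.81 trades sensitivity for PPV and specificity, which is consistent with the reported pattern of high PPV (0.9) and specificity (0.95) alongside lower sensitivity (0.56).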
DISCUSSION
We applied logistic regression and MLP models to classify ICU patients into delirium cases or controls using retrospective EHR data. We expect our models to have potential in the identification of patients with delirium in the ICU while minimizing the human effort needed to manually review accumulated patient records. Such a model would be useful for patient follow-up to better determine the long-term medical sequelae of ICU patients with a history of delirium.
A few clinical prediction models have been developed to predict delirium either prospectively or retrospectively.[11, 12] Previous studies focused more on the development of predictive models than on classification models. Therefore, existing predictive models used features mostly at baseline and did not fully incorporate diagnoses or drug prescriptions during admission, even with retrospective data. One study that used features during admission also restricted the drug prescription records to those extracted 1 day before the diagnosis of delirium.[26] The higher AUC of our model can be attributed to the inclusion of patients’ comorbidities before and during admission, thereby comprehensively incorporating clinical status into the model. Besides the higher mean AUC, another advantage is that our model used the CAM-ICU to evaluate the patients as delirium positive or negative, while others used chart-based methods.[26, 27] The CAM-ICU is the recommended screening tool for delirium in the ICU according to the practice guideline.[3]
We also compared the model with features extracted from the EHR notes with the model without the notes-derived concepts. Interestingly, inclusion of these additional concepts did not lead to improvement of model performance. Possible reasons include low frequency of each concept documented in notes. If more patients were included in the training data set, additional data from the unstructured EHR notes could add more value to the model. In addition, automated natural language processing using MetaMap Lite is known to have lower precision and recall when compared to manual review of clinical notes.[28] The use of automated natural language processing may have introduced noise into the model.
Our classification model would be useful in the identification of missed patients with delirium and could thereby augment the clinical diagnosis of delirium in combination with patient screening at the bedside, albeit retrospectively. Patients with delirium could be under-documented for multiple reasons. First, factors including misinterpretation of patients’ status, documentation errors, and education gaps may lead to inappropriate documentation of CAM-ICU evaluation results.[29, 30] Second, in clinical practice, not all patients in the ICU receive the CAM-ICU evaluation for delirium, despite quality improvement efforts.[31] Third, according to the study by Chanques et al., the CAM-ICU had a sensitivity of 83%, and among the 108 CAM-ICU ratings, seven cases were false negatives.[32] For these reasons, a subset of delirium patients remains unevaluated, undocumented, or otherwise unidentified and hence loses the chance of being followed for the occurrence of complications of delirium. Moreover, the evaluation of delirium was even more difficult during the recent COVID-19 pandemic.[33] COVID-19 survivors who received ICU care may be at high risk for long-term cognitive sequelae, with a recent multi-center study finding a delirium prevalence of 54.9%.[34]
This study has limitations. First, our model only included a small number of patients (n = 76) compared to previous studies that developed machine learning algorithms to classify delirium. Because of the small sample size, we used 5-fold cross-validation, rather than a held-out test set, to evaluate model performance and to prevent overfitting. The small amount of data can also restrict the development or application of complex machine learning models that have many parameters to train. Second, some information that is relevant to a comprehensive delirium evaluation, including neurological examination results (verbal, motor, and eye response), performance status, or magnetic resonance imaging (MRI) reports of the brain (FLAIR [Fluid-Attenuated Inversion Recovery] signal intensity, hippocampal atrophy, etc.), was not systematically available for the whole patient cohort. Third, the biomarkers that have recently been studied as having an association with delirium were not included as features in our model. Our cohort consisted of patients from early 2018, and the levels of biomarkers including tau, interleukin 8, and neurofilament light (NfL) protein were not available for most of the patients, as these tests are not yet routinely performed in a clinical setting. Further research is required to evaluate the role of these biomarkers in the classification model. Also, this study is subject to the traditional limitations of observational data. Finally, the portability of this model to EHR data in other institutions should be further tested.
CONCLUSION
We present a classification model that identifies patients with a delirium episode during their ICU stay using retrospective data. The classification model showed high accuracy with a mean AUC over 0.95. The model could be used in the retrospective identification of undiagnosed delirium cases and the establishment of a delirium cohort for long-term evaluation and surveillance.
Data Availability
The datasets generated and/or analyzed during the current study are not publicly available due to patient privacy.
Author contributions
JHK wrote the manuscript; JHK, MH, TEG, RAW, and CW designed the research; MH provided the CAM-ICU evaluation results; MH selected the notes in the electronic health records for parsing; JL prepared the machine learning codes; JHK, JL, CL, and CNT refined the machine learning algorithm; JHK analyzed the data; CL contributed to the natural language processing of notes in the electronic health records; JHK, MH, RAW, JL, CL, CNT, ERM, TEG, and CW edited and approved the manuscript.
Funding
This study was sponsored by National Library of Medicine grant 5R01LM009886-11 and National Center for Advancing Clinical and Translational Science grants UL1TR001873 and 1OT2TR003434-01.
Ethics approval and consent to participate
This study was approved by Columbia University Irving Medical Center (CUIMC) institutional review board and informed consent was waived.
Consent for publication
Not applicable.
Competing interests
The authors declare no conflicts of interest.
Acknowledgements
The authors would like to thank Kenrick D. Cato and Sarah C. Rossetti for reviewing this paper.