1 Abstract
Purpose Predicting 30-day readmission risk is paramount to improving the quality of patient care. Previous studies have examined clinical risk factors associated with hospital readmissions. In this study, we compare sets of patient-, provider-, and community-level variables available at two points in a patient's inpatient encounter (the first 48 hours and the full encounter) for training readmission prediction models, with the goal of identifying and targeting actionable interventions that could reduce avoidable readmissions.
Methods Using EHR data from a retrospective cohort of 2460 oncology patients, we developed two sets of binary classification models to predict 30-day readmission: one trained on variables available within the first 48 hours of admission and another trained on data from the entire hospital encounter. We used a comprehensive machine learning analysis pipeline comprising preprocessing and feature transformation, feature importance estimation and selection, machine learning modeling, and post-analysis.
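The difference between the two feature sets can be illustrated with a minimal sketch, assuming each EHR observation carries a timestamp relative to admission. The record layout (`Observation`, `hours_since_admission`), the feature names, the "latest value wins" aggregation, and the toy values are all hypothetical illustrations, not the study's actual schema or preprocessing.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical record layout: one timestamped EHR observation.
@dataclass
class Observation:
    feature: str
    value: float
    hours_since_admission: float


def build_feature_set(
    observations: List[Observation], cutoff_hours: Optional[float] = None
) -> Dict[str, float]:
    """Keep the most recent value of each feature observed by cutoff_hours.

    cutoff_hours=48 gives the early (first 48 hours) feature set;
    cutoff_hours=None uses every observation from the full encounter.
    """
    features: Dict[str, float] = {}
    seen_at: Dict[str, float] = {}
    for obs in observations:
        # Skip anything not yet recorded at the prediction time point.
        if cutoff_hours is not None and obs.hours_since_admission > cutoff_hours:
            continue
        # Assumed aggregation: the latest observation of a feature wins.
        if obs.feature not in seen_at or obs.hours_since_admission >= seen_at[obs.feature]:
            features[obs.feature] = obs.value
            seen_at[obs.feature] = obs.hours_since_admission
    return features


# Toy encounter (illustrative values, not study data).
encounter = [
    Observation("wbc", 11.2, 6.0),
    Observation("weight_change_365d", -4.5, 2.0),
    Observation("wbc", 8.4, 96.0),
    # Length of stay is only known at discharge.
    Observation("length_of_stay_days", 6.0, 144.0),
]

early_features = build_feature_set(encounter, cutoff_hours=48)
full_features = build_feature_set(encounter)
print(early_features)  # {'wbc': 11.2, 'weight_change_365d': -4.5}
print(full_features)   # {'wbc': 8.4, 'weight_change_365d': -4.5, 'length_of_stay_days': 6.0}
```

Under these assumptions, encounter-level variables such as length of stay simply cannot enter the 48-hour model, which is why the early model must rely on other signals.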
Results Using all features, the LGB (Light Gradient Boosting Machine) model achieved slightly higher but comparable performance (AUROC: 0.711; APS: 0.225) relative to the Epic model (AUROC: 0.697; APS: 0.221). Using only features available in the first 48 hours, the RF (Random Forest) model achieved a higher AUROC than the Epic model (0.684 vs. 0.676) but a lower AUPRC (0.18) and APS (0.184). In terms of the characteristics of flagged patients, both the full-feature (LGB) and 48-hour (RF) models were more sensitive, flagging more patients than the Epic models. Both sets of models flagged patients with similar distributions of race and sex; however, our LGB and RF models were more inclusive, flagging more patients in younger age groups. The Epic models were more sensitive in identifying patients from ZIP codes with lower average income. Our 48-hour models were driven by novel features at several levels: patient (weight change over 365 days, depression symptoms, laboratory values, cancer type), provider (winter discharge, hospital admission type), and community (ZIP code income, marital status of partner).
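The reported metrics and the notion of "flagging" can be made concrete with a minimal pure-Python sketch. The functions follow the standard definitions (AUROC as the probability that a positive is ranked above a negative; APS as the step-wise sum of (R_n - R_{n-1}) * P_n, the definition used by scikit-learn's `average_precision_score`). The labels, scores, and threshold are invented toy values, not study data, and `flag_summary` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC: probability that a randomly chosen readmitted patient is
    scored higher than a randomly chosen non-readmitted one (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """APS = sum_n (R_n - R_{n-1}) * P_n over distinct descending thresholds."""
    total_pos = sum(y_true)
    ap, prev_recall = 0.0, 0.0
    for t in sorted(set(scores), reverse=True):
        flagged = [y for y, s in zip(y_true, scores) if s >= t]
        tp = sum(flagged)
        precision = tp / len(flagged)
        recall = tp / total_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def flag_summary(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[int, float]:
    """Hypothetical helper: patients flagged at a threshold and the sensitivity."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, y_true) if f and y == 1)
    return sum(flagged), tp / sum(y_true)


# Toy labels (1 = readmitted within 30 days) and model scores; illustrative only.
y = [1, 0, 1, 0, 0, 1, 0, 0]
s = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1]

print(round(auroc(y, s), 4))              # 0.7333
print(round(average_precision(y, s), 4))  # 0.7222
print(flag_summary(y, s, 0.25))           # (6, 1.0)
```

On this toy input, scikit-learn's `roc_auc_score` and `average_precision_score` should return the same values. Lowering the threshold flags more patients and raises sensitivity at the cost of precision, which is the trade-off behind comparing how many patients each model flags. The AUPRC is often computed instead by trapezoidal interpolation of the precision-recall curve, so it can differ slightly from the APS.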
Conclusion We demonstrated that we could develop and validate models with performance comparable to the existing Epic 30-day readmission models that also provide several actionable insights. These insights could inform service interventions, deployed by case management or discharge planning teams, that may decrease readmission rates over time.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Research reported in this publication was supported by the Louise Von Hess Medical Research Institute, the National Institute of Nursing Research of the National Institutes of Health under Award Number F31NR019919, and the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number UL1-TR001878. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This study was reviewed and approved by the Institutional Review Boards of the University of Pennsylvania (#834695) and Lancaster General Hospital (LGH) (#2019-51), respectively. The dataset was de-identified before analysis.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
* Shared co-senior authorship
** Shared co-first authorship
Data Availability
The dataset is not available.
10 Keywords
- ABBCI: Ann B. Barshinger Cancer Institute
- ACP: Advance Care Planning
- ADL: Activities of Daily Living
- ANN: Artificial Neural Network
- APS: Average Precision Score
- AUC: Area Under the Curve
- AUPRC: Area Under the Precision-Recall Curve
- AUROC: Area Under the Receiver Operating Characteristic Curve
- BMI: Body Mass Index
- CV: Cross-Validation
- C4QI: Comprehensive Cancer Center Consortium for Quality Improvement
- ED: Emergency Department
- EHR: Electronic Health Record
- ExSTraCS: Extended Supervised Tracking and Classifying System
- GI: Gastrointestinal
- ICD-9 & 10: International Classification of Diseases, 9th and 10th Revisions
- LGB: Light Gradient Boosting Machine
- LGH: Lancaster General Hospital
- LOINC: Logical Observation Identifiers Names and Codes
- LOS: Length of Stay
- ML: Machine Learning
- MICE: Multivariate Imputation by Chained Equations
- RF: Random Forest
- SVM: Support Vector Machine
- WBC: White Blood Cell Count
- XGB: XGBoost