Abstract
Background Sufficiently accurate predictions of hospital readmissions are necessary for the allocation of scarce clinical resources to reduce preventable readmissions. We describe the use of a data-driven approach that relies on machine learning algorithms to predict readmission at the time of discharge.
Methods We apply random forests to clinical and administrative electronic health record data available from a cohort of 103,688 patients discharged from the acute inpatient settings of the University of Pennsylvania Health System between June 25th, 2011 and June 30th, 2013. We predict both 30-day all-cause readmissions and 7-day unplanned readmissions using only predictors available by the time of discharge. Using oversampling and undersampling of the two outcome classes (readmission and no readmission), we incorporate into our models the asymmetric costs of a false negative relative to a false positive from the perspective of a hospital. We calculate variable importance scores for the included predictors. We derive and validate our approach using split-sample internal validation.
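The cost-sensitive resampling described in the Methods can be illustrated with a minimal sketch. It assumes one simple scheme, in which each readmitted case receives a sampling weight equal to the cost ratio when a bootstrap-style sample is drawn. The abstract does not specify the exact sampling scheme used, so this is an assumption. The function name `resample_for_cost_ratio`, the toy cohort, and the seed are hypothetical.

```python
import random
from collections import Counter


def resample_for_cost_ratio(labels, cost_ratio, seed=0):
    """Draw a same-size sample (with replacement) of row indices.

    Each positive (label 1, readmitted) row gets `cost_ratio` times the
    sampling weight of a negative row, so positives are oversampled and
    negatives are undersampled relative to the original cohort.
    This weighting scheme is an illustrative assumption.
    """
    rng = random.Random(seed)
    weights = [cost_ratio if y == 1 else 1.0 for y in labels]
    return rng.choices(range(len(labels)), weights=weights, k=len(labels))


# Toy cohort (not study data): 100 readmitted, 900 not readmitted.
labels = [1] * 100 + [0] * 900
idx = resample_for_cost_ratio(labels, cost_ratio=5.0)
counts = Counter(labels[i] for i in idx)
positive_share = counts[1] / len(idx)
print(f"positive share after resampling: {positive_share:.2f}")
```

With a 5:1 weight, the expected positive share in this toy cohort rises from 0.10 to 500 / (500 + 900), or about 0.36. A tree grown on such a sample is pushed toward catching more readmissions at the price of more false positives.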
Results We developed a machine learning-based model using random forests with a 5:1 relative cost ratio for 30-day all-cause readmissions that achieves a sensitivity of 65% and specificity of 71% on validation data, as well as a random forests model with a 20:1 cost ratio for 7-day unplanned readmissions that achieves a sensitivity of 62% and specificity of 66% on validation data. Prior health system utilization, clinical discharging service, and vital sign information were most predictive of readmissions.
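For reference, sensitivity and specificity of the kind reported above follow directly from the validation confusion table. The sketch below uses made-up labels, not study data, and `sensitivity_specificity` is a hypothetical helper.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = readmitted)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of true readmissions that were flagged
    specificity = tn / (tn + fp)  # share of non-readmissions left unflagged
    return sensitivity, specificity


# Made-up validation labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```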
Conclusion By modeling the complex relationships between many predictor variables and readmission data for a large health system, we demonstrate successful predictive models that can be used upon discharge to flag patients at high risk of readmission.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Dr. Umscheid's contribution to this project was supported in part by the National Center for Research Resources, Grant UL1RR024134, which is now at the National Center for Advancing Translational Sciences, Grant UL1TR000003. The content of this paper is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. No other external funds supported this study.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study received expedited approval and a Health Insurance Portability and Accountability Act (HIPAA) waiver from the University of Pennsylvania Institutional Review Board.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
Aggregate data are presented in the manuscript, but our IRB approval does not currently authorize sharing of study data with those outside of the research team.
Abbreviations
- CMS: Centers for Medicare and Medicaid Services
- EHR: Electronic health record
- UPHS: The University of Pennsylvania Health System
- RF: Random forest
- OOB: Out-of-bag