Abstract
Background Radiation and medical oncologists evaluate patients’ risk of imminent mortality with scales such as Karnofsky Performance Status (KPS) and predicate treatment decisions on these evaluations. We hypothesized that statistical models derived from structured electronic health record (EHR) data could predict patient deaths within 30 days of radiotherapy consultation better than models developed only with patient age and physician-reported KPS.
Methods Clinical data from patients seen in consultation in a radiotherapy department from June 2018 to February 2024 were abstracted from EHR databases, including patient demographics, laboratory results, medications, comorbidities, KPS, cancer stages, oncologic treatment histories, oncologist notes, radiologist reports, and pathologist narratives. A subset of structured features known or believed to be associated with mortality was curated and used to train and test logistic regression, random forest, and gradient-boosted decision classifiers.
Results Of 38,262 patients, 951 (2.5%) died within 30 days of radiotherapy consultation. From 34.5 gigabytes of tabular data, 2,977 clinical features were chosen or derived by a radiation oncologist, then reduced to 1,000 features using ANOVA F values. Using an event probability classification threshold of 0.2, optimized logistic regression, random forest, and gradient-boosted decision classifiers achieved high test accuracy (0.97, 0.98, and 0.98, respectively) with F1 scores of 0.50, 0.54, and 0.52. The areas under the receiver operating characteristic and precision-recall curves for the random forest model were 0.94 and 0.55, respectively, outperforming a model trained only with patient age and KPS (0.61 and 0.06). Models gave prominent weight to features with clinically plausible associations with mortality.
Conclusion Statistical models developed from a physician-curated feature space of structured EHR data predicted patient deaths within 30 days of radiotherapy consultation better than a model developed only with a patient’s age and physician-assessed KPS. With clinically explicable feature weights, these models could influence treatment decisions such as the length of palliative radiotherapy courses.
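To put the reported accuracy and F1 scores in context, the following is a minimal sketch of how metrics at a fixed probability threshold can be computed, and of why accuracy alone says little when only 2.5% of patients die within 30 days. It is not the authors' code. The function name `metrics_at_threshold` and the toy labels and probabilities are hypothetical, and the cohort counts are for the whole cohort reported above, not the held-out test set.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def metrics_at_threshold(y_true, y_prob, threshold=0.2):
    """Call a patient positive when the predicted probability >= threshold.

    Hypothetical helper for illustration; 0.2 mirrors the threshold
    quoted in the abstract.
    """
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = prob >= threshold
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return Metrics((tp + tn) / total, precision, recall, f1)


# Whole-cohort counts from the abstract (not the test split).
N_PATIENTS = 38262
N_DEATHS = 951
prevalence = N_DEATHS / N_PATIENTS

# A trivial rule that never predicts death is correct for every survivor,
# so its accuracy is 1 - prevalence while its recall and F1 are zero.
always_negative_accuracy = 1 - prevalence

# Toy data, invented purely to exercise the function.
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_probs = [0.60, 0.10, 0.30, 0.05, 0.02, 0.01, 0.15, 0.08, 0.03, 0.04]
toy = metrics_at_threshold(toy_labels, toy_probs, threshold=0.2)

print(f"prevalence: {prevalence:.4f}")
print(f"always-negative accuracy: {always_negative_accuracy:.4f}")
print(toy)
```

Under these assumptions, a rule that never predicts death would already score an accuracy of about 0.975 on the full cohort, which is why F1 and the area under the precision-recall curve are the more informative figures here. A no-skill classifier has an expected precision equal to the event prevalence (about 0.025), which gives a rough reference point for the reported precision-recall areas of 0.55 and 0.06.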
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Support for this work was provided in part by a National Institutes of Health grant (NIH Research Portfolio Online Reporting Tools project number 75N95021D00028-0-759502200002-1).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Institutional Review Board of Washington University School of Medicine in St. Louis gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Data Sharing Statement This project involved secondary analysis of electronic health records. No data were collected as primary research data. To maintain patient record confidentiality, no data will be made publicly available.