Abstract
Electronic medical records (EMR) represent a rich informatics resource that remains largely unexploited for improving healthcare outcomes. Here we report a systematic text mining analysis of EMR correspondence for 4791 cancer patients treated between 2001 and 2017. Meaningful groups of text descriptors correlating with poor survival outcomes were systematically identified, and machine learning applied to clinical text accurately predicted cancer patient survival at selected timepoints up to 12 months. In a validation cohort of 726 patients, adding EMR descriptors to machine learning models improved predictive performance over conventional clinical symptom scores by 4.9% (p = 0.001). These results demonstrate that labour-intensive EMR data collection can be repurposed to add clinical value. Extending this approach to a broader spectrum of digital health data should transform the real-time utility of such latent informatics resources, enabling healthcare systems to be more adaptive and responsive to patient circumstances.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
FPL was supported by the Shine Translational Fellowship 2016, Garvan Institute of Medical Research. This project was partly supported by a research project grant from the Waikato Research Foundation (2018). FPL and RE acknowledge support from the Wolf family.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This study was approved by the Northern Health & Disability Ethics Committee, New Zealand (#16/STH/251).
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The original data set is not available, with the exception of summarised and non-reidentifiable datasets supplied as supplementary text. Computer code associated with this manuscript is released as open-source software and is freely available.