Abstract
Background Data on lines of therapy (LOTs) for cancer treatment are important for clinical oncology research, but LOTs are not explicitly recorded in electronic health records (EHRs). We present an efficient approach for clinical data abstraction and a flexible algorithm to derive LOTs from EHR-based medication data on patients with glioblastoma (GBM).
Methods Non-clinicians were trained to abstract the diagnosis of GBM from EHRs, and their accuracy was compared to abstraction performed by clinicians. The resulting data were used to build a cohort of patients with a confirmed GBM diagnosis. An algorithm was developed to derive LOTs from structured medication data, accounting for drug class and for the addition and discontinuation of therapies. Descriptive statistics were calculated, and time-to-next-treatment analysis was performed using the Kaplan-Meier method.
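As an illustration of the kind of rule-based LOT derivation described in Methods, the following is a minimal Python sketch, not the authors' algorithm. The record fields, the 28-day combination window, the 90-day discontinuation gap, and the rule that a same-class substitution stays within the current line are all assumptions introduced for this example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class MedRecord:
    """One structured medication exposure (hypothetical schema)."""
    drug: str
    drug_class: str
    start: date
    end: date


@dataclass
class Line:
    """One derived line of therapy."""
    number: int
    start: date
    end: date
    drugs: set = field(default_factory=set)
    classes: set = field(default_factory=set)


def derive_lots(records, combo_window_days=28, gap_days=90):
    """Group medication records into lines of therapy.

    Assumed rules (illustrative only):
    - a record continues the current line if it is a drug already in the
      line, starts within `combo_window_days` of the line start (addition
      to the regimen), or belongs to a drug class already in the line;
    - a treatment-free gap longer than `gap_days` (discontinuation) or a
      new drug of a new class outside the window starts a new line.
    """
    lines = []
    for rec in sorted(records, key=lambda r: r.start):
        cur = lines[-1] if lines else None
        if cur is not None:
            long_gap = rec.start - cur.end > timedelta(days=gap_days)
            in_window = rec.start - cur.start <= timedelta(days=combo_window_days)
            continues = (
                rec.drug in cur.drugs
                or in_window
                or rec.drug_class in cur.classes
            )
            if continues and not long_gap:
                cur.drugs.add(rec.drug)
                cur.classes.add(rec.drug_class)
                cur.end = max(cur.end, rec.end)
                continue
        lines.append(
            Line(len(lines) + 1, rec.start, rec.end, {rec.drug}, {rec.drug_class})
        )
    return lines


# Synthetic example patient (not study data).
records = [
    MedRecord("temozolomide", "alkylating agent", date(2020, 1, 1), date(2020, 3, 1)),
    MedRecord("temozolomide", "alkylating agent", date(2020, 3, 10), date(2020, 6, 1)),
    MedRecord("bevacizumab", "anti-VEGF antibody", date(2020, 7, 1), date(2020, 9, 1)),
]
lots = derive_lots(records)

# Time to next treatment: start of the first line to start of the second line.
ttnt_days = (lots[1].start - lots[0].start).days
```

Keeping the window and gap thresholds as parameters makes it straightforward to adapt such a rule set to other cancers or regimens; the values used here are placeholders rather than clinically validated choices.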
Results Treating clinicians as the gold standard, non-clinicians abstracted GBM diagnosis with sensitivity 0.98, specificity 1.00, PPV 1.00, and NPV 0.90, suggesting that non-clinician abstraction of GBM diagnosis was comparable to clinician abstraction. Of 693 patients with a confirmed diagnosis of GBM, 246 had structured information about the types of medications received. Of those, 165 (67.1%) received a first-line therapy (1L) of temozolomide, and the median time-to-next-treatment from the start of 1L was 179 days.
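For readers less familiar with the agreement metrics reported in Results, the sketch below shows how sensitivity, specificity, PPV, and NPV follow from a 2x2 table in which clinician abstraction is treated as the gold standard. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
def abstraction_metrics(tp, fp, fn, tn):
    """Agreement of non-clinician labels with the clinician gold standard.

    tp: both say GBM; fp: non-clinician says GBM, clinician does not;
    fn: clinician says GBM, non-clinician does not; tn: both say not GBM.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true GBM cases found
        "specificity": tn / (tn + fp),  # share of non-GBM cases ruled out
        "ppv": tp / (tp + fp),          # share of positive calls that are correct
        "npv": tn / (tn + fn),          # share of negative calls that are correct
    }


# Hypothetical counts for illustration only (not from the study).
metrics = abstraction_metrics(tp=45, fp=2, fn=5, tn=48)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```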
Conclusions We developed a flexible, interpretable, and easy-to-implement algorithm to derive LOTs from EHR data on medication orders and administrations; the derived LOTs can be used to create high-quality datasets for outcomes research. We also showed that the cost of chart abstraction can be reduced by training non-clinicians to perform it instead of clinicians.
Importance of the study This study proposes an efficient and accurate method to extract unstructured data from electronic health records (EHRs) for cancer outcomes research. The study addresses the limitations of manual abstraction of unstructured clinical data and presents a reproducible, low-cost workflow for clinical data abstraction and a flexible algorithm to derive lines of therapy (LOTs) from EHR-based structured medication data. The LOT data were used to conduct a descriptive treatment pattern analysis and a time-to-next-treatment analysis to demonstrate how EHR-derived unstructured data can be transformed to answer diverse clinical research questions. The study also investigates the feasibility of training non-clinicians to abstract GBM data, demonstrating that detailed explanations of clinical documentation, best practices for chart review, and quantitative evaluation of abstraction performance can yield data quality similar to that of abstraction performed by clinicians. The findings of this study have important implications for improving cancer outcomes research and facilitating the analysis of EHR-derived treatment data.
Competing Interest Statement
AS reports stock ownership in Roche (RHHVF).
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Ethics approval was granted through Stanford University IRB (#50031).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Email: alexren{at}stanford.edu
Email: jntwu{at}stanford.edu
Email: aarohi{at}stanford.edu
Email: ivlopez{at}stanford.edu
Email: ujwal{at}stanford.edu
Email: valex{at}stanford.edu
Email: rjpizzit{at}stanford.edu
Email: bbui{at}stanford.edu
Email: lalkhani{at}stanford.edu
Email: leesusan{at}stanford.edu
Email: mohitn{at}stanford.edu
Email: noelseo{at}stanford.edu
Email: nmacedo{at}stanford.edu
Email: winsonc{at}stanford.edu
Email: wwang28{at}stanford.edu
Email: edwardt3{at}stanford.edu
Email: reenat{at}stanford.edu
Email: ogevaert{at}stanford.edu
Data Availability
All data needed to evaluate the conclusions are present in the paper and in the Supplementary Materials. The datasets generated and analyzed during the current study are not publicly available due to patient privacy but are available from the corresponding author (AS) on reasonable request.