Abstract
Background Real-world evidence derived from the electronic medical record (EMR) is increasingly prevalent. How best to ascertain cardiovascular outcomes from EMRs is unknown. We sought to validate commercially available natural language processing (NLP) software for extracting bleeding events from clinical notes.
Methods We included patients with atrial fibrillation and cancer seen at our cancer center from January 1, 2016, to December 31, 2019. We created a query set based on SNOMED CT expressions to represent bleeding in 11 organ systems. We ran the query against the clinical notes and randomly selected a sample of notes for physician validation. The primary outcome was the positive predictive value (PPV) of the software for identifying bleeding events, stratified by organ system.
Results We included 1370 patients (mean age 72 years [SD 1.5]; 35% female). We processed 66,130 notes; the NLP software flagged 6522 notes from 654 unique patients as containing possible bleeding events. Among 1269 randomly selected notes, the PPV of the software ranged from 0.571 for OB/GYN bleeds to 0.921 for neurologic bleeds. False-positive bleeding events flagged by the software followed four patterns: historic bleeds, hypothetical bleeds, missed negatives, and word errors.
Conclusions NLP may provide an alternative for population-level screening for bleeding outcomes in cardiovascular studies. Human validation is still needed, but an NLP-driven screening approach may improve efficiency.
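As an illustration of the primary outcome, PPV per organ system is the number of NLP-flagged notes that physician review confirmed as bleeding events divided by the number of flagged notes reviewed for that system. The sketch below is a minimal example under stated assumptions, not the study's analysis code: the function name `ppv_by_organ_system`, the input layout, and the sample labels are hypothetical and the counts are made up.

```python
from collections import defaultdict


def ppv_by_organ_system(reviewed_notes):
    """Return PPV per organ system.

    reviewed_notes: iterable of (organ_system, physician_confirmed) pairs,
    one pair per NLP-flagged note that a physician reviewed.
    (Hypothetical input layout, assumed for illustration.)
    """
    # organ_system -> [confirmed (true positive) count, total reviewed count]
    counts = defaultdict(lambda: [0, 0])
    for organ_system, confirmed in reviewed_notes:
        counts[organ_system][1] += 1
        if confirmed:
            counts[organ_system][0] += 1
    # PPV = true positives / (true positives + false positives)
    return {system: tp / total for system, (tp, total) in counts.items()}


# Made-up review labels, NOT data from the study.
sample = [
    ("neurologic", True),
    ("neurologic", True),
    ("neurologic", False),
    ("gastrointestinal", True),
    ("gastrointestinal", False),
]
ppv = ppv_by_organ_system(sample)
for system, value in sorted(ppv.items()):
    print(f"{system}: PPV = {value:.3f}")
```

Because every reviewed note was flagged by the software, this calculation yields PPV only; estimating sensitivity would also require reviewing notes that the software did not flag.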
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported by a grant from the National Heart Lung and Blood Institute, K08HL136850 (to Dr. Shah). Research reported in this publication utilized the Research Informatics Shared Resource at Huntsman Cancer Institute at the University of Utah and was supported by the National Cancer Institute of the National Institutes of Health under Award Number P30CA042014. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The University of Utah Institutional Review Board exempted this retrospective study with a waiver of informed consent.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
Disclosures/Conflicts of Interest: None reported by the authors
Data Availability
Protected patient data are not available for sharing. The code to generate the SNOMED CT expressions for the NLP software is publicly available on GitHub, as referenced in the manuscript.