ABSTRACT
Background Unstructured text in electronic health records (EHRs), combined with big data tools, offers an opportunity for surveillance of adverse events (AEs; patient harm associated with medical care). Writers may explicitly state an apparent association between treatment and adverse outcome (“attributed”) or simply state the treatment and outcome without an association (“unattributed”). We chose to study EHRs from 2006-2008 because of known heparin contamination during this timeframe. We hypothesized that adulterated heparin may have been widespread enough to manifest in EHRs through symptoms related to heparin adverse events, independent of clinicians’ documentation of attributed AEs.
Objective To use the Shakespeare Method, a new set of unsupervised tools, to identify attributed and unattributed potential AEs in the unstructured text of EHRs.
Methods We studied 21,287 adult critical care admissions divided into three time periods. We compared period 3 (7/2007 to 6/2008) with period 2 (7/2006 to 6/2007) by generating Latent Dirichlet Allocation (LDA) topics among words in period 3 that were distinct from period 2, and used those topics to find admission notes to review for new or increased clinical events. These results were further explored with frequency analyses of periods 1 (7/2001 to 6/2006) through 3.
Results Topics represented unattributed heparin AEs, other medical AEs, rare medical diagnoses, and other clinical events; all were verified by review of EHRs notes and by frequency analysis. The heparin AEs were not attributed in the notes, diagnosis codes, or procedure codes. In partial contrast to our hypothesis, heparin AEs increased in prevalence from 2001 through 2007 and decreased starting in 2008 (when reports of heparin AEs were being published).
Conclusions The Shakespeare Method could be a useful supplement to AE reporting and surveillance of structured EHRs data. Future improvements should include automation of the manual review process.
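The period comparison described in Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the rule used here for "distinct" words (present in at least `min_notes` later-period notes and absent from the earlier period), the function names, and the toy notes built from placeholder tokens are all assumptions made for illustration. The LDA step itself is omitted; in practice the distinct vocabulary would be passed to a topic-modeling library.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z][a-z0-9]+")


def tokenize(note):
    """Lowercase a note and split it into simple alphanumeric tokens."""
    return TOKEN.findall(note.lower())


def document_frequency(notes):
    """Count, for each word, how many notes contain it at least once."""
    counts = Counter()
    for note in notes:
        counts.update(set(tokenize(note)))
    return counts


def distinct_words(later_notes, earlier_notes, min_notes=2):
    """Assumed rule: words found in at least `min_notes` later notes
    and in none of the earlier notes."""
    later = document_frequency(later_notes)
    earlier = document_frequency(earlier_notes)
    return sorted(w for w, n in later.items()
                  if n >= min_notes and w not in earlier)


def prevalence(notes, word):
    """Fraction of notes that mention `word` at least once."""
    if not notes:
        return 0.0
    return sum(word in set(tokenize(n)) for n in notes) / len(notes)


# Toy notes made of placeholder tokens (not real clinical text).
period2 = ["word1 word2", "word2 word3"]
period3 = ["word1 word4", "word4 word5", "word2 word4"]

candidates = distinct_words(period3, period2)
for word in candidates:
    print(word, prevalence(period2, word), prevalence(period3, word))
```

Words flagged this way would only be candidates: as the abstract notes, every topic was verified by note review and by frequency analysis across all three periods.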
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
The only funding came from the US Food and Drug Administration (FDA), in two forms. FDA supplied the salaries and computing resources for Drs. Bright, Bright-Ponte, and Palmer. FDA also paid for contracts with Booz Allen Hamilton that supported salaries and research computing resources for Ms. Dowdy and Drs. Rankin and Blok.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Our use of the data was approved by the governing board for the data administrators, the Massachusetts Institute of Technology IRB. Our study was deemed not to be human subjects research by the Food and Drug Administration IRB.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
We made minor corrections to Figure 1 to use generic terms for words (Word1, Word2, etc.).
Data Availability
The study is not a clinical trial. The data are available from the Massachusetts Institute of Technology as the MIMIC-III Critical Care Database: https://mimic.physionet.org/about/mimic/
ABBREVIATIONS USED MORE THAN ONCE
- AE: Adverse events
- AF: Atrial fibrillation
- BIDMC: Beth Israel Deaconess Medical Center
- CABG: Coronary artery bypass graft
- CCU: Critical (or Intensive) Care Unit
- CPR: Cardiopulmonary resuscitation
- DMII: Diabetes mellitus, type 2
- DVT: Deep vein thrombosis
- EHRs: Electronic healthcare records
- FDA: Food and Drug Administration
- HD: Hospital day
- HIT: Heparin induced thrombocytopenia
- IABP: Intra-aortic balloon pump
- IPH: Intraparenchymal hemorrhage
- IV: Intravenous
- LDA: Latent Dirichlet Allocation algorithm for topic modeling
- LR: Logistic regression supervised learning algorithm
- MCA: Middle cerebral artery
- MIMIC-III: Medical Information Mart for Intensive Care III
- MRI: Magnetic resonance image
- MVA: Motor vehicle accident
- MVC: Motor vehicle collision
- NB: Naïve Bayes supervised learning algorithm
- NLP: Natural language processing
- O2: Oxygen
- OR: Operating room
- PAE: Potential adverse event
- PICC: Peripherally inserted central catheter
- POD: Post-operative day
- tPA: Tissue plasminogen activator
- UTI: Urinary tract infection