Abstract
Importance Individuals whose chronic pain is managed with opioids are at high risk of developing an opioid use disorder. Large data sets, such as electronic health records, are required for conducting studies that assist with identification and management of problematic opioid use.
Objective To determine whether regular expressions, a highly interpretable natural language processing technique, could automate a validated clinical tool (Addiction Behaviors Checklist1) to expedite the identification of problematic opioid use in the electronic health record.
Design This cross-sectional study reports on a retrospective cohort with data analyzed from 2021 through 2023. The approach was evaluated against a blinded, manually reviewed holdout test set of 100 patients.
Setting The study used data from Vanderbilt University Medical Center’s Synthetic Derivative, a de-identified version of the electronic health record for research purposes.
Participants This cohort comprised 8,063 individuals with chronic pain. Chronic pain was defined by International Classification of Disease codes occurring on at least two different days.18 We collected demographic data, billing codes, and free-text notes from patients’ electronic health records.
Main Outcomes and Measures The primary outcome was the performance of the automated method in identifying patients demonstrating problematic opioid use, compared with that of opioid use disorder diagnostic codes. We evaluated both methods with F1 scores and areas under the curve, which summarize sensitivity, specificity, and positive and negative predictive value.
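For readers less familiar with these measures, the following is a minimal sketch of how sensitivity, specificity, positive and negative predictive value, and the F1 score can be derived from binary labels. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the study; the sketch does not reproduce the authors' evaluation code.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    f1: float


def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix metrics for 0/1 reference and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    return BinaryMetrics(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        ppv=ratio(tp, tp + fp),
        npv=ratio(tn, tn + fn),
        f1=ratio(2 * tp, 2 * tp + fp + fn),
    )


# Hypothetical toy data: 1 = problematic opioid use per manual review.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because F1 depends only on true positives, false positives, and false negatives, a method that rarely flags true cases (low sensitivity) receives a low F1 score even when its specificity is high.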
Results The cohort comprised 8,063 individuals with chronic pain (mean [SD] age at earliest chronic pain diagnosis, 56.2 [16.3] years; 5081 [63.0%] female and 2982 [37.0%] male patients; 76 [1.0%] Asian, 1336 [16.6%] Black, 56 [1.0%] other, 30 [0.4%] unknown race, and 6499 [80.6%] White patients; 135 [1.7%] Hispanic/Latino, 7898 [98.0%] Non-Hispanic/Latino, and 30 [0.4%] unknown ethnicity patients). The automated approach identified individuals with problematic opioid use who were missed by diagnostic codes and outperformed diagnostic codes in F1 scores (0.74 vs 0.08) and areas under the curve (0.82 vs 0.52).
Conclusions and Relevance This automated data extraction technique can facilitate earlier identification of people at risk for, and suffering from, problematic opioid use, and create new opportunities for studying long-term sequelae of opioid pain management.
Question Can an interpretable natural language processing method automate a valid, reliable clinical tool in order to expedite the identification of problematic opioid use in the electronic health record?
Findings In this cross-sectional study of patients with chronic pain, an automated natural language processing approach identified individuals with problematic opioid use who were missed by diagnostic codes.
Meaning Regular expressions can be used to automatically identify problematic opioid use in an interpretable and generalizable manner.
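To illustrate the general idea of regular-expression screening of clinical notes, here is a minimal sketch. The patterns, the item names, the `flag_patient` helper, and the threshold of two matched items are all hypothetical assumptions chosen for illustration; they are not the study's expressions, the actual Addiction Behaviors Checklist items, or its scoring rule.

```python
import re

# Hypothetical patterns, invented for illustration only.
HYPOTHETICAL_PATTERNS = {
    "early_refill": re.compile(
        r"\bearly refills?\b"
        r"|\bran out of (?:\w+ )?(?:medications?|meds|pills) early\b",
        re.IGNORECASE,
    ),
    "lost_medication": re.compile(
        r"\b(?:lost|stolen)\s+(?:\w+\s+){0,2}(?:prescriptions?|medications?|meds|pills)\b",
        re.IGNORECASE,
    ),
    "multiple_prescribers": re.compile(
        r"\b(?:multiple|several)\s+(?:prescribers|providers|pharmacies)\b",
        re.IGNORECASE,
    ),
}


def score_note(text):
    """Return the sorted names of hypothetical items matched in one note."""
    return sorted(
        name for name, pattern in HYPOTHETICAL_PATTERNS.items() if pattern.search(text)
    )


def flag_patient(notes, threshold=2):
    """Flag a patient when at least `threshold` distinct items match across notes."""
    matched = set()
    for note in notes:
        matched.update(score_note(note))
    return len(matched) >= threshold, sorted(matched)


# Hypothetical, synthetic note text.
notes = [
    "Patient requests early refill of oxycodone.",
    "States he lost his prescription last week.",
    "Pain well controlled, no concerns.",
]
flagged, items = flag_patient(notes)
print(flagged, items)
```

Each flag can be traced back to the specific expression and note text that triggered it, which is what makes this kind of approach interpretable. The sketch does not handle negation (for example, "denies early refills"), which a production rule set would need to address.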
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Drs. Jeffery, Samuels, Sanchez-Roige, and Schirle received support from the National Institute on Drug Abuse (NIDA) under Award Number DP1DA056667. Dr. Jeffery received support for this work from the Agency for Healthcare Research and Quality (AHRQ) and the Patient-Centered Outcomes Research Institute (PCORI) under Award Number K12 HS026395. Dr. Sanchez-Roige was supported by funds from the California Tobacco-Related Disease Research Program (TRDRP; Grant Numbers T29KT0526 and T32IR5226) and by NIDA under Award Number DP1DA054394. Dr. Schirle received support for this work from the National Institute of Nursing Research (NINR) under Award Number K2313242701. Mr. Chatham received support for this work from the Vanderbilt University Merit Scholarship Stipend Program.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The IRB of Vanderbilt University gave ethical approval for this work (Institutional Review Board studies #181443 and #201918).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
1 Co-First Authors
Data Availability
Although the data are probabilistically de-identified, our data use agreement prohibits sharing raw text data with external entities. Those who would like to review our aggregated data should contact the corresponding author to request a copy.