Abstract
Background Acute and chronic low back pain (LBP) are different conditions with different treatments. However, they are coded in electronic health records with the same ICD-10 code (M54.5) and can be differentiated only by retrospective chart reviews. This prevents efficient definition of data-driven guidelines for billing and therapy recommendations, such as return-to-work options.
Objective To address this issue, we evaluate the feasibility of automatically distinguishing acute LBP episodes by analyzing free-text clinical notes.
Methods We used a dataset of 17,409 clinical notes from different primary care practices; of these, 891 documents were manually annotated as “acute LBP” and 2,973 were generally associated with LBP via the recorded ICD-10 code. We compared different supervised and unsupervised strategies for automated identification: keyword search; topic modeling; logistic regression with bag-of-n-grams and manual features; and deep learning (ConvNet). We trained the supervised models using either manual annotations or ICD-10 codes as positive labels.
Results ConvNet trained using manual annotations obtained the best results with an AUC-ROC of 0.97 and F-score of 0.69. ConvNet’s results were also robust to reduction of the number of manually annotated documents. In the absence of manual annotations, topic models performed better than methods trained using ICD-10 codes, which were unsatisfactory for identifying LBP acuity.
Conclusions This study uses clinical notes to delineate a potential path toward systematic learning of therapeutic strategies, billing guidelines, and management options for acute LBP at the point of care.
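As a rough illustration of the simplest strategy named in the Methods, keyword search, the following minimal sketch flags a note as "acute LBP" when the word "acute" appears shortly before an LBP term, and then scores the predictions with precision, recall, and F-score. The regular expression, the function names, and the five toy notes are all hypothetical and invented for this example; they are not the study's keyword list or data.

```python
import re

# Hypothetical pattern: "acute" followed within 40 characters by an LBP term.
# The study's actual keyword list is not given in this abstract.
ACUTE_LBP_PATTERN = re.compile(
    r"\bacute\b.{0,40}\b(?:low(?:er)? back pain|lbp|lumbago)\b",
    re.IGNORECASE,
)


def keyword_predict(note: str) -> bool:
    """Return True if the note matches the hypothetical acute-LBP pattern."""
    return ACUTE_LBP_PATTERN.search(note) is not None


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F-score for binary labels."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented toy notes with invented labels (True = acute LBP).
NOTES = [
    ("Pt presents with acute low back pain after lifting boxes.", True),
    ("Chronic LBP, stable on current regimen.", False),
    ("Acute onset lumbago x3 days, no radiation.", True),
    ("Sudden lower back pain since yesterday.", True),
    ("Follow-up for hypertension; no acute complaints.", False),
]

if __name__ == "__main__":
    labels = [label for _, label in NOTES]
    preds = [keyword_predict(text) for text, _ in NOTES]
    print(precision_recall_f1(labels, preds))
```

The fourth toy note describes a recent-onset episode without using the word "acute", so the pattern misses it. This is the kind of recall gap that may motivate the learned models compared in the study.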
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
I.N. and L.C. would like to thank the Pilot Projects Research Training Program of the NY and NJ Education and Research Center (ERC), National Institute for Occupational Safety and Health, for their funding (grant # T42 OH 008422). R.M. would like to acknowledge support from the Hasso Plattner Foundation and a courtesy GPU donation from NVIDIA.
Author Declarations
All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript.
Yes
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The data used in this study is not publicly available.
Abbreviations
- AUC-PRC: Area Under the Precision-Recall Curve
- AUC-ROC: Area Under the Receiver Operating Characteristic Curve
- BoN: Bag of N-grams
- CNN: Convolutional Neural Network
- EHR: Electronic Health Record
- HIPAA: Health Insurance Portability and Accountability Act
- ICD-CM: International Statistical Classification of Diseases, Clinical Modification
- IRB: Institutional Review Board
- LBP: Low Back Pain
- LR: Logistic Regression
- NLP: Natural Language Processing
- NY: New York
- PCP: Primary Care Provider
- RTW: Return To Work
- TF-IDF: Term Frequency-Inverse Document Frequency