Abstract
Drugs.com provides users’ textual reviews and numeric ratings of drugs. However, text reviews may not always be consistent with the numeric ratings. Overly positive or negative rating may be misleading. In this project, to classify user ratings of drugs with their textual reviews, we built classification models using traditional machine learning and deep learning approaches. Machine learning models including Random Forest and Naive Bayesian classifiers were built using TF-IDF features as input. Also, transformer-based neural network models including BERT, BioBERT, RoBERTa, XLNet, ELECTRA, and ALBERT were built using the raw text as input. Overall, BioBERT model outperformed the other models with an overall accuracy of 87%. We further identified UMLS concepts from the postings and analyzed their semantic types in the postings stratified by the classification result. This research demonstrated that transformer-based models can be used to classify drug reviews and identify reviews that are inconsistent with the ratings.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study was partially supported by the National Institute on Aging (NIA) of the National Institutes of Health (NIH) under Award Number R21AG061431; and in part by Florida State University-University of Florida Clinical and Translational Science Award funded by National Center for Advancing Translational Sciences under Award Number UL1TR001427. The first author would like to thank eHealth Lab at FSU and the Undergraduate Research Opportunity Program at Florida State University for the mentorship and guidance.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This project did not involve human subjects therefore IRB is not needed.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
We have added the detailed analysis of the UMLS concepts and semantic types in the posts.
Data Availability
The dataset can be downloaded from https://archive.ics.uci.edu/ml/datasets/Drug+Review+Dataset+%28Drugs.com%29.
https://archive.ics.uci.edu/ml/datasets/Drug+Review+Dataset+%28Drugs.com%29