Abstract
Adverse drug reactions (ADRs) lead to a high disease burden and substantial health expenditure. Beyond the traditional data sources used for pharmacovigilance, social media have emerged as an important supplemental source for monitoring ADRs reported by patients and consumers. Recently, there has been growing concern about the veracity of ADR data extracted from social media. Our objective is to categorize different levels of data veracity and to explore influential linguistic features and Twitter variables that may be used to screen ADRs for high data veracity. We annotated a corpus of ADRs with linguistic features validated by clinical experts and applied multinomial logistic regression to investigate the associations between these linguistic features and levels of data veracity. We found that using first-person pronouns, expressing negative sentiment, and mentioning the ADR and drug name in the same sentence were significantly associated with higher levels of data veracity (all p < 0.05); using medical terminology and mentioning fewer indications were associated with good data veracity (p < 0.05); and mentioning fewer drugs was marginally associated with good data veracity (p = 0.053). These findings suggest an opportunity to develop machine learning models that automatically screen ADRs from Twitter using the identified key linguistic features, Twitter variables, and association rules.
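To illustrate what a multinomial logistic regression over tweet-level linguistic features looks like, here is a minimal sketch. It assumes three hypothetical veracity levels (`low`, `medium`, `high`) and three hypothetical binary features loosely named after the findings above. The toy data are invented for illustration and are not the study's corpus. The model is fit by plain batch gradient descent on the mean cross-entropy loss. This swaps in a simple optimizer for whatever statistical package the authors used, and it computes no p-values, so it shows only the model form, not the significance testing reported in the abstract.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical feature names and veracity levels, for illustration only.
FEATURES = ["first_person_pronoun", "negative_sentiment", "adr_drug_same_sentence"]
LEVELS = ["low", "medium", "high"]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over per-class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, bias, x) -> List[float]:
    """Class probabilities for one feature vector x (one linear score per class)."""
    scores = [b + sum(w_j * x_j for w_j, x_j in zip(w, x)) for w, b in zip(weights, bias)]
    return softmax(scores)


def fit(X, y, n_classes: int, lr: float = 0.5, epochs: int = 2000) -> Tuple[list, list]:
    """Fit multinomial logistic regression by batch gradient descent on mean cross-entropy."""
    n, n_features = len(X), len(X[0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    bias = [0.0] * n_classes
    for _ in range(epochs):
        grad_w = [[0.0] * n_features for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, label in zip(X, y):
            p = predict_proba(weights, bias, x)
            for k in range(n_classes):
                # Gradient of cross-entropy w.r.t. class-k score: p_k - 1[label == k].
                err = p[k] - (1.0 if k == label else 0.0)
                grad_b[k] += err
                for j in range(n_features):
                    grad_w[k][j] += err * x[j]
        for k in range(n_classes):
            bias[k] -= lr * grad_b[k] / n
            for j in range(n_features):
                weights[k][j] -= lr * grad_w[k][j] / n
    return weights, bias


def mean_log_loss(weights, bias, X, y) -> float:
    """Average negative log-likelihood of the true labels."""
    return -sum(math.log(predict_proba(weights, bias, x)[label]) for x, label in zip(X, y)) / len(X)


if __name__ == "__main__":
    # Invented toy data: each row is one tweet's binary features; labels index LEVELS.
    X = [[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
         [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
    y = [0, 0, 1, 1, 1, 2, 2, 2, 2]
    w, b = fit(X, y, n_classes=len(LEVELS))
    for level, coefs in zip(LEVELS, w):
        print(level, dict(zip(FEATURES, (round(c, 2) for c in coefs))))
    print("mean log loss:", round(mean_log_loss(w, b, X, y), 4))
```

Each class gets its own coefficient vector, so a feature's coefficient can be compared across veracity levels. In practice a statistics package that also reports standard errors and p-values would be needed to reproduce associations like those described above.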
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
NA
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
No human subjects research. All relevant ethical guidelines have been followed.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
NA