Abstract
Background Within a few months, the COVID-19 pandemic spread to many countries and became a major challenge for health systems around the world. This unprecedented crisis led to a surge of online discussions about potential cures for the disease. Among them, vaccines have been at the heart of the debates and faced a lack of confidence in France even before they reached the market.
Objective This study aims to identify and investigate the opinion of French Twitter users on the announced vaccines against COVID-19 through sentiment analysis.
Methods This study was conducted in two phases. First, we filtered a collection of COVID-19-related tweets posted from February to August 2020, using a set of keywords associated with vaccine mistrust that we identified with word embeddings. Second, we performed sentiment analysis using deep learning to identify the characteristics of vaccine mistrust. The model was trained on a hand-labeled subset of 4,548 tweets.
Results A set of 69 relevant keywords was identified as the semantic concept of the word “vaccin” (vaccine in French); these keywords focus mainly on conspiracies, pharmaceutical companies, and alternative treatments. They made it possible to extract nearly 350k tweets in French. The sentiment analysis model achieved an accuracy of 0.75. The model then classified 16% of the tweets as positive, 41% as negative, and 43% as neutral. This allowed us to explore the semantic concepts of positive and negative tweets and to plot the trends of each sentiment. The main negative rhetoric identified from users’ tweets was that vaccines are perceived as having a political purpose, and that COVID-19 is a commercial argument for the pharmaceutical companies.
Conclusions Twitter might be a useful tool to investigate the arguments of vaccine mistrust, as it unveils a political criticism that contrasts with the usual concerns about adverse drug reactions. As the opposition rhetoric is more consistent and more widely spread than the positive rhetoric, we believe that this research provides effective tools to help health authorities better characterize the risk of vaccine mistrust.
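As an illustration of the first phase described in the Methods, the following is a minimal sketch of keyword expansion with word embeddings followed by keyword filtering. It is not the authors' implementation: the abstract does not name the embedding model or the similarity measure, so cosine similarity is assumed here, the vectors are toy values invented for the example, and the function names `nearest_keywords` and `filter_tweets` are hypothetical.

```python
import numpy as np

def nearest_keywords(seed, vocab, vectors, top_k=3):
    """Return the top_k vocabulary words closest to `seed` by cosine similarity."""
    # Normalise each row so that dot products equal cosine similarities.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[vocab.index(seed)]
    # Sort by decreasing similarity and drop the seed word itself.
    order = np.argsort(-sims)
    ranked = [(vocab[i], float(sims[i])) for i in order if vocab[i] != seed]
    return ranked[:top_k]

def filter_tweets(tweets, keywords):
    """Keep tweets containing at least one keyword (case-insensitive substring match)."""
    lowered = [k.lower() for k in keywords]
    return [t for t in tweets if any(k in t.lower() for k in lowered)]

# Toy 3-dimensional embeddings, invented for illustration only.
VOCAB = ["vaccin", "labo", "complot", "météo"]
VECTORS = np.array([
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.0],
    [0.8, 0.5, 0.3],
    [0.0, 0.1, 1.0],
])

if __name__ == "__main__":
    keywords = [word for word, _ in nearest_keywords("vaccin", VOCAB, VECTORS, top_k=2)]
    sample = ["Le vaccin est un complot", "Il fait beau", "Encore un LABO"]
    print(keywords)
    print(filter_tweets(sample, keywords))
```

In practice the vectors would come from an embedding model trained on the tweet corpus, and the candidate keywords would be reviewed by hand before being used as a filter.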
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
ADZ implemented the embeddings and the classifier, and performed the analysis. BA collected data from the Panacea Lab GitHub repository, extracted the tweets, administered the server, and installed the necessary libraries and tools. ADZ and CB drafted the manuscript. AGB provided support on designing the study and reviewing results. All co-authors revised the article critically for important intellectual content and provided final approval of the version to be submitted. This work was funded by the grant ANR-16-CE23-0011-01 from the ANR, the French Agence nationale de la Recherche, through the PEGASE (Pharmacovigilance enrichie par des Groupements Améliorant la detection des Signaux Emergents) project. Exaion, a new subsidiary of the EDF group and a cloud provider of blockchain and high-performance computing solutions, lent one of its servers with two graphics processing units free of charge for the duration of the study. We thank the representatives of Exaion who helped us: Fatih Balyeli, Laurent Bernou-Mazars, Nicolas Meaux, and Vivien Sayve. We acknowledge the work of the Panacea Lab in collecting tweets related to the COVID-19 pandemic. The views expressed in this article are those of the authors only.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Ethics Committee of the CHU de Saint-Etienne has given a favorable opinion on the conduct of this study, and referenced the project under the number IRBN1412020/CHUSTE. This opinion was motivated by the fact that the study is based only on data extracted from Twitter which are openly accessible to the public.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The original dataset providing the tweet IDs is available on the Panacea Lab website.
Abbreviations
- BERT: Bidirectional Encoder Representations from Transformers
- COVID-19: coronavirus disease
- NLP: Natural Language Processing
- Tf-idf: Term Frequency – Inverse Document Frequency
- WHO: World Health Organisation