Summary
Background ECDC performs epidemic intelligence activities to systematically collate information from a variety of sources, including Twitter, to rapidly detect public health events. The lack of a freely available, customisable and automated early warning tool using Twitter data prompted ECDC to develop epitweetr.
The specific objectives are to assess the performance of the geolocation and signal detection algorithms used by epitweetr and to assess the performance of epitweetr in comparison with the manual monitoring of Twitter for early detection of public health threats.
Methods Epitweetr collects, geolocates and aggregates tweets to generate signals and email alerts. First, we manually evaluated the geolocation of 1,200 tweets, assessing the accuracy of the algorithm in extracting the correct location and its performance in detecting tweets for which geolocation information was available. Second, we evaluated the signals generated by epitweetr between 19 October and 30 November 2020 and calculated the positive predictive value (PPV). Finally, we evaluated the sensitivity, specificity and timeliness of epitweetr in comparison with manual monitoring of Twitter.
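The evaluation measures named above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the counts and the use of a Wald (normal-approximation) 95% confidence interval are all assumptions, since the text does not state how the intervals were computed.

```python
import math


def proportion_with_ci(successes, total, z=1.96):
    """Return (estimate, lower, upper) for a proportion.

    Uses a Wald (normal-approximation) interval, which is an assumption;
    the study does not state which interval method was used.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def detection_metrics(tp, fp, fn, tn):
    """Compute sensitivity, PPV and specificity from a 2x2 table of counts."""
    return {
        "sensitivity": proportion_with_ci(tp, tp + fn),  # events detected / all events
        "ppv": proportion_with_ci(tp, tp + fp),          # true signals / all signals
        "specificity": proportion_with_ci(tn, tn + fp),  # true negatives / all non-events
    }


# Hypothetical counts for illustration only (not the study's data).
metrics = detection_metrics(tp=80, fp=20, fn=20, tn=880)
for name, (estimate, lower, upper) in metrics.items():
    print(f"{name}: {estimate:.1%} [{lower:.1%} - {upper:.1%}]")
```

The same helper applies to either detection method, so epitweetr and manual monitoring can be compared against one reference list of validated events.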
Findings The epitweetr geolocation algorithm had an accuracy of 30.1% and 25.9% at national and subnational levels, respectively. The general and specific PPVs of the signal detection algorithm were 3.0% and 74.6%, respectively. Epitweetr and/or manual monitoring detected 570 signals and 454 events. Epitweetr had a sensitivity of 78.6% [75.2% - 82.0%] and a PPV of 74.6% [70.5% - 78.6%]; manual monitoring had a sensitivity of 47.9% [43.8% - 52.0%] and a PPV of 97.9% [95.8% - 99.9%]. The median validation time difference between sixteen common events detected by epitweetr and manual monitoring was −48.6 hours [(−102.8) - (−23.7) hours].
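A timeliness comparison of this kind can be sketched as a median of paired time differences. The sketch below is illustrative only: the function name and timestamps are hypothetical, and it assumes the difference is taken as epitweetr time minus manual time, so that a negative value means epitweetr was earlier.

```python
from datetime import datetime
from statistics import median


def median_time_difference_hours(pairs):
    """Median of (epitweetr_time - manual_time) in hours over paired events.

    `pairs` is an iterable of (epitweetr_time, manual_time) datetimes for
    events detected by both methods. The sign convention is an assumption.
    """
    differences = [(e - m).total_seconds() / 3600 for e, m in pairs]
    if not differences:
        raise ValueError("no paired events")
    return median(differences)


# Hypothetical validation times for three common events (not the study's data).
pairs = [
    (datetime(2020, 11, 1, 8, 0), datetime(2020, 11, 2, 8, 0)),     # -24 h
    (datetime(2020, 11, 3, 12, 0), datetime(2020, 11, 5, 12, 0)),   # -48 h
    (datetime(2020, 11, 10, 9, 0), datetime(2020, 11, 10, 15, 0)),  # -6 h
]
median_diff = median_time_difference_hours(pairs)
print(f"median difference: {median_diff:.1f} hours")
```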
Interpretation Epitweetr has been shown to perform sufficiently well as an early warning tool for public health threats using Twitter data. Because epitweetr was developed as a free, open-source tool with several configurable settings and a strong automated component, it is expected to be both usable by and useful to public health experts.
Funding Not applicable
Evidence before this study Previous reviews have shown how social media, including Twitter, have been used for public health purposes. Most recent studies, in relation to the COVID-19 pandemic, have shown the added value of early warning tools based on Twitter and other social media platforms. They also noted the lack of an open-source tool for real-time monitoring and surveillance.
Added value of this study Epitweetr is a free, open-source and R-based early warning tool for automatic Twitter data monitoring that will support public health experts in rapidly detecting public health threats. The evaluation of epitweetr presented in this study shows the strengths of the tool: good performance, a high degree of automation, near-real-time operation and public availability with various customisable settings. It also identifies areas for improvement in future versions of epitweetr.
Implications of all the available evidence This tool can be further developed to include more automation and machine learning components, to increase usability and reduce the time users need to process information.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
No external funding was received.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Not applicable
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
+ laura.espinosa{at}ecdc.europa.eu
Data Availability
All code used by epitweetr is available as an R package from CRAN. Source maintenance and interaction occur through the GitHub repository. The historical Twitter data used in the present analysis cannot be shared. However, a dataset with the anonymised signals and events detected by epitweetr and the manual method for these data is publicly available.