Abstract
Background Coronavirus disease 2019 (COVID-19) has continued to spread in the US and globally. Closely monitoring public engagement with, and perception of, COVID-19 and preventive measures through social media data could provide important information for understanding the progress of current interventions and for planning future programs.
Objective To measure the public’s behaviors and perceptions regarding COVID-19 and its effects on daily life during a recent 5-month period of the pandemic.
Methods Natural language processing (NLP) algorithms were used to identify COVID-19-related and unrelated topics in over 300 million online data sources from June 15 to November 15, 2020. Posts in the sample were geotagged, and sensitivity and specificity were calculated to validate the classification of posts. The prevalence of discussion of these topics was measured over this period and compared with daily case rates in the US.
Results The final sample included 9,065,733 posts, 70% of which were sourced from the US. In October and November, discussion mentioning COVID-19 and related health behaviors did not increase as it had from June to September, despite an increase in daily COVID-19 cases in the US beginning in October. Additionally, counter to reports from March and April, discussion was more focused on daily life topics (69%) than on COVID-19 in general (37%) or COVID-19 public health measures (20%).
Conclusions COVID-19-related social media discussion, sourced mainly from the US, declined even as COVID-19 cases in the US rose to their highest rate since the beginning of the pandemic. Targeted public health messaging may be needed to ensure engagement in public health prevention measures until a vaccine is widely available to the public.
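The Methods state that sensitivity and specificity were calculated to validate the post classification, but the abstract does not describe how. As a minimal sketch only, the two metrics could be computed from a manually labeled validation set as shown below. The function name `sensitivity_specificity` and the example labels are hypothetical and invented for illustration; they are not the study's code or data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    truth: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    truth[i] is True when a human reviewer marked post i as on-topic;
    predicted[i] is True when the classifier marked it as on-topic.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")

    sensitivity = tp / (tp + fn)  # share of on-topic posts that were caught
    specificity = tn / (tn + fp)  # share of off-topic posts correctly excluded
    return sensitivity, specificity


# Made-up validation labels, for illustration only (not study data).
human_labels = [True, True, True, True, False, False, False, False, False, False]
model_labels = [True, True, True, False, False, False, False, False, True, False]

sens, spec = sensitivity_specificity(human_labels, model_labels)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Reporting both values matters for topic classifiers of this kind: high sensitivity alone can be achieved by labeling every post as on-topic, which specificity would expose.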
Keywords
- COVID-19 public perception
- COVID-19 social media
- infodemic
- social media research
- social media analysis
- natural language processing
- Reddit data
- Facebook data
- COVID-19 public health measures
- public health surveillance
Competing Interest Statement
Yuan Lu is supported by the National Heart, Lung, and Blood Institute (K12HL138037) and the Yale Center for Implementation Science. Rachel Dreyer is supported by an American Heart Association Transformational Project Award (#19TPA34830013) and a Canadian Institutes of Health Research Project Grant (RN356054401229). In the past three years, Harlan Krumholz received expenses and/or personal fees from UnitedHealth, IBM Watson Health, Element Science, Aetna, Facebook, the Siegfried and Jensen Law Firm, Arnold and Porter Law Firm, Martin/Baughman Law Firm, F-Prime, and the National Center for Cardiovascular Diseases in Beijing. He is an owner of Refactor Health and HugoHealth, and had grants and/or contracts from the Centers for Medicare & Medicaid Services, Medtronic, the U.S. Food and Drug Administration, Johnson & Johnson, and the Shenzhen Center for Health Information. The remaining authors have no disclosures to report.
Funding Statement
This work was supported by the project Insights about the COVID Pandemic Using Public Data IRES PD: 20-005872 with funding from the Foundation for a Smoke-Free World.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This study was exempted from Institutional Review Board review by Yale University as it did not engage in research involving human subjects.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Abbreviations
- COVID-19: coronavirus disease 2019
- US: United States
- API: application programming interface
- NLP: natural language processing
- EVALI: e-cigarette or vaping use-associated lung injury
Paper in collection COVID-19 SARS-CoV-2 preprints from medRxiv and bioRxiv