COVID-19 Twitter-based analysis reveals differential concerns across areas with socioeconomic disparities

Yihua Su; Aarthi Venkat; Yadush Yadav; Lisa B. Puglisi; Samah J. Fodeh

doi:10.1101/2020.11.18.20233973

ABSTRACT

Objective We sought to understand how U.S. residents responded to COVID-19 as it emerged, and the extent to which spatial-temporal factors impacted response.

Materials and Methods We mined and reverse-geocoded 269,556 coronavirus-related social media postings on Twitter from January 23rd to March 25th, 2020. We then ranked tweets based on the socioeconomic status of the county they originated from using the Area Deprivation Index (ADI), which we also used to identify areas with high initial disease counts (“hotspots”). We applied topic modeling to the tweets to identify chief concerns and determine their evolution over time. We also investigated how topic proportions varied based on ADI and between hotspots and non-hotspots.
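
The comparison of topic proportions across ADI groups can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes each tweet already carries a per-topic proportion (from whatever topic model is used) and a percentile-style ADI score for its county. The function name, the `adi_cutoff` threshold, the field names, and the toy values are all hypothetical, since the abstract does not state how low and high ADI groups were defined.

```python
from collections import defaultdict


def mean_topic_proportions(tweets, adi_cutoff=50):
    """Average per-tweet topic proportions within low- and high-ADI groups.

    `adi_cutoff` is a hypothetical threshold, not a value from the paper.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for tweet in tweets:
        # Higher ADI means a more deprived (less resourced) area.
        group = "high_adi" if tweet["adi"] > adi_cutoff else "low_adi"
        counts[group] += 1
        for topic, weight in tweet["topics"].items():
            sums[group][topic] += weight
    # Divide each group's summed weights by its number of tweets.
    return {
        group: {topic: total / counts[group] for topic, total in topics.items()}
        for group, topics in sums.items()
    }


# Toy, made-up tweets: ADI score of the county plus topic proportions.
tweets = [
    {"adi": 20, "topics": {"stocks": 0.75, "prayers": 0.25}},
    {"adi": 30, "topics": {"stocks": 0.25, "prayers": 0.75}},
    {"adi": 80, "topics": {"stocks": 0.25, "prayers": 0.75}},
]
result = mean_topic_proportions(tweets)
```

Differences between the two groups' averages for a given topic are the kind of quantity the comparison is about; testing whether such differences are statistically meaningful would be a separate step that this sketch does not attempt.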

Results We identified 45 topics, which shifted from early-outbreak-related content in January, to the presidential election and governmental response in February, to lifestyle changes in March. Highly resourced areas (low ADI) were concerned with stocks, social distancing, and national-level policies, while high ADI areas shared content with negative expression, prayers, and discussion of the CARES Act economic relief package. Within hotspots, these differences stand, with the addition of increased discussion regarding employment in high ADI versus low ADI hotspots.

Discussion Topic modeling captures the major concerns in COVID-19-related discussion on a social media platform in the early months of the pandemic. Our study extends previous studies that utilized topic modeling on COVID-19-related tweets by linking the identified topics to socioeconomic status using ADI. Comparisons between low and high ADI areas indicate differential Twitter discussions, corresponding to greater concern with economic hardship and impacts of the pandemic in less resourced communities, and less focus on general public health messaging.

Conclusion This work demonstrates a novel framework for assessing differential topics of conversation that correlate with income, education, and housing disparities. Combined with the integration of COVID-19 hotspots, it offers improved analysis of crisis response on Twitter. Such insight is critical for informed public health messaging campaigns in future waves of the pandemic, which should focus in part on the interests of those who are most vulnerable in the lowest-resourced health settings.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This research was supported in part by the Gruber Foundation (to A.V.).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript.

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

Publicly available Tweets downloaded using the Twitter API.

The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.