Abstract
Opioid-involved overdose deaths have risen significantly since 1999 with over 80,000 deaths annually since 2021, primarily driven by synthetic opioids, like fentanyl. Responding to the rapidly changing opioid crisis requires reliable and timely information. One possible source of such data is the social media platforms with billions of user-generated posts, a fraction of which are about drug use. We therefore assessed the utility of Reddit data for surveillance of the opioid epidemic, covering prescription, heroin, and synthetic drugs (as of September 2024, up-to-date Reddit data was still accessible on the open web). Specifically, we built a natural language processing pipeline to identify opioid-related comments and created a cohort of 1,689,039 geo-located Reddit users, each assigned to a state. We followed these users from 2010 through 2022, measured their opioid-related posting activity over time, and compared this posting activity against CDC overdose and National Forensic Laboratory Information System (NFLIS) drug report rates. To simulate the real-world prediction of synthetic drug overdose rates, we added near real-time Reddit data to a model relying on CDC mortality data with a typical 6-month reporting lag and found that Reddit data significantly improved prediction accuracy. We observed drastic, largely unpredictable changes in both Reddit and overdose patterns during the COVID-19 pandemic. Reddit discussions covered a wide variety of drug types that are currently missed by official reporting. This work suggests that social media can help identify and monitor known and emerging drug epidemics and that this data is a public health “common good” to which researchers should continue to have access.
Significance statement The opioid epidemic persists in the United States with over 80,000 deaths annually since 2021, primarily driven by synthetic opioids like fentanyl. As the geographic and demographic patterns of the opioid epidemic are rapidly changing, accurate and timely monitoring is needed. In this paper, we used social media data from Reddit to conduct public health surveillance of the opioid epidemic, following 1.5+ million geo-located users over 10+ years. We also found that near real-time Reddit data can improve our ability to predict future overdose death rates compared to models only using CDC data with typical half-year reporting delays. Our work suggests that social media can be a useful component for public health surveillance of the opioid epidemic.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
A.L. is supported by a National Science Foundation Graduate Research Fellowship (DGE - 1656518). T.H. is supported by NIH R01DK103358, Simons Foundation, NSF - IOS-1546218, R35GM122515, NSF CBET - 1728858 and NIH R01AI130945. R.B.A. is supported by NIH GM102365, HG010615, and the Chan-Zuckerberg Biohub.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This research did not require IRB approval.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
Data analysis and figures were updated to include more recent years. Additional analyses were performed to assess statistical significance of correlation between trends studied in the work.
Data Availability
The underlying data is not provided for ethical reasons.