Abstract
Objectives Online health forums provide rich and untapped real-time data on population health. Through novel data extraction and natural language processing (NLP) techniques, we characterise the evolution of mental and physical health concerns relating to the COVID-19 pandemic among online health forum users.
Setting and design We obtained data from 739,434 posts by 53,134 unique users of three leading online health forums (HealthBoards, Inspire and HealthUnlocked), posted between 1 January 2020 and 31 May 2020. Using NLP, we analysed the content of posts related to COVID-19.
Primary outcome measures
Proportion of forum posts containing COVID-19 keywords
Proportion of forum users making their very first post about COVID-19
Number of COVID-19 related posts containing content related to physical and mental health comorbidities
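The three outcome measures above can be illustrated with a minimal sketch. It is not the study's pipeline: the keyword pattern, the comorbidity lexicon, the `Post` record and all function names are hypothetical, and "first post" here means a user's earliest post within the extracted data, which may not be their first post ever. Simple keyword matching stands in for whatever NLP methods were actually used.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical keyword pattern and comorbidity lexicon, for illustration only;
# the study's actual keyword lists are not given in this section.
COVID_PATTERN = re.compile(r"covid|coronavirus|sars-cov-2", re.IGNORECASE)
COMORBIDITY_TERMS = {
    "mental health": ("anxiety", "depression"),
    "physical health": ("asthma", "diabetes"),
}


@dataclass(frozen=True)
class Post:
    """Assumed record layout: one forum post by one user."""
    user: str
    posted: date
    text: str


def mentions_covid(post: Post) -> bool:
    """True if the post text contains any COVID-19 keyword."""
    return COVID_PATTERN.search(post.text) is not None


def keyword_post_proportion(posts: list[Post]) -> float:
    """Outcome 1: proportion of posts containing COVID-19 keywords."""
    if not posts:
        return 0.0
    return sum(mentions_covid(p) for p in posts) / len(posts)


def first_post_covid_proportion(posts: list[Post]) -> float:
    """Outcome 2: proportion of users whose earliest post mentions COVID-19."""
    first_posts: dict[str, Post] = {}
    # Sorting by date means setdefault keeps each user's earliest post.
    for post in sorted(posts, key=lambda p: p.posted):
        first_posts.setdefault(post.user, post)
    if not first_posts:
        return 0.0
    return sum(mentions_covid(p) for p in first_posts.values()) / len(first_posts)


def comorbidity_counts(posts: list[Post]) -> dict[str, int]:
    """Outcome 3: number of COVID-19 posts mentioning each comorbidity category."""
    counts = {category: 0 for category in COMORBIDITY_TERMS}
    for post in posts:
        if not mentions_covid(post):
            continue
        text = post.text.lower()
        for category, terms in COMORBIDITY_TERMS.items():
            if any(term in text for term in terms):
                counts[category] += 1
    return counts
```

Because the functions take a plain list of posts, the same measures could be tracked over time by applying them to the posts from each day or week.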
Results Posts discussing COVID-19 and related comorbid disorders spiked in early to mid-March, around the time of the global implementation of lockdowns, prompting a large number of users to post on online health forums for the first time. The pandemic and the corresponding public response have had a significant impact on posters’ queries regarding mental health.
Conclusions We demonstrate that it is feasible to characterise the content of online health forum user posts regarding COVID-19 and to measure changes over time. Social media data sources such as online health forums can be harnessed to strengthen population-level mental health surveillance.
Strengths and limitations of this study
Analysing online health forum data using NLP revealed a substantial rise in activity which correlated with the onset of the COVID-19 pandemic.
Real-time data sources such as online health forums are essential for monitoring fluctuating population health and tailoring responses to daily pressures.
It is not yet possible to establish posters' COVID-19 status, or whether concerned posters have pre-existing mental or physical health issues, have recovered, or have become unwell for the first time.
Online health forums are help-seeking forums, which introduces self-selection bias.
Competing Interest Statement
All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: RP has received funds from Janssen, Induction Healthcare and Holmusk outside the current study. The other authors declare no competing interests.
Funding Statement
RP has received support from a Medical Research Council (MRC) Health Data Research UK Fellowship (MR/S003118/1) and a Starter Grant for Clinical Lecturers (SGL015/1020) supported by the Academy of Medical Sciences, The Wellcome Trust, MRC, British Heart Foundation, Arthritis Research UK, the Royal College of Physicians and Diabetes UK. FS and CB were partly funded by an Alan Turing Institute (ATI) Fellowship and by an EPSRC COVID-19 Rapid Response Impact Acceleration Fund. Computational resources were partly funded by a Microsoft Azure Sponsorship through the ATI.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Data for this study were drawn from publicly available online health forums and extracted in aggregate form for secondary data analysis rather than at individual user level. No individual user level data were retained, making it impossible to obtain informed consent. However, users were aware that their data were available for anyone to view online by virtue of contributing to publicly available online health forums. The data were analysed using the computing infrastructure based at Queen Mary University of London (QMUL) which employs a two-layer security model to maintain data privacy. QMUL is registered as a data controller with the Information Commissioner's Office (ICO; registration number: Z5507327), which covers all research activities undertaken at the university.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
Role of funder: The views expressed are those of the authors and not necessarily those of the funders. The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Data sharing: Given licensing and privacy issues, it is not possible to release the dataset generated from the online health forums investigated in this study. However, we welcome collaboration with other researchers and healthcare policy makers. Anyone interested in accessing the aggregate data and data analysis code should contact the guarantor (f.smeraldi@qmul.ac.uk).