COVID-19 vaccine perceptions: An observational study on Reddit ============================================================== * Navin Kumar * Isabel Corpus * Meher Hans * Nikhil Harle * Nan Yang * Curtis McDonald * Shinpei Nakamura Sakai * Kamila Janmohamed * Weiming Tang * Jason L. Schwartz * S. Mo Jones-Jang * Koustuv Saha * Shahan Ali Memon * Chris T. Bauch * Munmun De Choudhury * Orestis Papakyriakopoulos * Joseph D. Tucker * Abhay Goyal * Aman Tyagi * Kaveh Khoshnood * Saad Omer ## Abstract **Objectives** As COVID-19 vaccinations accelerate in many countries, narratives skeptical of vaccination have also spread through social media. Open online forums like Reddit provide an opportunity to quantitatively examine COVID-19 vaccine perceptions over time. We examine COVID-19 misinformation on Reddit following vaccine scientific announcements. **Methods** We collected all posts on Reddit from January 1 2020 - December 14 2020 (n=266,840) that contained both COVID-19 and vaccine-related keywords. We used topic modeling to understand changes in word prevalence within topics after the release of vaccine trial data. Social network analysis was also conducted to determine the relationship between Reddit communities (subreddits) that shared COVID-19 vaccine posts, and the movement of posts between subreddits. **Results** There was an association between a Pfizer press release reporting 90% efficacy and increased discussion on vaccine misinformation. We observed an association between Johnson and Johnson temporarily halting its vaccine trials and reduced misinformation. We found that information skeptical of vaccination was first posted in a subreddit (r/Coronavirus) which favored accurate information and then reposted in subreddits associated with antivaccine beliefs and conspiracy theories (e.g. conspiracy, LockdownSkepticism). **Conclusions** Our findings can inform the development of interventions where individuals determine the accuracy of vaccine information, and communications campaigns to improve COVID-19 vaccine perceptions. Such efforts can increase individual- and population-level awareness of accurate and scientifically sound information regarding vaccines and thereby improve attitudes about vaccines. Further research is needed to understand how social media can contribute to COVID-19 vaccination services. **Funding** Study was funded by the Yale Institute for Global Health and the Whitney and Betty MacMillan Center for International and Area Studies at Yale University. The funding bodies had no role in the design, analysis or interpretation of the data in the study. Keywords * COVID-19 * Vaccine * Reddit * Computational * Misinformation ## Introduction Coronavirus Disease 2019 (COVID-19) vaccine development and widespread scale-up are major steps in combating the pandemic [1]. Several vaccine candidates appear to provide protection not just against disease but also against infection [2], meaning the vaccines could be instrumental in significantly reducing infections among populations. However, for vaccines to be successful, they not only need to be deemed safe and effective by scientists, but also widely accepted by the public [3]. Effective health communication is key to vaccine acceptance, but is a complex task given widespread vaccine hesitancy, rapidly changing vaccine information [4, 5], and vaccine misinformation [6, 7]. Misinformation is defined as information that has the features of being false, determined based on expert evidence, but shared with no intention of harm [8]. For example, some falsely believe that participants in vaccine trials have died after taking the vaccine, or that vaccination is a means to covertly implant microchips [9, 10]. Such information may worsen existing fear around a vaccine and limit public uptake of a COVID-19 vaccine [3]. There are a diverse range of individuals who are skeptics of vaccination including those who are antivaccine or antivaxxers (individuals who are opposed to vaccination or laws that require vaccination) and those who are vaccine hesitant (those who delay in acceptance or refusal of vaccination) [11]. The diverse groups who are skeptical of vaccines may react to information in different ways [12]. With low willingness to vaccinate globally [13], and substantial COVID-19 misinformation [14], achieving sufficient vaccination coverage to reach population-level benefits will be challenging. Reduced vaccine uptake may impinge on population-level impact [15], and COVID-19 control at the population level [16]. For example, reduced vaccine uptake may increase the mortality cost of COVID-19 [16] and create clusters of non-vaccinators that disproportionately increase pandemic spread [17]. In addition, willingness to accept a COVID-19 vaccine seems to be fluctuating in the US [18, 19], with decreases in vaccine acceptance leading up to the 2020 Presidential Election in the United States (US), perhaps due to significant politicization of the vaccine [19]. Thus, vaccine acceptance is not constant or uniform, and likely affected by several factors, such as being responsive to information and perceptions regarding the vaccine, and the state of the pandemic and economy. Several studies have detailed the relationship between exposure to COVID-19 misinformation and vaccine acceptance [3], as well as COVID-19 vaccine perceptions assessed via Twitter [6, 20] and online surveys [21, 22, 23, 24, 25]. However, limited research has explored how online vaccine perceptions are associated with major events in the vaccine development and implementation timeline (e.g. major pharmaceutical firms halting vaccine trials or publishing results on vaccine effectiveness) and how online vaccine discussions move across arenas that have different baseline vaccine perceptions. To our knowledge, prior studies have generally not focused on Reddit, a social news aggregation and discussion website. Registered Reddit members submit posts (text, images, videos) to the site, which are then voted up or down by other members. Posts are organized by subject into user-created boards called communities or subreddits, which cover a large range of topics. Reddit may be a useful setting for examining vaccine perceptions because similar topics have been discussed before [26], including topics related to COVID-19 vaccine development [27]. Moreover, as seen with the recent GameStop trading event, Reddit is increasingly important in online conversations [28]. We note that Reddit and similar online sources are not necessarily representative of what the overall US general public feels [29]. However, Reddit provides insights on highly shared news, and can rapidly transmit both misinformation and accurate information [30, 31, 32]. It is important to understand the behavior of top users, how vaccine perceptions are related to events in the vaccine timeline and how vaccine discussion on Reddit migrates across subreddits that differ in their vaccine perceptions, as large-scale vaccination among the US general public expands. Most users of online platforms are passive or participate with a very low frequency. A small number of Reddit users are hyperactive and may over-proportionally influence vaccine perceptions online [33]. Thus, describing the behavior of hyperactive users is key to understanding shifts in vaccine perceptions. Understanding how perceptions are related to vaccine-related events may allow stakeholders to better design communication and education campaigns [34, 35] in response to vaccine distribution setbacks. Given the range of vaccine-related viewpoints online, greater insight on how discussions move across Reddit communities will allow stakeholders to better disseminate evidence-based information on Reddit. The purpose of this analysis was to detail the behavior of top Reddit users, posts’ relationship with events in the vaccine timeline, and the relationship between subreddits that shared COVID-19 vaccine posts. ## Methods ### Ethics statement Our work did not require institutional review board approval as we used publicly accessible and deidentified posts from Reddit without any interaction with the posts’ authors. We did not use personally identifiable information, and paraphrased posts to reduce traceability. ### Data acquisition and processing Using the Pushshift API and the Python Reddit API Wrapper [36, 37], we collected all posts on the entire Reddit, across all subreddits from January 1 2020 - December 14 2020 that contained both COVID-19 and vaccine keywords (see Supplement, only posts that had COVID-19 AND vaccine-related keywords were collected) derived from systematic reviews on the topic. We also collected metadata for each post e.g. the username, ID, subreddit. We then preprocessed our data as follows: 1) removed duplicate entries; 2) filtered out entries <50 characters as these generally do not provide enough information for meaningful analysis [38, 39]; 3) filtered the content using a curated set of search terms (as shown in the Supplement) to retain only COVID-19 vaccine-related content; 4) removed text in non-English languages, URLs, emojis, and punctuation. ### Hyperactive users To better understand the possibly outsize influence of some individuals, we provided a descriptive overview of the behavior of top 10 users, focusing on content and number of posts. ### Topic modeling We used topic modeling to understand changes in word prevalence within topics around COVID-19 vaccines (see Supplement for additional detail). Topic modeling is a computer-aided content analysis technique through which texts are organized into themes known as “topics” [40, 41]. We used an approach to topic modeling known as Structural Topic modeling (STM) [42, 43]. STMs [42, 43] enable the generation of topics with regards to document metadata such as date and source and other covariates relevant to the research question, such as new COVID-19 cases. We used the following metadata covariates for the STM model: date (1 was denoted for the first day and numbered sequentially after), new COVID-19 cases per day worldwide, new COVID-19 deaths per day worldwide (publicly available and both obtained from COVID-19 Data Repository by the Center for Systems Science and Engineering at Johns Hopkins University [44]), S and P 500 opening score (publicly available from the Wall Street Journal), post type (comment or post), score (upvotes - downvotes). We used worldwide cases and deaths instead of US cases/deaths as Reddit COVID-19 discussion centers on pandemic progression both globally and in the US, despite most users being from the US. These control variables may address underlying factors possibly influencing vaccine perceptions. By considering a broader picture of what may influence topic proportions around vaccine discussion, we can better test the claims relation to the association between specific events and topic proportions. March 11 2020 was denoted as the start date for our analysis, the date the World Health Organization declared COVID-19 a pandemic [45]. As STM is an unsupervised approach, the number of topics (k) to estimate is key to the analysis. We first estimated several models ranging from 5 to 30 topics. These models were then evaluated qualitatively by two authors (IS, AG) independently for 1) their ability to produce coherent topics and 2) appropriately capture topics regarding COVID-19 vaccination [46]. The two authors agreed on the same topic solution (k=20). Topic interpretation was influenced by authors’ first reading the top 100 most-cited COVID-19 peer-reviewed research articles and the top 10 most cited peer-reviewed research articles around topic modeling. Two authors assigned topics [Cohen’s kappa (k) >0.8] and a third author resolved disagreements when they arose. We also detailed how events in the vaccine timeline (described in following section) were associated with topic prevalence. We generated linear regression models with expected topic proportions for each topic as dependent variables and vaccine events as main explanatory variables, with the following additional covariates: new COVID-19 cases per day worldwide, new COVID-19 deaths per day worldwide, SP 500 opening score, post type. To validate regression analyses in Figure 1, we undertook a close reading of the 100 most representative text fragments for exemplar topics. We found that these topics varied in line with the indicated events. ![Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/04/13/2021.04.09.21255229/F1.medium.gif) [Figure 1:](http://medrxiv.org/content/early/2021/04/13/2021.04.09.21255229/F1) Figure 1: Values were generated from a regression where the outcome variable was the proportion of each document dedicated to each topic, given the selected STM model, and with various vaccine events as main explanatory variables. Topics on the right of the zero line were more likely to be brought up after the indicated event. Confidence intervals (95%) included both regression uncertainty and measurement uncertainty from the STM model. ### Selecting events of interest We first assembled a preliminary list of COVID-19 vaccine-related events based on a review of online news sites and peer-reviewed vaccine research articles and consultation with experts on vaccination, resulting in a list of six events (see Supplement). We then conducted preliminary analyses with remaining events to determine the ones that were associated with the greatest shift in topic proportions for each topic. The three events below were selected as our final list, as these were associated with a shift in topic proportions for most topics. Events as follows: 1) AstraZeneca halts Phase 3 vaccine trial (September 8 2020); 2) Johnson Johnson temporarily halts vaccine trial (October 12 2020); 3) Pfizer announces preliminary vaccine clinical trial results showing 90% efficacy (November 9 2020). ### Social network analysis Next, we conducted social network analysis to provide insights on how vaccine discussion on Reddit migrates across subreddits that differ in their vaccine perceptions, and the relationship between these subreddits [47]. While standard social networks tend to assess relationships between people, we used a network to describe relationships between subreddits, studying the connections between people as mediated by the subreddits they were in and the posts shared between these subreddits. We used node sizes to represent number of users in a subreddit, node edges to indicate shared COVID-19 vaccine posts between subreddits (an edge was indicated if there was >one shared post), and node labels to detail the subreddit name. Edge direction was based on whether a node had >50% of its posts made prior to its adjacent connecting node e.g. A->B if >50% of A’s shared posts were made before B. When analyzing the trajectory of posts from one subreddit to another, we assumed that posts moved from A->B->C if a post was made first in A, then followed by B and C. This may allow us to see how posts moved from one subreddit to another. We used the fast-greedy algorithm for cluster identification. We focused on the main social network in our data (largest component subgraph) and excluded all edges with a weight of one (i.e. all connections between subreddits that had only one post in common) and all clusters that had <15 vertices and whose vertices had a betweenness centrality <20 (we used a range of network characteristics to yield an easy to understand social network and the above measures yielded the clearest output). ## Results Post-processing, we had 266,840 documents (25,400,556 words). ### Overview of hyperactive users We reviewed the posts for the top 10 users who posted the most in our dataset, ranging from 159 - 278 posts/person. Six of these users posted evidence-based information (e.g. Effectiveness of the COVID-19 vaccine: real-world evidence from healthcare workers, Vaccine linked to reduction in risk of COVID-19 admissions to hospitals), but four users (one of these users was suspended from Reddit at time of writing) seemed to be skeptical of vaccination (e.g. Hell Gates says Vaccines are Americans’ only hope to return to Normal Life!, Doctors Around the World Issue Dire Warning: DO NOT get the experimental covid vaccine, At What Point Do We Realize Bill Gates Is Dangerously Insane?). Individuals skeptical of vaccination were common among those who posted the most frequently in our data. ### Topic modeling Table 1 indicated the topics in the dataset, their proportions, and the top 10 words for each topic (see Table 1). Broadly, our data centered on the severity of the pandemic (Topic 13), hope for a swift end to the pandemic (Topic 10), and suspicion of science and mainstream media (Topic 5). The severity of the pandemic topic focused on death, risk and sickness in relation to the pandemic. The hope for a swift end to the pandemic topic was about hope, safety and the length of the pandemic. Finally, the suspicion of science topic was around reduced trust and belief in the media and science. We also noted several other topics, such as evidence-based COVID-19 discussion (exploring factually sound and true evidence about COVID-19) (Topic 20), COVID-19 transmission rates and patterns (Topic 12), the effect of the virus on humans (Topic 2), COVID-19 vaccine conspiracy theories and misinformation (e.g. Bill Gates-related vaccine conspiracy theories) (Topic 19), and racism on social media (Topic 15). View this table: [Table 1:](http://medrxiv.org/content/early/2021/04/13/2021.04.09.21255229/T1) Table 1: Structural topic model results from 266,840 documents, March 11 2020 - December 14 2020, including the topic proportion and the top 10 words associated with each topic. We then explored how various events in the vaccine timeline were related to topic prevalence (see Figure 1). We observed an association between AstraZeneca temporarily halting its vaccine trials, and increased discussion around government expenses, such as funds spent on businesses, and supporting the economy. Similarly, we found an association between Johnson and Johnson temporarily halting its vaccine trial, and increased discussion of federal policies and then-US President Donald Trump’s leadership. We found an association among Johnson and Johnson temporarily halting its vaccine trial and greater discussion around suspicion of science and mainstream media. Similarly, there was an association between Pfizer announcing preliminary Phase 3 results showing 90% vaccine efficacy and reduced discussion about suspicion of science and mainstream media. We detailed an association between Johnson and Johnson temporarily halting its vaccine trial and reduced discussion around COVID-19 vaccine conspiracy theories and misinformation. We found an association between Pfizer announcing preliminary vaccine clinical trial results, an increase in discussion around COVID-19 vaccine conspiracy theories and misinformation, and a corresponding decrease in evidence-based COVID-19 discussion. We also found that new COVID-19 deaths and cases were positively associated with increased discussion around COVID-19 vaccine conspiracy theories and misinformation, and suspicion of science and mainstream media, highlighting the relationship between COVID-19 progression and similar rises in misinformation (See Supplement for full results). ### Subreddit networks To understand the relationship between subreddits that shared COVID-19 vaccine posts, we analyzed the greatest component subgraph, as this was substantially larger than all other subgraphs which had 1-5 nodes and did not provide for meaningful conclusions (see Figure 2). The largest node/subreddit (r/Coronavirus, the official community for COVID-19 on Reddit) had 2.4 million users. Nodes connected by an edge shared two to 41 posts. ![Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/04/13/2021.04.09.21255229/F2.medium.gif) [Figure 2:](http://medrxiv.org/content/early/2021/04/13/2021.04.09.21255229/F2) Figure 2: The largest subgraph above shows subreddits that have COVID-19 vaccine-related posts. Node sizes represent subreddit user sizes, node edges indicate shared COVID-19 vaccine posts between subreddits, and node labels indicate the subreddit name. Node color is based on centrality (red is higher centrality, yellow is lower centrality). Edge direction was based on whether a node had >50% of its posts prior to its adjacent connecting node e.g. A->B if >50% of A’s shared posts were made before B. We used larger labels for subreddit nodes with >5,000 COVID-19 vaccine posts. We found nine posts that were first posted in r/Coronavirus and then subsequently posted in at least one subreddit. Posts were reposted one to 10 times. Eight of these posts concerned evidence-based information (e.g. COVID-19 timeline, Vaccine development timeline) and were reposted in other subreddits favoring evidence-based information (e.g. AmericanPolitics, worldnews). However, one post (COVID-19 much milder than believed) was aligned with vaccination skepticism and subsequently posted in subreddits favoring vaccine skeptic narratives (e.g. conspiracy, LockdownSkepticism - we read through the first 50 posts in these subreddits and verified they were largely around disagreement with evidence-based measures to mitigate the pandemic). This suggests that misinformation is present in some subreddits which generally feature accurate information. This may also indicate that most posts which start in the main COVID-19 subreddit (r/Coronavirus) and then re-posted in other subreddits tend to be evidence-based. However, a minority of posts in r/Coronavirus are skeptical of vaccination, but then do not get reposted in evidence-based subreddits, but instead in subreddits broadly skeptical of vaccination. ## Discussion Our analysis of 266,840 posts on COVID-19 vaccines between March 11 2020 - December 14 2020 generated several key findings. First, there was a relationship between interim positive announcements followed by increased vaccine misinformation, and a relationship between halting vaccine trials and reduced misinformation discussion. Past research has indicated shifts in vaccine perceptions with time [18, 19, 48]. We expand on that work, suggesting an association between events in the vaccine timeline and vaccine perceptions. Information skeptical of vaccination may flow from a regulated and legitimate source to avenues centering on misinformation and distrust in science. Previous research indicated how antivaccine posts travel online, with users largely moving from one antivaccine post to another [49, 50]. Building on this work, we propose that individuals skeptical of vaccination may selectively highlight posts from legitimate online environments and then forward these posts in arenas aligned with vaccine-skeptic narratives, moving information that was previously under the purview of a more neutral, science-trusting audience to individuals skeptical of vaccination - perhaps providing opportunities to engage such individuals and reduce misinformation. The strength of our work is the use of computational methods to explore how Reddit vaccine perceptions are associated with events in the vaccine timeline and how posts move among environments with differing vaccine perceptions. Such outcome measurement is central to understanding how vaccine perceptions shift, allowing for accurate public health messaging capable of improving COVID-19 vaccine perceptions. There was an association between positive vaccine developments and an increase in discussion of COVID-19 vaccine misinformation, and a relationship between development setbacks and reduced misinformation discussion. Past research has indicated shifting vaccine perceptions over time [5], but there is limited research on specific events, especially vaccine trials and their relationship with vaccine perceptions. Previous work also indicated that COVID-19 misinformation can be remedied with scientific facts [6], but we highlight the complexity of the phenomenon. The spread and production of misinformation can sometimes be due to confirmation bias, where individuals consume, interpret, and favor information that supports their beliefs [51]. For true antivaxxers, COVID-19 vaccine successes may be interpreted as attempts by Bill Gates to track the population through microchips, and thus news around vaccine successes may be interpreted in a misinformation framework, perhaps explaining the relationship between vaccine success and increased discussion around COVID-19 vaccine conspiracy theories. Similarly, when vaccine trials are halted, such news may cohere with antivaxxers, who may have no interest in engaging with news that possibly demonstrates the failure of medical science - given antivaxxers’ distrust of medical experts [52], perhaps explaining the reduced misinformation discussion. Thus, simply presenting scientific data to antivaxxers [6] may not be effective, as demonstrated in a study where presenting some antivaxxers with facts made them more antivaccine [53]. There was a relationship between a vaccine trial halting and increased discussion around suspicion of science and mainstream media, and a vaccine trial being effective and reduced discussion around suspicion of science and mainstream media. Factors such as political conservatism and lower levels of education may be associated with lack of trust in science [54], and we build on such research by suggesting that news around science successes and setbacks is associated with trust in science. In an environment where individuals are unsure what to believe around vaccines [55], we propose that vaccine successes build faith in science, and vaccine setbacks erodes this trust. We also documented how posts skeptical of vaccination may move from more legitimate avenues to arenas where vaccine-skeptic narratives are more popular. In addition, such posts were popular among some highly active users in our dataset. COVID-19 misinformation is present in mainstream environments and does not always get fact-checked [56] and Reddit is no different. Individuals with largely antivaccine beliefs seek out information that coheres with their views [49]. We build on this work and suggest that individuals skeptical of vaccination also look for information from venues that tend to have evidence-based discussion, but then may interpret such information in line with their views and moral foundations, later sharing such information in forums more skeptical of vaccination. This may indicate that skeptics of vaccination do venture out of their echo chambers to enter spaces where accurate information is the norm - presenting attractive opportunities for constructive intervention. To improve COVID-19 vaccine perceptions, minimize misinformation, and increase vaccination rates, public health authorities should conduct tailored interventions and communications campaigns to counter the rhetoric of vaccine misinformation [57, 58]. An example intervention could ask respondents to determine information accuracy around vaccines [59, 60] nudging individuals through the design of these programs toward accurate vaccine information. It is possible that interventions of this sort could shift the beliefs of the vaccine hesitant and thereby boost vaccine uptake, despite potentially little or no effect on committed opponents of vaccination. The concomitant spread of misinformation about COVID-19 vaccines and scientific implications provides insights about the mechanism of misinformation spread. Given our findings around a vaccine trial halting and increased discussion around suspicion of science, we suggest that scientists be more communicative on the difficulties they face in creating vaccines to mitigate science mistrust. Communications campaigns can harness these findings and forward evidence-based posts in subreddits where misinformation is common, when vaccine trial data is released. Given the possibility that individuals seemingly more interested in antivaccine narratives may sometimes venture into more evidence-based environments, interventions can target skeptics or critics of vaccination who sometimes enter more mainstream spaces, engaging them with more evidence-based information, keeping in mind how antivaxxers may deal with such information. Similarly, as legitimate online spaces contain COVID-19 vaccine misinformation, more effective moderation policies can be enacted in these and similar environments e.g. perhaps including a “verified” tag to a post if it comes from a credible source. Such measures may augment health outcomes through several modes. For example, improved vaccine perceptions may lead to reduced vaccine hesitancy and thereby increase vaccine acceptance and COVID-19 vaccination rates. Reduced vaccine misinformation may also improve trust in science and health systems, more broadly, enhancing larger efforts to address health disparities observed in vaccination coverage and many other areas [61]. ## Limitations Our findings relied on the validity of data collected with our search terms. We searched all of Reddit for COVID-19 vaccine posts, and our data contained text fragments representative of vaccine perceptions. We are thus confident in the comprehensiveness of our data. Any use of Reddit data presents several challenges and limitations. As no personal information is collected on Reddit, the demographic makeup of users is unknown [62]. Some evidence suggests that Reddit users are likely to be male, younger than the general population and mostly based in the US [63]. Our results should be interpreted in line with this probable gender and age skew of our data. It was not possible to determine what posts were viewed by skeptics of vaccination in more legitimate subreddits, but subsequently not reposted in subreddits more supportive of anti-vaccine narratives, thereby providing more support for our suggestion around confirmation bias. It is possible that posts were made in one subreddit before another purely due to chance, and that the directionality assumed is due to coincidence. We cannot be certain why individuals created the text in our data, the processes behind the shift in narratives, and why individuals shared the same post in more than one subreddit, and we cannot address these mechanisms with our data. Future work can address these questions and explore the motivations of those creating and sharing such text. We conducted a retrospective and observational study, and thus cannot draw causal conclusions regarding vaccine perceptions. It is possible that other vaccine-related events may have caused the observed changes, and that vaccine success stimulated debate that brought to the surface existing antivaccine discussion, instead of causing it. ## Conclusion There was a relationship between vaccine efficacy and increased discussion around vaccine misinformation, and an association between halting of vaccine trials and reduced misinformation discussion. Posts skeptical of vaccination can move from more legitimate avenues to arenas where antivaccine narratives are more popular, suggesting that individuals skeptical or opposed to vaccines do venture out of their online echo chambers, providing potential opportunities to engage such individuals and reduce misinformation. Given the time period of our data extends to the period immediately prior to the launch of large-scale vaccination among the US public, stakeholders may utilize our findings to better design and inform upcoming vaccination communication efforts. ## Data Availability Data is available from authors' at reasonable request. ## Availability of data and materials Data is available from authors’ at reasonable request. ## Competing interests The authors declare that they have no competing interests. ## CrediT authorship contribution statement All authors made significant contributions to the manuscript. The following were the respective roles for each author: NK, AG, CM, IC, KJ, MH, NY, NH, SNS, KK, SO contributed to the study design, hypothesis generation, data collection, data analysis, data interpretation, and manuscript write-up and review. MDC, JDT, OP, WT, CB, AT, JLS, SMJ, SAM contributed to the manuscript write-up and review. ## Supplement ### Software All analysis was conducted using python and R with the following packages: datetime [64], dplyr [65], ggraph [66], grid [67], gridExtra [68], igraph [69], lubridate [70], NumPy [71], pandas [72], pracma [73], praw [74], quanteda [75], readtext [76], readr [77], stm [43], stminsights [78], splines [79], stringr [80], textclean [81], tidygraph [82], tidytext [83], tidyverse [84]. ### Search terms #### COVID-19 keywords (coronavirus OR coronaviruses OR corona virus OR corona viruses) OR (coronavirus infections OR corona virus infections) OR ’(betacoronavirus OR beta coronavirus OR beta coronaviruses OR betacoronaviruses OR beta corona virus OR beta corona viruses OR betacorona virus OR betacorona viruses) OR (severe acute respiratory syndrome coronavirus OR severe acute respiratory syndrome corona virus) OR SARS CoV-2 OR cov2 OR sars 2 OR COVID OR (coronavirus 2 OR corona virus 2) OR covid19 OR nCov OR (new coronavirus OR new corona virus) OR (novel coronavirus OR novel corona virus) OR (novel coronavirus pneumonia OR novel corona virus pneumonia) OR ncp OR (pneumonia AND (wuhan—china—chinese—hubei)) #### Vaccine keywords (vaccine OR vaccinate OR vaccinated OR vaccinating OR vaccines OR vaccinates OR vaccination OR vaccinations) OR (immunisation OR immunise OR immunising OR immunisations OR immunises OR immunised) Or (immunization OR immunizations OR immunize OR immunized OR immunizes OR immunizing) ### Online news sites historyofvaccines.org/content/articles/coronavirustimeline mmunize.org/timeline biospace.com/article/a-timeline-of-covid-19-vaccine-development fortune.com/2020/12/30/covid-vaccine-first-coronavirus-cases-timeline-2020 ### Vaccination experts We identified key scholars in vaccination through the number of articles (>10) published regarding vaccination. We then contacted the identified researchers and asked them to assist. ### Longlist of vaccine-related events Fauci says he is cautiously optimistic that a vaccine will be effective and achieved within 1 or 2 years (May 12 2020) United States and AstraZeneca Form Vaccine Deal (May 21 2020) Moderna Vaccine Begins Phase 3 Trial, Receives $472M From then-US President Donald Trump’s Administration (July 27 2020) AstraZeneca Halts Phase 3 Vaccine Trial (September 8 2020) Johnson Johnson Halts Vaccine Trial (October 12 2020) Pfizer announcing preliminary vaccine clinical trial results showing 90% efficacy (November 9 2020) ### Topic modeling Within topic modeling, a topic is a distribution over a vocabulary [85]. For example, in a topic denoted “vape”, there is likely a greater probability that the terms “smoke” and “device” occur than the words “peanut” and “tomato”. “Smoke” may appear in both “vape” and “cooking” topics with different contextual meanings. Given the topic is a distribution, “smoke” may appear with other high-probability terms like “roast” and “fry” in the “cooking” topic, but with terms like “nicotine” and “device” in the “vape” topic. Thus, topics can be understood as if a person was to talk about a topic and when doing so, tended to use some words than others when the topic is “cooking” compared to “vape”. Topic models are apt for analyzing large quantities of textual data via an automated technique for providing context. The key innovation of STM is that it can incorporate metadata or information about each document. This allows metadata covariates, such as new COVID-19 cases per day, to influence topic discovery. Metadata can affect both topic prevalence and content. Metadata covariates for topical prevalence allow the metadata to affect topic frequency. Similarly, covariates in topical content allow the metadata to affect the word rate within a topic or how a topic is discussed [43]. The STM process will output documents and vocabulary for analysis [43]. Output can be investigated in a range of ways, such as detailing words associated with topics or the relationship between metadata and topics. Model output can be used to conduct hypothesis testing around these relationships. The number of topics was based on our understanding of the dataset and how other researchers interpreted STM results [46, 86]. Choosing the number of topics was also influenced by post-estimation validation outcomes and past work [46]. As per standard content analysis [87], topic model validation also needs qualitative review, where researchers assess the interpretability and relative efficacy of models based on their subject matter expertise and data context. Our final model [k=20] provided the greatest external validity and most semantically coherent output of distinctive topics. Above the indicated number of topics, there were diminishing returns for solutions, as the substantive meaning and coherence of categories started to break down. Below the indicated number of topics, variation decreased and specific topics got placed into more generic categories. Validating a topic model is not the same as evaluating a statistical model regarding a population sample [88]. The goal is to identify the framework which best describes the data, not estimating population parameters [88]. Most of the text was produced and consumed by people who were interested in the COVID-19 vaccine, and this lens was used to interpret the presence/absence of topics and words. Most of the topic labels were straightforward and did not require much interpretation. To characterize topics in the COVID-19 vaccine narrative, we qualitatively coded each topic by investigating word clouds based on each topic and reviewing exemplar documents which detailed high proportions of each topic [85]. The topic we classified as “Economy and markets” had the following most frequently occurring words: market, company, stock, product, industry, supply, price, sell, demand, billion, economy, trade, million, invest, high. Exemplar documents which exhibited high proportions of this topic indicated a preoccupation with these words. Thus, the interpretation of the topic was clear, given the genre of the narrative and relying on research regarding prominent topics around the COVID-19 vaccine. Topic validation is key to assessing whether the substantive meaning of the topic and its words are parallel with the qualitative meaning of the text and we used methodological guidance from past research for this purpose [85, 41]. Past work advocated the use of sample documents to validate each topic’s substantive meaning. Determining the number of sample documents to use is based on the amount of resolution needed by a social scientist to answer the research question using topic modeling methods [89]. Thus, determining the number of sample documents is a largely qualitative process, dependent on the research question at hand. To determine the appropriate number of documents to sample, we searched the social science literature for studies that used topic modeling, based on the following study characteristics: 1) similar research questions as our study; 2) similar topic areas as our study; 3) study data was drawn from similar sources as our study. We searched databases such as Web of Science Core Collection, Embase, PsycINFO, MEDLINE and Sociological Abstracts. We used keywords such as vaccine, misinformation, and topic modeling. The paper by Farrell (2016) was determined to be most similar to our study based on the assessed characteristics. Farrell (2016) explored ideological polarization in climate change and used a broad range of sources, such as press releases, published papers, and website articles. Based on the nature of the research question and large range of sources, Farrell (2016) determined that a sample of 50 documents was sufficient to validate the substantive meaning of the topic output. Given the similarities between Farrell’s (2016) study and ours on a range of characteristics, we similarly determined that a sample of 50 documents was adequate to validate the topics. We used findThoughts and plotQuote within the STM package to examine the top 50 associated documents for each topic to validate a topic’s substantive meaning. Determination of the top 50 documents was based on ranking topics by the maximum a posteriori estimate of the topic’s theta value, which represents the modal estimate of the proportion of word tokens assigned to the topic with the model. These top 50 documents were read by two of the authors to determine validity (k>0.8). A third author resolved disagreements where necessary. ### Regression results for models underlying Figure 1 View this table: [Supplementary Table 1:](http://medrxiv.org/content/early/2021/04/13/2021.04.09.21255229/T2) Supplementary Table 1: Regression output for difference in topic proportions before and after AstraZeneca halted Phase 3 vaccine trials (September 8 2020) View this table: [Supplementary Table 2:](http://medrxiv.org/content/early/2021/04/13/2021.04.09.21255229/T3) Supplementary Table 2: Regression output for difference in topic proportions before and after Johnson & Johnson halted vaccine trials (October 12 2020) View this table: [Supplementary Table 3:](http://medrxiv.org/content/early/2021/04/13/2021.04.09.21255229/T4) Supplementary Table 3: Regression output for difference in topic proportions before and after Pfizer announced 90% vaccine efficacy (November 9 2020) ## Highlights Association between vaccine trials and discussion on vaccine misinformation. Vaccine discussion moves from legitimate avenues to antivaccine arenas. Findings have implications for development of vaccine perception interventions. Interventions may nudge individuals toward accurate information. ## Acknowledgements Study was funded by the Yale Institute for Global Health and the Whitney and Betty MacMillan Center for International and Area Studies at Yale University. The funding bodies had no role in the design, analysis or interpretation of the data in the study. This study was pre-registered on the Open Science Framework (osf.io/urp2a). * Received April 9, 2021. * Revision received April 9, 2021. * Accepted April 13, 2021. * © 2021, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/) ## References 1. [1].World Health Organization and others. Access to Covid-19 Tools (Act) Accelerator. Commitment and Call to Action, Geneva; 2020. 2. [2].Mallapaty S. Are COVID vaccination programmes working? Scientists seek first clues. Nature. 2021. 3. [3].Loomba S, de Figueiredo A, Piatek S, de Graaf K, Larson HJ. Measuring the Impact of Exposure to COVID-19 Vaccine Misinformation on Vaccine Intent in the UK and US. medRxiv. 2020. 4. [4].Vaughan E, Tinker T. Effective health risk communication about pandemic influenza for vulnerable populations. American journal of public health. 2009;99(S2):S324–S332. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2105/AJPH.2009.162537&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19797744&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F04%2F13%2F2021.04.09.21255229.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000271218800018&link_type=ISI) 5. [5].de Figueiredo A, Simas C, Karafillakis E, Paterson P, Larson HJ. Mapping global trends in vaccine confidence and investigating barriers to vaccine uptake: a large-scale retrospective temporal modelling study. The Lancet. 2020;396(10255):898–908. 6. [6].Jamison AM, Broniatowski DA, Dredze M, Sangraula A, Smith MC, Quinn SC. Not just conspiracy theories: Vaccine opponents and proponents add to the COVID-19 ‘infodemic’on Twitter. Harvard Kennedy School Misinformation Review. 2020;1(3). 7. [7].Funk C, Kennedy B, Hefferon M. Vast majority of Americans say benefits of childhood vaccines outweigh risks. Pew Research Center. 2017. 8. [8].Vraga EK, Bode L. Correction as a Solution for Health Misinformation on Social Media. 2020;110:S278–S280. 9. [9].Geldsetzer P. Knowledge and perceptions of COVID-19 among the general public in the United States and the United Kingdom: A cross-sectional online survey. Annals of internal medicine. 2020. 10. [10].Wakabayashi D, Alba D, Tracy M. Bill gates, at odds with trump on virus, becomes a right-wing target. The New York Times. 2020. Available from: [https://www.nytimes.com/2020/04/17/technology/bill-gates-virus-conspiracy-theories.html](https://www.nytimes.com/2020/04/17/technology/bill-gates-virus-conspiracy-theories.html). 11. [11].MacDonald NE, et al. Vaccine hesitancy: Definition, scope and determinants. Vaccine. 2015;33(34):4161–4164. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.vaccine.2015.04.036&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25896383&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F04%2F13%2F2021.04.09.21255229.atom) 12. [12].Dubé E, Vivion M, MacDonald NE. Vaccine hesitancy, vaccine refusal and the anti-vaccine movement: influence, impact and implications. Expert review of vaccines. 2015;14(1):99–117. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1586/14760584.2015.964212&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25373435&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F04%2F13%2F2021.04.09.21255229.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000346183500009&link_type=ISI) 13. [13].Lazarus JV, Ratzan SC, Palayew A, Gostin LO, Larson HJ, Rabin K, et al. A global survey of potential acceptance of a COVID-19 vaccine. Nature medicine. 2020:1–4. 14. [14].Pennycook G, McPhetres J, Zhang Y, Rand D. Fighting COVID-19 misinformation on social media: Experimental evidence for a scalable accuracy nudge intervention. PsyArXiv Preprints. 2020;10. 15. [15].Donzelli G, Palomba G, Federigi I, Aquino F, Cioni L, Verani M, et al. Misinformation on vaccination: A quantitative analysis of YouTube videos. Human vaccines & immunotherapeutics. 2018;14(7):1654–1659. 16. [16].Djidjou-Demasse R, Michalakis Y, Choisy M, Sofonea MT, Alizon S. Optimal COVID-19 epidemic control until vaccine deployment. medRxiv. 2020. 17. [17].Salathé M, Bonhoeffer S. The effect of opinion clustering on disease outbreaks. Journal of The Royal Society Interface. 2008;5(29):1505–1508. 18. [18].McAndrew S, Allington D. Mode and Frequency of Covid-19 Information Updates, Political Values, and Future Covid-19 Vaccine Attitudes. PsyArXiv; 2020. Available from: psyarxiv.com/j7srx. 19. [19].Galvão J. COVID-19: the deadly threat of misinformation. The Lancet Infectious Diseases. 2020. 20. [20].Bonnevie E, Gallegos-Jeffrey A, Goldbarg J, Byrd B, Smyser J. Quantifying the rise of vaccine opposition on Twitter during the COVID-19 pandemic. Journal of Communication in Healthcare. 2020:1–8. 21. [21].Karlsson LC, Soveri A, Lewandowsky S, Karlsson L, Karlsson H, Nolvi S, et al. Fearing the Disease or the Vaccine: The Case of COVID-19. Personality and Individual Differences. 2020:110590. 22. [22].García LY, Cerda AA. Contingent assessment of the COVID-19 vaccine. Vaccine. 2020;38(34):5424–5429. 23. [23].Williams L, Gallant AJ, Rasmussen S, Brown Nicholls LA, Cogan N, Deakin K, et al. Towards intervention development to increase the uptake of COVID-19 vaccination among those at high risk: Outlining evidence-based and theoretically informed future intervention content. British Journal of Health Psychology. 2020;25(4):1039–1054. 24. [24].Enitan S, Oyekale A, Akele R, Olawuyi K, Olabisi E, Nwankiti A, et al. Assessment of Knowledge, Perception and Readiness of Nigerians to participate in the COVID-19 Vaccine Trial. International Journal of Vaccines and Immunization. 2020;4:1–13. 25. [25].Bell S, Clarke R, Mounier-Jack S, Walker JL, Paterson P. Parents’ and guardians’ views on the accept- ability of a future COVID-19 vaccine: A multi-methods study in England. Vaccine. 2020;38(49):7789–7798. 26. [26].Manikonda L, Beigi G, Liu H, Kambhampati S. Twitter for sparking a movement, reddit for sharing the moment:# metoo through the lens of social media. arXiv preprint arxiv:180308022. 2018. 27. [27].Priya S, Sequeira R, Chandra J, Dandapat SK. Where should one get news updates: Twitter or reddit. Online Social Networks and Media. 2019;9:17–29. 28. [28].Chohan UW. Counter-Hegemonic Finance: The Gamestop Short Squeeze. Available at SSRN. 2021. 29. [29].Glenski M, Pennycuff C, Weninger T. Consumers and curators: Browsing and voting patterns on reddit. IEEE Transactions on Computational Social Systems. 2017;4(4):196–206. 30. [30].Tomeny TS, Vargo CJ, El-Toukhy S. Geographic and demographic correlates of autism-related anti- vaccine beliefs on Twitter, 2009-15. Social science & medicine. 2017;191:168–175. 31. [31].Love B, Himelboim I, Holton A, Stewart K. Twitter as a source of vaccination information: content drivers and what they are saying. American journal of infection control. 2013;41(6):568–570. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajic.2012.10.016&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23726548&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F04%2F13%2F2021.04.09.21255229.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000319801900021&link_type=ISI) 32. [32].Salathé M, Khandelwal S. Assessing vaccination sentiments with online social media: implications for infectious disease dynamics and control. PLoS Comput Biol. 2011;7(10):e1002199. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pcbi.1002199&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22022249&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F04%2F13%2F2021.04.09.21255229.atom) 33. [33].Papakyriakopoulos O, Serrano JCM, Hegelich S. Political communication on social media: A tale of hyperactive users and bias in recommender systems. Online Social Networks and Media. 2020;15:100058. 34. [34].Mickoleit A. Social media use by governments: A policy primer to discuss trends, identify policy opportunities and guide decision makers. 2014. 35. [35].Saha K, Torous J, Ernala SK, Rizuto C, Stafford A, De Choudhury M. A computational study of mental health awareness campaigns on social media. Translational behavioral medicine. 2019;9(6):1197–1207. 36. [36].Baumgartner J. Pushshift API (version 1.0). API Documentation, Pushshift [https://pushshiftio/apiparameters](https://pushshiftio/apiparameters). 2018. 37. [37].Boe B, Pedersen A, Mellor T. Python Reddit API Wrapper; 2016. 38. [38].Jiang ZP, Levitan SI, Zomick J, Hirschberg J. Detection of Mental Health from Reddit via Deep Contextualized Representations. In: Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis; 2020. p. 147–156. 39. [39].LaViolette J, Hogan B. Using platform signals for distinguishing discourses: The case of men’s rights and men’s liberation on Reddit. In: Proceedings of the International AAAI Conference on Web and Social Media. vol. 13; 2019. p. 323–334. 40. [40].Mohr JW, Bogdanov P. Introduction—Topic models: What they are and why they matter. Poetics. 2013;41(6):545–569. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.poetic.2013.10.001&link_type=DOI) 41. [41].Blei DM. Probabilistic topic models. Commun ACM. 2012;55(4):77–84. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1145/2133806.2133826&link_type=DOI) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000302915000026&link_type=ISI) 42. [42].Roberts ME, Stewart BM, Airoldi EM. A model of text for experimentation in the social sciences. Journal of the American Statistical Association. 2016;111(515):988–1003. 43. [43]. E Rm, Stewart BM, Tingley D. stm: R package for structural topic models; 2014. 44. [44].Csse J. Covid-19 data repository by the center for systems science and engineering (csse) at johns hopkins university; 2020. 45. [45].Organization WH, et al. WHO Director-General’s opening remarks at the media briefing on COVID-19-11 March 2020. Geneva, Switzerland; 2020. 46. [46].Grimmer J, Stewart BM. Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political analysis. 2013;21(3):267–297. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/pan/mps028&link_type=DOI) 47. [47].Buntain C, Golbeck J. Identifying social roles in reddit using network structure. In: Proceedings of the 23rd international conference on world wide web; 2014. p. 615–620. 48. [48].Pananos AD, Bury TM, Wang C, Schonfeld J, Mohanty SP, Nyhan B, et al. Critical dynamics in population vaccinating behavior. Proceedings of the National Academy of Sciences. 2017;114(52):13762–13767. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxMjoiMTE0LzUyLzEzNzYyIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjEvMDQvMTMvMjAyMS4wNC4wOS4yMTI1NTIyOS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 49. [49].Tang L, Fujimoto K, Amith MT, Cunningham R, Costantini RA, York F, et al. “Down the Rabbit Hole” of Vaccine Misinformation on YouTube: Network Exposure Study. Journal of Medical Internet Research. 2021;23(1):e23262. 50. [50].Massey PM, Kearney MD, Hauer MK, Selvan P, Koku E, Leader AE. Dimensions of Misinformation About the HPV Vaccine on Instagram: Content and Network Analysis of Social Media Characteristics. Journal of medical Internet research. 2020;22(12):e21451. 51. [51].Treen KMd, Williams HT, O’Neill SJ. Online misinformation about climate change. Wiley Interdisci- plinary Reviews: Climate Change. 2020;11(5):e665. 52. [52].Whitehead M, Taylor N, Gough A, Chambers D, Jessop M, Hyde P. The anti-vax phenomenon. The Veterinary Record. 2019;184(24):744. [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiRlVMTCI7czoxMToiam91cm5hbENvZGUiO3M6NjoidmV0cmVjIjtzOjU6InJlc2lkIjtzOjEwOiIxODQvMjQvNzQ0IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjEvMDQvMTMvMjAyMS4wNC4wOS4yMTI1NTIyOS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 53. [53].Nyhan B, Reifler J, Richey S, Freed GL. Effective messages in vaccine promotion: a randomized trial. Pediatrics. 2014;133(4):e835–e842. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MTA6InBlZGlhdHJpY3MiO3M6NToicmVzaWQiO3M6MTA6IjEzMy80L2U4MzUiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMS8wNC8xMy8yMDIxLjA0LjA5LjIxMjU1MjI5LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 54. [54].Ecklund EH, Scheitle CP, Peifer J, Bolger D. Examining links between religion, evolution views, and climate change skepticism. Environment and Behavior. 2017;49(9):985–1006. 55. [55].Goldenberg MJ. Antivaccination movement exploits public’s distrust in scientific authority. BMJ. 2019;367. 56. [56].Evanega S, Lynas M, Adams J, Smolenyak K, Insights CG. Coronavirus misinformation: quantifying sources and themes in the COVID-19 ‘infodemic’. JMIR Preprints. 2020. 57. [57].Shah PD, Calo WA, Gilkey MB, Boynton MH, Dailey SA, Todd KG, et al. Questions and concerns about HPV vaccine: a communication experiment. Pediatrics. 2019;143(2). 58. [58].Buttenheim AM, Joyce CM, Ibarra J, Agas J, Feemster K, Handy LK, et al. Vaccine exemption requirements and parental vaccine attitudes: an online experiment. Vaccine. 2020;38(11):2620–2625. 59. [59].Pennycook G, Rand DG. Fighting misinformation on social media using crowdsourced judgments of news source quality. Proceedings of the National Academy of Sciences. 2019;116(7):2521–2526. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxMDoiMTE2LzcvMjUyMSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIxLzA0LzEzLzIwMjEuMDQuMDkuMjEyNTUyMjkuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 60. [60].Barnett PA, Hoskins CE, Alhoti JA, Carpenter LJ. Reducing public misinformation about organ donation: An experimental intervention. Journal of Social Distress and the Homeless. 2009;18(1-2):57–73. 61. [61].Wesson DE, Lucey CR, Cooper LA. Building trust in health systems to eliminate health disparities. Jama. 2019;322(2):111–112. 62. [62].Amaya A, Bach R, Keusch F, Kreuter F. New data sources in social science research: Things to know before working with Reddit data. Social Science Computer Review. 2019:0894439319893305. 63. [63].Duggan M, Smith A. Cell internet use 2013. Washington, DC: Pew Research Center; 2013. 64. [64].DateTime; 2018. Available from: [https://pypi.org/project/DateTime/](https://pypi.org/project/DateTime/). 65. [65].Wickham H, François R, Henry L, Müller K. dplyr: A Grammar of Data Manipulation; 2020. R package version 0.8.5. Available from: [https://CRAN.R-project.org/package=dplyr](https://CRAN.R-project.org/package=dplyr). 66. [66].Pedersen TL. ggraph: An Implementation of Grammar of Graphics for Graphs and Networks; 2020. R package version 2.0.3. Available from: [https://CRAN.R-project.org/package=ggraph](https://CRAN.R-project.org/package=ggraph). 67. [67].R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria; 2020. Available from: [https://www.R-project.org/](https://www.R-project.org/). 68. [68].Auguie B. gridExtra: Miscellaneous Functions for “Grid” Graphics; 2017. R package version 2.3. Available from: [https://CRAN.R-project.org/package=gridExtra](https://CRAN.R-project.org/package=gridExtra). 69. [69].Csardi G, Nepusz T. The igraph software package for complex network research. InterJournal. 2006;Complex Systems:1695. Available from: [https://igraph.org](https://igraph.org). 70. [70].Grolemund G, Wickham H. Dates and Times Made Easy with lubridate. Journal of Statistical Software. 2011;40(3):1–25. Available from: [https://www.jstatsoft.org/v40/i03/](https://www.jstatsoft.org/v40/i03/). 71. [71].Oliphant TE. A guide to NumPy. vol. 1. Trelgol Publishing USA; 2006. 72. [72].McKinney W, et al. Data structures for statistical computing in python. In: Proceedings of the 9th Python in Science Conference. vol. 445. Austin, TX; 2010. p. 51–56. 73. [73].Borchers HW. pracma: Practical Numerical Math Functions; 2021. R package version 2.3.3. Available from: [https://CRAN.R-project.org/package=pracma](https://CRAN.R-project.org/package=pracma). 74. [74].Boe B. PRAW: The Python Reddit API Wrapper; 2012. Available from: [https://github.com/praw-dev/praw/](https://github.com/praw-dev/praw/). 75. [75].Benoit K, Watanabe K, Wang H, Nulty P, Obeng A, Müller S, et al. quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software. 2018;3(30):774. Available from: [https://quanteda.io](https://quanteda.io). 76. [76].Benoit K, Obeng A. readtext: Import and Handling for Plain and Formatted Text Files; 2020. R package version 0.80. Available from: [https://CRAN.R-project.org/package=readtext](https://CRAN.R-project.org/package=readtext). 77. [77].Wickham H, Hester J. readr: Read Rectangular Text Data; 2020. R package version 1.4.0. Available from: [https://CRAN.R-project.org/package=readr](https://CRAN.R-project.org/package=readr). 78. [78].Schwemmer C. stminsights: A ’Shiny’ Application for Inspecting Structural Topic Models; 2018. R package version 0.3.0. Available from: [https://CRAN.R-project.org/package=stminsights](https://CRAN.R-project.org/package=stminsights). 79. [79].Wang W, Yan J. splines2: Regression Spline Functions and Classes; 2021. R package version 0.4.1. Available from: [https://CRAN.R-project.org/package=splines2](https://CRAN.R-project.org/package=splines2). 80. [80].Wickham H. stringr: Simple, Consistent Wrappers for Common String Operations; 2019. R package version 1.4.0. Available from: [https://CRAN.R-project.org/package=stringr](https://CRAN.R-project.org/package=stringr). 81. [81].Rinker TW. textclean: Text Cleaning Tools. Buffalo, New York; 2018. Version 0.9.3. Available from: [https://github.com/trinker/textclean](https://github.com/trinker/textclean). 82. [82].Pedersen TL. tidygraph: A Tidy API for Graph Manipulation; 2020. R package version 1.2.0. Available from: [https://CRAN.R-project.org/package=tidygraph](https://CRAN.R-project.org/package=tidygraph). 83. [83].Silge J, Robinson D. tidytext: Text Mining and Analysis Using Tidy Data Principles in R. JOSS. 2016;1(3). Available from: [http://dx.doi.org/10.21105/joss.00037](http://dx.doi.org/10.21105/joss.00037). 84. [84].Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, et al. Welcome to the tidyverse. Journal of Open Source Software. 2019;4(43):1686. 85. [85].Roberts ME, Stewart BM, Tingley D, et al. Structural topic models for open-ended survey responses. Am J Polit Sci. 2014;58(4):1064–1082. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/ajps.12103&link_type=DOI) 86. [86].Wallach HM, Murray I, Salakhutdinov R, Mimno D. Evaluation methods for topic models. In: Proceedings of the 26th annual international conference on machine learning; 2009. p. 1105–1112. 87. [87].Krippendorff K. Content analysis: An introduction to its methodology. Sage publications; 2018. 88. [88].DiMaggio P, Nag M, Blei D. Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of US government arts funding. Poetics. 2013;41(6):570–606. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.poetic.2013.08.004&link_type=DOI) 89. [89].Nikolenko SI, Koltcov S, Koltsova O. Topic modelling for qualitative studies. Journal of Information Science. 2017;43(1):88–102.