Abstract
As the COVID-19 vaccination campaign unfolds as one of the most rapid and widespread in history, it is important to continuously assess the real-world safety of the FDA-authorized vaccines. Curation from large-scale electronic health records (EHRs) allows for near real-time safety evaluations that were not previously possible. Here, we advance context- and sentiment-aware deep neural networks over the multi-state Mayo Clinic enterprise (Minnesota, Arizona, Florida, Wisconsin) to automatically curate the adverse effects mentioned by physicians in over 108,000 EHR clinical notes written between December 1, 2020 and February 8, 2021. We retrospectively compared the clinical notes of 31,069 individuals who received at least one dose of the Pfizer/BioNTech or Moderna vaccine to those of 31,069 unvaccinated individuals who were propensity matched by demographics, residential location, and history of prior SARS-CoV-2 testing. We find that vaccinated and unvaccinated individuals were seen in the clinic at similar rates within 21 days of the first or second actual or assigned vaccination dose (first dose Odds Ratio = 1.13, 95% CI: 1.09-1.16; second dose Odds Ratio = 0.89, 95% CI: 0.84-0.93). Further, the incidence rates of all surveyed adverse effects were similar or lower in vaccinated individuals compared to unvaccinated individuals after either vaccine dose. Finally, the most frequently documented adverse effects within 7 days of each vaccine dose were arthralgia (Dose 1: 0.59%; Dose 2: 0.39%), diarrhea (Dose 1: 0.58%; Dose 2: 0.33%), erythema (Dose 1: 0.51%; Dose 2: 0.31%), myalgia (Dose 1: 0.40%; Dose 2: 0.34%), and fever (Dose 1: 0.27%; Dose 2: 0.18%). These remarkably low frequencies of adverse effects recorded in EHRs versus those derived from active solicitation during clinical trials (arthralgia: 24-46%; erythema: 9.5-14.7%; myalgia: 38-62%; fever: 14.2-15.5%) emphasize the rarity of vaccine-associated adverse effects requiring clinical attention.
This rapid and timely analysis of vaccine-related adverse effects from contextually rich EHR notes of 62,138 individuals, which was enabled through a large-scale Artificial Intelligence (AI)-powered platform, reaffirms the safety and tolerability of the FDA-authorized COVID-19 vaccines in practice.
Introduction
Following their Emergency Use Authorizations by the Food and Drug Administration (FDA) in December of 2020, more than 52 million doses of BNT162b2 (Pfizer/BioNTech) and mRNA-1273 (Moderna) COVID-19 vaccines have been administered in the United States1–3. Phase III trials demonstrated strong efficacy and favorable safety profiles for these vaccines in the cohorts studied. Specifically, the trials showed 95.0% (95% CI, 90.3 to 97.6%) and 94.1% (95% CI, 89.3 to 96.8%) efficacy for BNT162b2 and mRNA-1273, respectively. While self-resolving mild to moderate adverse effects were common in vaccinated participants, serious adverse effects occurred rarely and with a frequency comparable to placebo4,5. Local adverse effects of any severity reported in these trials included injection site pain (84.1-92%), injection site swelling (10.5-70%), and injection site erythema (9.5-14.6%). Systemic effects of any severity included fatigue (62.9-70%), headache (55.1-64.7%), myalgia (38.3-61.5%), arthralgia (23.6-46.4%), chills (31.9-45.4%), fever (14.2-15.5%), and nausea/vomiting (1.1-23.0%)6,7.
Consistent with CDC recommendations, early vaccination efforts (Phase 1a) in the United States have targeted healthcare workers and the residents and staff of long-term care facilities, who are at elevated risk for COVID-19 relative to the general public8–11. As the vaccines continue to be administered more broadly, it will be critical to continuously evaluate real-world safety and efficacy data from all those who have received these vaccines. This approach may validate existing findings or highlight differences in vaccine efficacy and adverse effects between the larger population and the clinical trial experience. Of note, the populations undergoing vaccination may differ meaningfully from the trial populations. For example, healthcare workers and long-term care residents receiving the first wave of vaccines in Phase 1a likely comprised a small fraction of the Phase 3 trial populations. Additionally, some pregnant women are receiving COVID-19 vaccines in the real world12, while they were excluded from both trials4,5. Finally, continuous monitoring will help to identify and better quantify the frequency of rare severe adverse effects such as anaphylaxis, which has been widely reported but observed in only a small number of individuals since both vaccines were authorized13–17.
One approach to post-authorization surveillance of vaccine efficacy and safety is via the real-time analysis of patient data stored in electronic health record (EHR) systems. We have previously developed and described augmented curation methods to rapidly create and compare cohorts of COVID-19 patients within a large EHR system, and have recently applied these methods to assess the real world efficacy of both BNT162b2 and mRNA-1273 in over 31,000 individuals receiving these vaccines at the Mayo Clinic and associated health system18–22. Here we expand on these efforts to study the adverse effects experienced by individuals after COVID-19 vaccination in the clinical environment.
It should be noted that monitoring vaccine-associated adverse effects in a clinical trial setting and outside of trial environments are quite different. In clinical trials, participants are aware that they are receiving an experimental product, and adverse effect reporting is encouraged or solicited. In contrast, individuals receiving a COVID-19 vaccine during the mass vaccination campaign are informed of the adverse effects they are likely to experience and may even be discouraged from seeking medical attention unless a symptom is particularly severe. Thus, adverse effects are likely to be captured in the EHR only for individuals who experience symptoms severe or persistent enough to warrant a return to clinic, or who happen to have a previously scheduled routine clinical visit in the post-vaccination period. Accordingly, the intention of our analysis is not to determine whether real-world data recapitulate the adverse effect frequencies reported in prior trials. Instead, it is to (1) establish the rates at which individuals actually report potential vaccine-associated adverse effects to health care practitioners (HCPs) in several defined time intervals after vaccination, and (2) determine whether these reporting rates are unexpectedly high.
To address the latter point, it is critical to establish the baseline frequency at which each adverse effect is expected to be documented in the clinical notes of our vaccinated cohort. We thus employed a 1-to-1 propensity matching procedure to derive a cohort of unvaccinated individuals who are balanced for demographic factors, residential location, and history of SARS-CoV-2 PCR testing (see Methods)23, and we similarly curated their clinical notes over the same time period to quantify the frequency of the defined symptoms of interest. This propensity matched group serves a purpose similar to that of the placebo arm in clinical trial safety assessments, allowing us to contextualize and better interpret the absolute rates of adverse effects documented in the notes of vaccinated individuals.
We find that individuals undergoing COVID-19 vaccination do not return to the clinic at unexpectedly high rates in the 7, 14, or 21 days following either vaccine dose. Further, the rates of adverse effects (headache, myalgia, arthralgia, fatigue, fever, diarrhea, nausea/vomiting, anaphylaxis, facial paralysis, lymphadenopathy, and erythema) documented in EHR notes over these time intervals are not higher in vaccinated individuals compared to the unvaccinated control cohort. This study supports the safety and tolerability of BNT162b2 and mRNA-1273 in practice, further strengthening the case for the rapid and broad distribution of vaccines to the public.
Methods
Study design, setting and population
This is a retrospective study of individuals who underwent polymerase chain reaction (PCR) testing for suspected SARS-CoV-2 infection at the Mayo Clinic and hospitals affiliated with the Mayo Clinic Health System. This study was reviewed by the Mayo Clinic Institutional Review Board (IRB) and determined to be exempt from the requirement for IRB approval (45 CFR 46.104d, category 4). Subjects were excluded if they did not have a research authorization on file.
The cohorts of vaccinated and unvaccinated individuals considered for this study are identical to the cohorts considered in a previous analysis: “FDA-authorized COVID-19 vaccines are effective per real-world evidence synthesized across a multi-state health system”23. In total, there were 507,525 individuals in the Mayo electronic health record (EHR) database who received a PCR test between February 15, 2020 and February 8, 2021. To obtain the study population, we defined the following inclusion criteria: (1) at least 18 years old; (2) no positive SARS-CoV-2 PCR test before December 1, 2020; (3) resides in a locale (based on Zip code) with at least 25 individuals who have received BNT162b2 or mRNA-1273. This population included 249,708 individuals, of whom 31,623 have received BNT162b2 or mRNA-1273 and 218,085 have no record of COVID-19 vaccination. Vaccinated individuals who had tested positive for SARS-CoV-2 by PCR between December 1, 2020 and the date of their first vaccine dose were excluded, resulting in 31,299 individuals. Individuals with zero follow-up days after vaccination (i.e., those who received the first vaccine dose on the date of data collection) were also excluded, leaving 31,069 individuals in the final vaccinated cohort.
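The inclusion and exclusion steps above can be sketched as a sequence of filters. This is a minimal Python illustration, not the study's pipeline: the `Person` record and its field names are hypothetical, counting vaccinated residents per Zip code over the full input population is an assumption, and so is treating a positive test on the day of the first dose as grounds for exclusion.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

STUDY_START = date(2020, 12, 1)  # no positive PCR allowed before this date
DATA_CUTOFF = date(2021, 2, 8)   # end of data collection


@dataclass
class Person:
    # Hypothetical record layout, for illustration only.
    pid: str
    age: int
    zip_code: str
    first_positive_pcr: Optional[date]  # earliest positive SARS-CoV-2 PCR, if any
    first_dose: Optional[date]          # None if there is no vaccination record


def select_cohorts(people: List[Person], min_vaccinated_per_zip: int = 25
                   ) -> Tuple[List[Person], List[Person]]:
    """Apply the inclusion/exclusion steps in order; return (vaccinated, unvaccinated)."""
    # Count vaccinated residents per Zip code (assumption: over the whole input).
    vaccinated_per_zip = Counter(p.zip_code for p in people if p.first_dose is not None)

    eligible = [
        p for p in people
        if p.age >= 18                                                   # (1) adult
        and not (p.first_positive_pcr is not None
                 and p.first_positive_pcr < STUDY_START)                 # (2)
        and vaccinated_per_zip[p.zip_code] >= min_vaccinated_per_zip     # (3)
    ]
    vaccinated = [p for p in eligible if p.first_dose is not None]
    unvaccinated = [p for p in eligible if p.first_dose is None]

    # Drop vaccinated people who tested positive between Dec 1 and their first dose.
    vaccinated = [
        p for p in vaccinated
        if not (p.first_positive_pcr is not None
                and STUDY_START <= p.first_positive_pcr <= p.first_dose)
    ]
    # Drop vaccinated people with zero follow-up days (dosed on the cutoff date).
    vaccinated = [p for p in vaccinated if (DATA_CUTOFF - p.first_dose).days > 0]
    return vaccinated, unvaccinated
```

The order of the filters mirrors the counts in the text (all adults with no earlier positive test, then the vaccinated-specific exclusions), which makes it easy to log the size of each intermediate set for a flow diagram.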
The propensity matched unvaccinated cohort was selected from the previously derived set of 218,085 unvaccinated individuals. The purpose of this cohort was to establish the baseline frequency of EHR documentation for each adverse effect of interest in a cohort that is clinically similar to our vaccinated cohort. These baseline (expected) frequencies can then be compared to the observed frequencies to determine whether these adverse effects are reported at unexpectedly high rates among patients receiving a COVID-19 vaccine.
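The matching procedure itself is detailed in the prior manuscript. Purely as an illustration of the general idea, greedy 1:1 nearest-neighbour matching without replacement on a precomputed propensity score could look like the sketch below. The caliper of 0.05, the highest-score-first ordering, and the assumption that scores were already estimated (for example by logistic regression on demographics, residential location, and prior testing) are ours, not the study's.

```python
from typing import Dict


def greedy_match(treated: Dict[str, float], controls: Dict[str, float],
                 caliper: float = 0.05) -> Dict[str, str]:
    """Greedy 1:1 nearest-neighbour matching on propensity score, without replacement.

    treated / controls map person id -> propensity score (assumed precomputed).
    Returns a dict vaccinated_id -> matched unvaccinated_id; treated individuals
    with no remaining control within `caliper` are left unmatched.
    """
    available = dict(controls)  # copy so the caller's dict is not modified
    pairs: Dict[str, str] = {}
    # Match highest scores first: they usually have the fewest close controls.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs[t_id] = c_id
            del available[c_id]  # without replacement
    return pairs
```

This linear scan is quadratic in cohort size; at the scale of hundreds of thousands of candidates, a sorted index or a dedicated matching package would be the practical choice.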
The selection algorithm and its associated counts are summarized in Figure 1. More details on the propensity score matching procedure are provided in the prior manuscript23. This includes a table showing that the cohorts were indeed balanced for all covariates included in propensity score matching (standardized mean difference < 0.1 for each one) and figures illustrating the distributions of age and total follow-up time between the first vaccine dose and the end of the data collection period.
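The balance criterion above (standardized mean difference below 0.1 for every matching covariate) is straightforward to check once the matched cohorts exist. A minimal sketch, assuming each covariate is supplied as two lists of numbers (binary covariates as 0/1) and using the common pooled-standard-deviation form of the SMD; the prior manuscript may have used a slightly different variant.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple


def standardized_mean_difference(treated: Sequence[float],
                                 control: Sequence[float]) -> float:
    """(mean_t - mean_c) / sqrt((var_t + var_c) / 2), using sample variances."""
    mean_t, mean_c = statistics.fmean(treated), statistics.fmean(control)
    pooled_sd = math.sqrt((statistics.variance(treated)
                           + statistics.variance(control)) / 2)
    if pooled_sd == 0:
        # Both groups are constant: zero if identical, otherwise infinitely imbalanced.
        return 0.0 if mean_t == mean_c else math.copysign(math.inf, mean_t - mean_c)
    return (mean_t - mean_c) / pooled_sd


def is_balanced(covariates: Dict[str, Tuple[Sequence[float], Sequence[float]]],
                threshold: float = 0.1) -> bool:
    """True if |SMD| < threshold for every covariate (name -> (treated, control))."""
    return all(abs(standardized_mean_difference(t, c)) < threshold
               for t, c in covariates.values())
```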
Definition of time intervals for safety analyses
For each vaccinated individual, we defined the date of their first vaccine dose as Day V1 and the date of their second vaccine dose as Day V2. In the Results section, these are referred to as “actual” dates of vaccination. For each unvaccinated individual, Day V1 and Day V2 were designated as identical to their matched vaccinated individual. In the Results section, these are referred to as “assigned” dates of vaccination.
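The actual/assigned convention amounts to letting each unvaccinated individual inherit the dose dates of their matched partner. A small sketch with hypothetical dictionary inputs; the text does not state whether a follow-up window's end date is inclusive, so `follow_up_window` simply returns the two boundary dates.

```python
from datetime import date, timedelta
from typing import Dict, Optional, Tuple

Dates = Tuple[date, Optional[date]]  # (Day V1, Day V2 or None if no second dose yet)


def assign_dates(pairs: Dict[str, str],
                 actual_dates: Dict[str, Dates]) -> Dict[str, Dates]:
    """pairs: vaccinated_id -> matched unvaccinated_id.

    Vaccinated individuals keep their actual dates; each unvaccinated individual
    receives the same dates as an assigned Day V1 / Day V2.
    """
    dates: Dict[str, Dates] = {}
    for vacc_id, unvacc_id in pairs.items():
        dates[vacc_id] = actual_dates[vacc_id]    # actual
        dates[unvacc_id] = actual_dates[vacc_id]  # assigned
    return dates


def follow_up_window(day_v: date, days: int) -> Tuple[date, date]:
    """Boundaries of the interval from Day V to `days` days after Day V."""
    return day_v, day_v + timedelta(days=days)
```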
Definition of adverse effects of interest
The adverse effects considered were primarily derived from those assessed in phase 3 trials of BNT162b2 and mRNA-12734,5, including fatigue, fever, myalgia, arthralgia, headache, lymphadenopathy, erythema, diarrhea, and vomiting. We also included anaphylaxis and facial paralysis (Bell’s palsy), as these rare events have been reported in individuals receiving COVID-19 vaccines as well. Each adverse effect was mapped to a set of synonyms intended to capture the most common ways that a given phenotype would be referenced in the context of a clinical note.
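A synonym map of this kind can be represented as a dictionary from phenotype to surface forms, compiled into one pattern per phenotype. The synonym lists below are hypothetical examples, not the study's lists, and the sketch only locates candidate sentences; deciding whether a mention is confirmed, ruled out, or otherwise is left to the sentiment classifier described in the next subsection.

```python
import re
from typing import Dict, List, Set

# Hypothetical synonym lists for illustration; not the study's actual mapping.
PHENOTYPE_SYNONYMS: Dict[str, List[str]] = {
    "myalgia": ["myalgia", "muscle ache", "muscle pain"],
    "fever": ["fever", "pyrexia", "febrile"],
    "facial paralysis": ["facial paralysis", "facial palsy", "bell's palsy"],
}


def build_patterns(synonyms: Dict[str, List[str]]) -> Dict[str, re.Pattern]:
    """Compile one case-insensitive, whole-word regex per phenotype."""
    return {
        phenotype: re.compile(
            r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
            re.IGNORECASE,
        )
        for phenotype, terms in synonyms.items()
    }


def phenotypes_in(sentence: str, patterns: Dict[str, re.Pattern]) -> Set[str]:
    """Phenotypes with at least one synonym mentioned in the sentence."""
    return {ph for ph, pattern in patterns.items() if pattern.search(sentence)}
```

Note that a sentence such as "History of Bell's palsy in 2015." is matched just like a current diagnosis, which is why mention detection alone cannot separate past from present events.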
Curation of adverse effects from clinical notes
To curate the adverse effects experienced by each patient from the electronic health record, we used a BERT-based neural network model24 to classify the sentiment for the phenotypes (described above) mentioned in the clinical notes. Specifically, this classification model categorizes phenotype-containing sentences into one of four categories: (1) confirmed diagnosis, (2) ruled-out diagnosis, (3) possibility of disease, and (4) alternate context (e.g., family history). This classification model was trained on 18,500 sentences and has shown an out-of-sample accuracy of 93.6% with precision and recall scores above 95%25. For each individual, we applied the sentiment model to the clinical notes in the Mayo Clinic electronic health record during the defined intervals of interest: (1) Day V1 to 7, 14, or 21 days after Day V1, and (2) Day V2 to 7, 14, or 21 days after Day V2. For each phenotype, we identified the first date on which the given individual had at least one sentence in which the phenotype was categorized as “confirmed diagnosis” with a confidence score of at least 90%. For anaphylaxis and facial paralysis, each such sentence was manually reviewed to verify the positive sentiment (i.e., confirmed diagnosis) and to assess the tense of this sentiment (i.e., past vs. present). Only sentences confirming a present diagnosis were used to count adverse effects in this study.
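The date-selection rule above (earliest sentence classified as a confirmed diagnosis with confidence of at least 90% inside the interval) can be sketched as follows. The `SentencePrediction` record and its label strings are hypothetical stand-ins for the classifier's output, and the interval is assumed to include both Day V and its end date.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Iterable


@dataclass
class SentencePrediction:
    # Hypothetical classifier output for one phenotype-containing sentence.
    note_date: date
    phenotype: str
    label: str         # "confirmed", "ruled_out", "possible", or "alternate"
    confidence: float  # classifier score for `label`, in [0, 1]


def first_confirmed_dates(predictions: Iterable[SentencePrediction], day_v: date,
                          window_days: int, min_confidence: float = 0.90
                          ) -> Dict[str, date]:
    """Earliest confirmed-diagnosis date per phenotype within [Day V, Day V + window]."""
    window_end = day_v + timedelta(days=window_days)
    first: Dict[str, date] = {}
    for p in predictions:
        if not day_v <= p.note_date <= window_end:
            continue  # outside the follow-up interval
        if p.label != "confirmed" or p.confidence < min_confidence:
            continue  # not a sufficiently confident confirmed diagnosis
        if p.phenotype not in first or p.note_date < first[p.phenotype]:
            first[p.phenotype] = p.note_date
    return first
```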
Evaluating rates of return to clinic after vaccination
To evaluate the likelihood of returning to the clinic after vaccination, we counted the number of individuals who had at least one clinical note in the 7, 14, and 21 days after Day V1 and Day V2. The fraction of individuals with clinical follow-up was calculated as the number of individuals with at least one clinical note in the time window divided by the total number of individuals in each group (n = 31,069 for the first vaccine dose; n = 17,067 for the second vaccine dose). The difference in clinical follow-up rates was assessed by calculating the odds ratio (OR) along with its corresponding 95% confidence interval (CI) using the fisher.test function in R (RStudio version 1.3.959)26. The null hypothesis was that the OR falls between 0.91 and 1.1 (i.e., the larger rate is at most 10% larger than the smaller rate); thus, an OR was considered significant if the upper bound of the 95% CI was less than 0.91 or the lower bound of the 95% CI was greater than 1.1.
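The odds ratio and the decision rule can be illustrated with the first-dose 7-day counts reported in the Results (6,121 of 31,069 vaccinated vs 5,789 of 31,069 unvaccinated). The study used R's fisher.test, which reports a conditional maximum-likelihood odds ratio with an exact interval; this sketch instead substitutes the sample odds ratio with a Woolf (log-odds) interval, which at counts this large lands very close to the reported values but is not the same method.

```python
import math
from typing import Tuple


def odds_ratio_woolf(a_events: int, a_total: int, b_events: int, b_total: int,
                     z: float = 1.96) -> Tuple[float, float, float]:
    """Sample odds ratio of group A vs group B with a Woolf 95% CI.

    Requires all four cells of the 2x2 table to be non-zero.
    """
    a, b = a_events, a_total - a_events  # group A: with / without a note
    c, d = b_events, b_total - b_events  # group B: with / without a note
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def outside_equivalence_band(ci_low: float, ci_high: float,
                             lower: float = 0.91, upper: float = 1.1) -> bool:
    """'Significant' only if the whole CI lies below 0.91 or above 1.1."""
    return ci_high < lower or ci_low > upper


or_, lo, hi = odds_ratio_woolf(6121, 31069, 5789, 31069)
print(f"OR = {or_:.2f} [{lo:.2f}-{hi:.2f}]", outside_equivalence_band(lo, hi))
# prints: OR = 1.07 [1.03-1.12] False
```

The band test makes the asymmetry of the rule explicit: a CI that excludes 1 (as here) is still not flagged unless it also clears the 10% margin on one side.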
Evaluating vaccine adverse effects among the total cohorts
To evaluate adverse effects associated with receiving a COVID-19 vaccine in the clinical setting, we compared the two populations described above and summarized in Figure 1: (1) 31,069 individuals with follow-up who received BNT162b2 or mRNA-1273 and did not have a prior positive SARS-CoV-2 PCR test (“vaccinated”), and (2) 31,069 propensity matched individuals who have never received a COVID-19 vaccine and did not have a positive SARS-CoV-2 PCR test before the first vaccination date (dose 1) of their matched individual (“unvaccinated”).
The incidence of a given adverse effect after each vaccine dose was assessed by computing the incidence rate ratio (IRR) between the vaccinated and unvaccinated cohorts. Specifically, we evaluated adverse effects which were documented in clinical notes within 7 days of receiving the first vaccine dose (Day V1 to 7 days after Day V1) or the second vaccine dose (Day V2 to 7 days after Day V2). For each cohort in a defined time period, incidence rates were calculated as the number of individuals experiencing the given adverse effect in that time period divided by the total number of at-risk person-days contributed in that time period. For each individual, at-risk person-days are defined as the number of days from the start of the time period to the earliest of the following: the end of the time period, the day on which the individual experienced the adverse effect, the day on which the individual died, or four days prior to testing positive for SARS-CoV-2. The IRR was calculated as the incidence rate of the vaccinated cohort divided by the incidence rate of the unvaccinated cohort, and its 95% CI was computed using an exact approach described previously27. The IRR was considered to be statistically significant if the 95% CI did not include 1.
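The person-time and rate-ratio computation can be sketched as below. The at-risk rule follows the definition above, with the added assumption that days are counted from the window start to the first stopping date with that end date excluded. The cited exact CI method is not spelled out here, so this sketch uses one standard exact approach as a stand-in: conditional on the total number of events, the vaccinated count is binomial, and a Clopper-Pearson interval for that proportion is transformed into an interval for the IRR. All function names are illustrative.

```python
import math
from datetime import date, timedelta
from typing import Callable, Optional, Tuple


def at_risk_days(start: date, window_days: int, event: Optional[date] = None,
                 death: Optional[date] = None,
                 positive_test: Optional[date] = None) -> int:
    """Days at risk from the window start to the earliest stopping date.

    Stopping dates: end of the window, the adverse-effect date, death, or four
    days before a positive SARS-CoV-2 test (end date excluded, by assumption).
    """
    stops = [start + timedelta(days=window_days)]
    stops += [d for d in (event, death) if d is not None]
    if positive_test is not None:
        stops.append(positive_test - timedelta(days=4))
    return max(0, (min(stops) - start).days)


def _binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k + 1)
    ]
    top = max(log_terms)
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))


def _root_increasing(f: Callable[[float], float], iters: int = 100) -> float:
    """Bisection on [0, 1] for a function that increases from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def irr_exact(a: int, t_a: float, b: int, t_b: float,
              alpha: float = 0.05) -> Tuple[float, float, float]:
    """IRR = (a / t_a) / (b / t_b) with a conditional (Clopper-Pearson) CI.

    a, b: event counts (at least one event in total); t_a, t_b: at-risk
    person-days. Given n = a + b events, a ~ Binomial(n, pi) with
    pi = t_a * IRR / (t_a * IRR + t_b).
    """
    n = a + b
    irr = (a / t_a) / (b / t_b) if b else math.inf
    # Clopper-Pearson bounds for pi
    pi_lo = 0.0 if a == 0 else _root_increasing(
        lambda p: (1 - _binom_cdf(a - 1, n, p)) - alpha / 2)
    pi_hi = 1.0 if a == n else _root_increasing(
        lambda p: alpha / 2 - _binom_cdf(a, n, p))

    def to_irr(pi: float) -> float:
        # Invert pi = t_a * IRR / (t_a * IRR + t_b)
        return math.inf if pi >= 1 else (pi / (1 - pi)) * (t_b / t_a)

    return irr, to_irr(pi_lo), to_irr(pi_hi)
```

With equal person-time in both cohorts, the first-dose fever counts (85 vs 198) give an IRR of about 0.43, in line with the reported value; because the study's person-days differ between individuals and cohorts, results from this sketch should not be expected to match the tables exactly.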
Evaluating vaccine adverse effects in individuals who returned to clinic
Because only a fraction of individuals in the vaccinated and unvaccinated cohorts contributed clinical notes during the defined intervals after their actual or assigned vaccination dates, we also assessed the rates of adverse effects documented in the EHR specifically among the individuals who had returned to clinic. To select the cohorts for this analysis, we considered any pair of matched vaccinated and unvaccinated individuals who both contributed at least one clinical note containing at least one phenotype term in the 7, 14, or 21 days after the actual or assigned date of first or second vaccination. The number of such matched pairs in each time interval is provided in Table S1. For each time interval, we then computed incidence rates for each adverse effect in the vaccinated and unvaccinated cohorts, along with the incidence rate ratios and 95% CIs as described above. The IRR was considered to be statistically significant if the 95% CI did not include 1.
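Selecting the matched pairs for this sub-analysis reduces to a filter over pairs. A sketch assuming hypothetical inputs: a pair dictionary, each individual's dates with at least one phenotype-containing note, and the pair's shared actual/assigned Day V; the window is taken as inclusive, which the text does not specify.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def _has_note(dates: List[date], start: date, end: date) -> bool:
    """True if any phenotype-containing note falls within [start, end]."""
    return any(start <= d <= end for d in dates)


def pairs_with_followup(pairs: Dict[str, str], note_dates: Dict[str, List[date]],
                        day_v: Dict[str, date], window_days: int
                        ) -> List[Tuple[str, str]]:
    """Matched pairs in which BOTH members have a phenotype-containing note
    between Day V and `window_days` days after Day V.

    day_v is keyed by the vaccinated id; the unvaccinated partner shares it.
    """
    kept = []
    for vacc_id, unvacc_id in pairs.items():
        start = day_v[vacc_id]
        end = start + timedelta(days=window_days)
        if (_has_note(note_dates.get(vacc_id, []), start, end)
                and _has_note(note_dates.get(unvacc_id, []), start, end)):
            kept.append((vacc_id, unvacc_id))
    return kept
```

Requiring both members of a pair to qualify keeps the sub-cohorts the same size, so incidence rates in the two arms are still compared over matched individuals.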
Results
Vaccinated individuals do not return to clinic at unexpectedly high rates after either vaccine dose
To assess rates of clinical follow-up after study enrollment, we compared the number of vaccinated and unvaccinated individuals with EHR notes within 7, 14, or 21 days of each actual or assigned vaccination date. In the 7 days after the first vaccine dose, 6,121 of the 31,069 (19.7%) vaccinated individuals had at least one EHR note, compared to 5,789 of 31,069 (18.6%) unvaccinated individuals (Odds Ratio = 1.07; 95% CI: [1.03-1.12]) (Table 1). This difference was not considered significant as the lower bound of the 95% CI was less than 1.1 (see Methods). The rates of return to clinic were also similar between these cohorts within 14 and 21 days of the first dose (14-day Odds Ratio = 1.11 [1.07-1.15]; 21-day Odds Ratio = 1.13 [1.09-1.16]) (Table 1). Within 7 days of the second dose, 2,513 of 17,067 (14.7%) vaccinated individuals contributed EHR notes, compared to 2,778 of 17,067 (16.3%) unvaccinated individuals (Odds Ratio = 0.89 [0.84-0.94]) (Table 2). This difference was not considered significant as the upper bound of the 95% CI was greater than 0.91 (see Methods). Likewise, the rates of return to clinic were similar between these cohorts within 14 and 21 days of the second dose (14-day Odds Ratio = 0.88 [0.83-0.93]; 21-day Odds Ratio = 0.89 [0.84-0.93]) (Table 2).
In summary, these findings show that individuals receiving COVID-19 vaccines do not tend to return to the clinic in the subsequent weeks at higher-than-expected rates, which suggests favorable tolerability of these vaccines.
Documented vaccine associated adverse effects in EHR notes are rare compared to the frequencies reported in clinical trials
Among the 31,069 individuals in each cohort, the most commonly documented symptoms in the 7 days after the first vaccine dose included arthralgia (183 [0.59%] vaccinated individuals; 171 [0.55%] unvaccinated individuals), diarrhea (179 [0.58%]; 266 [0.86%]), erythema (159 [0.51%]; 184 [0.59%]), myalgia (125 [0.40%]; 135 [0.43%]), and fever (85 [0.27%]; 198 [0.64%]) (Table 3). The same events were also the most frequently documented in the 7 days following the second vaccine dose (n = 17,067 individuals per group): arthralgia (67 [0.39%] vaccinated individuals; 81 [0.47%] unvaccinated individuals), diarrhea (56 [0.33%]; 120 [0.70%]), erythema (53 [0.31%]; 93 [0.54%]), myalgia (58 [0.34%]; 62 [0.36%]), and fever (31 [0.18%]; 109 [0.64%]) (Table 4).
Notably, these rates of adverse effects documented in EHR notes were markedly lower than the rates of adverse effects observed in clinical trials (e.g. fatigue: 63-70%; myalgia: 38-62%; arthralgia: 24-46%; fever: 14-16%; erythema: 10-15%)6,7. This is to be expected, as individuals vaccinated in the real world are advised that it is normal to experience these adverse effects, and so they are less likely to report them to a healthcare provider. In line with this, the vaccine associated adverse effects which are captured in EHR notes are likely to be those that are severe or persistent enough to cause an individual to return to clinic or otherwise notify their HCP.
Adverse effects are not documented more frequently in the EHR notes of vaccinated individuals compared to those of propensity-matched unvaccinated individuals
To test whether documentation of each adverse effect was actually associated with receiving a COVID-19 vaccine, we computed the incidence rate ratio (IRR) for each adverse effect between the vaccinated and propensity matched unvaccinated cohorts. Most symptoms, including myalgia, arthralgia, fatigue, and headache, had similar incidence rates during the 7 days after the first or second actual or assigned vaccine dose, with the confidence intervals including 1 (Tables 3-4). This was also true of the more serious adverse effects, anaphylaxis and facial paralysis (Tables 3-4). Several symptoms actually had a significantly lower incidence rate in the 7 days after both doses in the vaccinated cohort, including fever (dose 1 IRR = 0.43 [0.33-0.55]; dose 2 IRR = 0.28 [0.18-0.43]), diarrhea (dose 1 IRR = 0.67 [0.55-0.81]; dose 2 IRR = 0.46 [0.33-0.64]), and lymphadenopathy (dose 1 IRR = 0.55 [0.40-0.75]; dose 2 IRR = 0.41 [0.24-0.68]).
These trends of vaccinated individuals showing similar or lower incidence rates for each considered adverse effect also held in the 14 and 21 days following actual or assigned vaccination dates (Tables S2-S5). In summary, these data support the tolerability of BNT162b2 and mRNA-1273, as individuals receiving COVID-19 vaccines do not return to the clinic to report adverse effects at higher rates than propensity matched unvaccinated individuals.
Among matched pairs who return to clinic, adverse effects are reported at similar rates in the vaccinated and unvaccinated cohorts
To complement our previous analysis, we also considered the rates of adverse effect documentation in only the subset of matched pairs who each contributed at least one EHR note in the 7, 14, or 21 days following the first or second actual or assigned vaccine dose. In the 7 days after the first actual or assigned vaccination date (n = 1,432 pairs), the IRR 95% CIs for every adverse effect except diarrhea included 1, and the incidence rate of diarrhea was higher in the unvaccinated cohort (IRR = 0.47 [0.29-0.73]) (Table 5). After the second vaccination date (n = 944 pairs), the IRR 95% CIs all included 1 except for myalgia, which showed a higher incidence rate in vaccinated individuals (IRR = 2.7 [1.1-7.6]) (Table 6).
Again, vaccinated individuals who returned to the clinic also showed similar or lower incidence rates for each considered adverse effect in the 14 and 21 days after vaccination compared to their matched unvaccinated individuals who also returned to the clinic (Tables S6-S9). Taken together, this analysis further corroborates the conclusion that adverse effects are not documented in the EHR of vaccinated individuals at unexpectedly high rates.
Discussion
This study demonstrates that the two currently FDA-authorized COVID-19 vaccines, BNT162b2 and mRNA-1273, are safe and well tolerated beyond the confines of a clinical trial setting. This conclusion is consistent with the extensive safety and tolerability assessments conducted in Phase I/II and Phase III trials over the past nine months4,5,28,29. Here we assessed real-world safety and tolerability by longitudinally curating the EHR documentation of adverse effects in 31,069 individuals receiving at least one dose of a COVID-19 vaccine compared to a propensity matched unvaccinated cohort of the same size. Compared to this control cohort, vaccinated individuals were not more likely to return to the clinic or to report any of the surveyed adverse effects within 7, 14, or 21 days after the first or second vaccine dose. When considering only the pairs of matched individuals contributing at least one phenotype-containing clinical note in these follow-up windows, only myalgia was documented more frequently in EHR notes of vaccinated patients within 7 days of the second vaccine dose (IRR = 2.7; 95% CI: 1.1-7.6), while the other adverse effects were documented at similar or lower rates in the vaccinated cohort.
Our finding that EHR notes from vaccinated and unvaccinated individuals record similar rates of potential vaccine associated adverse effects differs from the data obtained in Phase III trials, wherein vaccinated participants experienced higher rates of symptoms. Further, the absolute rates of adverse effects documented in these EHR notes are well below the rates reported in clinical trials. These discrepancies are likely attributable to differences in methodology of reporting and recording symptoms and events. Both trials included a 7-day post-vaccination period in which symptoms were actively solicited from some or all individuals as well as longer periods in which unsolicited adverse effects and serious adverse reactions were recorded from all individuals. In contrast, our methods rely exclusively on the recording of unsolicited symptoms or events in the EHR. Given that individuals are warned of the likely vaccine associated adverse effects at the time of vaccination in the real world setting, it is likely that most mild or moderate symptoms are never actually reported and thus are not documented in an EHR note. It is also possible that HCPs, who comprise a significant proportion of the vaccinated population thus far, are less likely to report adverse effects given their clinical expertise and ability to self-assess the severity of their illness.
That said, serious safety concerns by definition require medical care and thus are likely to be documented in the EHR. For example, should an individual experience anaphylaxis, this individual will likely require emergent care, during which one or more clinical notes will be written and will mention this phenotype. Thus, our method should identify the symptoms and phenotypes that represent the most serious threats to vaccine safety and tolerability of practical significance. Indeed, this is the central reason why our analysis should be viewed as complementary to the data which has been obtained in the more controlled setting of clinical trials. While the observed adverse effect frequencies in the trial setting are extremely valuable, our assessment specifically aims to describe the frequencies of adverse effects that receive some form of clinical attention as evidenced by their documentation in the EHR.
The purpose of using propensity matching in this study was to establish an expected frequency for each potential vaccine-associated adverse effect in a group of individuals with similar demographic and geographic characteristics. The finding that individuals in the propensity matched unvaccinated cohort were seen in clinic more frequently than those from the vaccinated cohort after the second vaccine dose was surprising, but there are several potential explanations for this. As was previously demonstrated, the incidence rate of testing positive for SARS-CoV-2 was indeed lower in the vaccinated cohort, with a particularly strong effect observed in the time windows beyond the second dose23. Thus, we would expect fewer vaccinated individuals to be seen in the clinic for COVID-19. Further, there may be behavioral or psychological differences that set in after receiving the COVID-19 vaccine, wherein individuals who have received both doses feel less compelled to visit the clinic if they experience flu-like symptoms. Finally, there may be a genuine imbalance in the baseline frequency of clinical visits due to the enrichment of healthcare workers in the vaccinated cohort. We addressed these potential confounding factors by performing sub-analyses on only the matched pairs of individuals who both contributed at least one EHR note in each time interval of interest. The fact that adverse effects were still not observed more frequently among vaccinated individuals in this sub-analysis strengthens our overall conclusions regarding the safety and tolerability of these vaccines.
This study does have several important limitations to consider. First, while the analysis was conducted on a population derived from a large healthcare system, the cohort demographics are not representative of the American population. For example, both the vaccinated and unvaccinated cohorts were predominantly Caucasian (>90%) and female (>60%). These biases likely reflect both the populations who receive care at the various Mayo Clinic centers and the populations who have been prioritized in Phase 1a of the vaccine rollout. Second, the BERT model used to curate EHR notes does not imply a direct link between COVID-19 vaccination and the experience of a phenotype. That is, we simply capture the occurrence of an adverse effect without ensuring that the clinical note indeed suggests or confirms that vaccination caused the symptom. This shortcoming is addressed by comparing vaccinated individuals to the unvaccinated control cohort, which establishes a baseline expected frequency for each symptom in the absence of vaccination. Finally, while sentences suggesting the occurrence of anaphylaxis or facial paralysis were manually reviewed to confirm both the positive sentiment and the tense, sentences for the other curated phenotypes were not reviewed. In the future, we will train natural language processing models to discriminate past from present tense, thereby circumventing the need for this manual review.
As the remainder of the US population undergoes COVID-19 vaccination, it will not be feasible to conduct solicited reporting of adverse effects in all vaccinated individuals. However, the use of augmented curation for real world safety monitoring presents a practical solution to this problem. The method demonstrated here represents a scalable approach to continuously monitor serious safety concerns associated with any authorized COVID-19 vaccines. Taken together with our recent study highlighting the real-world efficacy of these vaccines23, this data reinforces that individuals, providers, and public health officials should proceed rapidly with vaccination efforts, with high confidence in their efficacy and safety.
Data Availability
After publication, the data will be made available upon reasonable requests to the corresponding author. A proposal with detailed description of study objectives and the statistical analysis plan will be needed for evaluation of the reasonability of requests. Deidentified data will be provided after approval from the corresponding author and the Mayo Clinic.
Declaration of Interests
RM, PL, ES, AP, SA, CP, VA, AJV, PA, AR, CC, KC, DD, NK, ER, GB, AM, TW, and VS are employees of nference and have financial interests in the company and in the successful application of this research. JCO receives personal fees from Elsevier and Bates College, and receives small grants from nference, Inc, outside the submitted work. ADB is a consultant for Abbvie, is on scientific advisory boards for nference and Zentalis, and is founder and President of Splissen therapeutics. JH, JCO, GJG, AWW, AV, MDS, and ADB are employees of the Mayo Clinic. The Mayo Clinic may stand to gain financially from the successful outcome of the research. This research has been reviewed by the Mayo Clinic Conflict of Interest Review Board and is being conducted in compliance with Mayo Clinic Conflict of Interest policies.
Author Contributions
VS, PL, and SA conceived the study. RM, PL, AJV and VS wrote the manuscript and reviewed the findings. ES, AP, SA, CP, VA, PA, AR, CC, KC, DD, NK, ER, GB, AM, TW contributed methods, analysis, and software. JCOH, GJG, AWW, ADB, MDS, AV, and JH reviewed the study, findings, and the manuscript. All authors revised the manuscript.
Supplemental Material
Acknowledgements
The authors thank Murali Aravamudan for the careful review and feedback on this manuscript.
Footnotes
+ Joint first authors