Abstract
Despite extensive scientific research supporting the safety and effectiveness of approved vaccines, debates about their use continue in the public sphere. A paper prominently circulated on social media concluded that countries requiring more infant vaccinations have higher infant mortality rates (IMR), which has serious public health implications. However, inappropriate data exclusion and other statistical flaws in that paper merit a closer examination of this correlation. We re-analyzed the original data used in Miller and Goldman’s study to investigate the relationship between vaccine doses and IMR. We show that the sub-sample of 30 countries used in the original paper was not a random sample from the entire dataset, as the coefficient of determination (R² = 0.49) obtained from that sub-sample would arise only about 1 in 100,000 times from random sampling. Next, we show IMR as a function of countries’ actual vaccination rates, rather than vaccination schedule, and show a strong negative correlation between vaccination rates and IMR. Finally, we analyze United States IMR data as a function of Hepatitis B vaccination rate to show an example of increased vaccination rates corresponding with reduced infant death over time. From our analyses, it is clear that vaccination does not predict higher IMR as previously reported.
Introduction
Vaccines as a Critical Public Health Issue
The development of vaccines is viewed as one of the greatest public health successes of all time. Widespread immunization has resulted in the control of many infectious diseases that were previously devastating and lethal, including smallpox, poliomyelitis, measles, rubella, tetanus, diphtheria, Haemophilus influenzae type b, and others(1–3). However, anti-vaccination movements have existed since vaccines were first introduced, and recent waves of this skepticism have led to the resurgence of diseases that were previously controlled(3,4). Recently, this public debate has intensified due to the rapid development and distribution of the COVID-19 vaccine(5).
The term “vaccine hesitancy” has been used to describe the uneasiness of individuals and parents who are unsure about vaccination(4,6). Understanding the factors that lead to vaccine hesitancy has been difficult and complex, and researchers have discovered there are many context-specific and variable factors at play that impact vaccination decisions and behavior, including understanding of scientifically-based risks versus benefits, perceived personal risks versus benefits, and concerns about the vaccination schedule(4,6). This hesitancy has been seen to affect behavior. For example, Martin and Petrie found that mistrust of vaccine benefits and worries about unforeseen future effects of vaccines were statistically predictive of past vaccine refusal and future intentions to refuse vaccination(7).
In the case of vaccines, there can be much more at stake than just the impact on one individual in a community. Vaccination of a large portion of the population (e.g. > 90%) protects the entire population by eliminating disease transmission; this is essential to help those who are medically unable to be vaccinated(8). This indirect protection and community benefit has been observed with various vaccines(9–12), demonstrating that vaccine refusal affects more than just the individual who does not get immunized. For example, Salmon et al. examined the effect of vaccine exemption on both the individual and their community(13). The authors concluded that those who claimed vaccination exemption status were 35 times more likely to contract measles; however, if the number of individuals claiming exemption were to double, even nonexempt individuals could see up to a 30% increase in the incidence of measles. Overall, disease outbreaks are more likely in areas that contain larger numbers of unvaccinated individuals (e.g., Centers for Disease Control and Prevention: retrieved from https://www.cdc.gov/measles/cases-outbreaks.html). Thus, addressing vaccine hesitancy by increasing public confidence in vaccine safety has the potential to positively impact public health and save lives(14).
Vaccine Misinformation and its Spread
Exposure to anti-vaccine information can directly affect vaccine intentions(15), and exposure to misinformation is more widespread than ever with increased use of the internet and social media(5,16,17). Not only can any information be shared on social media, regardless of its validity, but information can also be amplified quickly and spread virally(18–20). A 2018 study found that sophisticated bots and content polluters are more likely to post about vaccines than average Twitter users, often with anti-vaccine content(21). Research suggests that automated users are at least partially inflating anti-vaccine content and amplifying misinformation online, and this can have serious public health implications(22). This widely disseminated misinformation makes it difficult for individuals to determine which sources of information to trust and can affect their vaccine decisions(23,24).
Miller and Goldman’s 2011 Paper and the Purpose of this Study
In their 2011 paper, Miller and Goldman(25) examined the correlation between infant mortality rate (IMR) and infant vaccine scheduling in various countries. They concluded that vaccine schedules with a greater number of vaccine doses for infants are correlated with higher IMR, proposing the potential for synergistic toxicity of vaccines. This is in sharp contrast to the scientific consensus that vaccines are safe and beneficial for infants even when given with other vaccines(26–29). Although the 2011 study was published in a peer-reviewed journal (Human and Experimental Toxicology), a brief reading of the Miller and Goldman manuscript led us to question the methods, results and conclusions. We observed significant deficiencies in the statistical methods. It is therefore troubling that this manuscript is in the top 5% of all research outputs since its publication, having been shared extensively on social media with tens of thousands of likes and re-shares (see https://acs.altmetric.com/details/406556).
To be trustworthy, science must be self-correcting(30). Sometimes these corrections are a refinement of current understanding (e.g. Einstein’s advances in Physics(31)), and sometimes they are a reversal of incorrect conclusions. This continual revision of the scientific record is normal, and an essential part of the scientific enterprise. It is critical that flawed scientific publications are recognized, as these can cause serious harm(32). In the case of vaccinations, faulty research impacts not just an individual who avoids vaccinations, but also the public health and safety of the population as a whole(8). Due to the disproportionate effect Miller and Goldman’s 2011 paper has had on the public conversation about vaccine safety compared to other scientific publications, we repeated their analysis to examine whether their conclusions are justified.
Methods
All data and scripts used for the calculations and figures in this manuscript are publicly available on GitHub at https://github.com/PayneLab/vaccine_reevaluation.
Data Sources
IMR and immunization schedule data from the sources referenced in the original paper were used for Figure 1. The “CIA Country comparison: infant mortality rate data” (2009) was no longer available on www.cia.gov, the website referenced in the original paper. However, the same dataset was found at http://teacherlink.ed.usu.edu/tlresources/reference/factbook/rankorder/2091rank.html. We determined that this was an identical dataset to what was used by Miller and Goldman by manually confirming that the metrics for each of the 30 included countries were identical. For long term preservation, this file has been added to our GitHub repository, see ~/data/2009_IMR_data.txt.
Immunization schedule data, detailing the ages at which each vaccine is recommended within each country, were collected from the “WHO/UNICEF Immunization Summary: A Statistical Reference Containing Data Through 2008 (The 2010 edition),” as referenced in the original paper (https://data.unicef.org/wp-content/uploads/2015/12/Immunization_Summary_2008_53.pdf). This file is now saved in our GitHub repository, see ~/data/Immunization_Summary_2008_53.pdf.
Vaccine doses administered, as used in Figure 3, were downloaded from UNICEF data warehouse: https://data.unicef.org/resources/data_explorer/unicef_f/. We selected the following variables for download:
infant mortality rate,
under-five mortality rate,
child mortality rate (aged 1-4 years),
Percentage of live births who received bacille Calmette-Guerin (vaccine against tuberculosis),
Percentage of surviving infants who received the first dose of DTP-containing vaccine,
Percentage of surviving infants who received the third dose of DTP-containing vaccine,
Percentage of surviving infants who received the third dose of hep B-containing vaccine,
Percentage of live births who received hepatitis-B-containing vaccine within 24 hours of birth,
Percentage of surviving infants who received the third dose of Hib-containing vaccine,
Percentage of surviving infants who received the first dose of inactivated polio-containing vaccine,
Percentage of surviving infants who received the first dose of measles-containing vaccine,
Percentage of children who received the 2nd dose of measles-containing vaccine, as per administered in the national schedule,
Percentage of surviving infants who received the third dose of pneumococcal conjugate-containing vaccine (PCV),
Percentage of surviving infants who received the third dose of inactivated polio-containing vaccine,
Percentage of surviving infants who received the first dose of rubella-containing vaccine,
Percentage of surviving infants who received the last dose of rotavirus-containing vaccine (2nd or 3rd dose depending on vaccine used),
Percentage of surviving infants who received yellow fever-containing vaccine (for countries at risk and where the vaccine is in the national schedule)
The resulting data have been added to our GitHub repository as ~/data/Unicef_vaccination_doses_2019.txt.
Hepatitis B vaccination rates and longitudinal IMR were retrieved from https://apps.who.int/gho/data/node.main.A828?lang=en and https://childmortality.org respectively. These data have been saved in our GitHub repository, see ~/data/HepBdata.xls and ~/data/UNIGME-2020-Country-Sex-specific_U5MR-CMR-and-IMR.xlsx.
Collecting Vaccine Schedule Data
Doses of combined vaccines were counted as the number of individual vaccines contained in the combined vaccine (e.g., DTaP = 3 vaccines) multiplied by the number of times the vaccine was scheduled for administration before 12 months of age (e.g., 3 doses of DTaP = 3 doses × 3 vaccines/dose = 9 vaccine doses), as described in Miller and Goldman’s paper. For consistency, we used the following criteria for counting vaccine doses, as they most closely matched the numbers included in Miller and Goldman’s paper: only vaccinations scheduled for less than 12 months, or ranges up to 12 months, were included; vaccinations scheduled for high-risk groups, subnational groups, military groups, travelers, children of carriers, infants with a pertussis contraindication, and HIV+ infants were not included. For example, Pneumo_ps is recommended for only high-risk groups in many countries, and so it was not counted in our metric. Doses were manually counted following these criteria, appended to the IMR data, and stored in a file called Figure_1_Data.csv.
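The counting rule above can be illustrated with a minimal Python sketch. This is not the procedure used for the manuscript, where doses were counted manually; the names `ScheduledDose`, `COMPONENTS_PER_PRODUCT`, and `count_vaccine_doses` are hypothetical, the strict "before 12 months" cutoff is our reading of the criteria, and the component count given for Pneumo_ps is a placeholder.

```python
from dataclasses import dataclass

# Number of individual vaccines in each product. DTaP = 3 follows the text;
# the Pneumo_ps entry is a placeholder used only for this illustration.
COMPONENTS_PER_PRODUCT = {"DTaP": 3, "Pneumo_ps": 1}


@dataclass
class ScheduledDose:
    product: str            # product name as listed in the schedule
    age_months: float       # scheduled age of administration, in months
    universal: bool = True  # False for high-risk-only or other targeted groups


def count_vaccine_doses(schedule, cutoff_months=12):
    """Sum component vaccines over universally recommended doses scheduled before the cutoff."""
    total = 0
    for dose in schedule:
        if not dose.universal:
            continue  # targeted recommendations are excluded
        if dose.age_months >= cutoff_months:
            continue  # only doses scheduled before the cutoff are counted (assumed strict)
        total += COMPONENTS_PER_PRODUCT[dose.product]
    return total


# Toy schedule: three infant DTaP doses, one later booster, one high-risk-only dose.
example_schedule = [
    ScheduledDose("DTaP", 2),
    ScheduledDose("DTaP", 4),
    ScheduledDose("DTaP", 6),
    ScheduledDose("DTaP", 18),
    ScheduledDose("Pneumo_ps", 2, universal=False),
]

print(count_vaccine_doses(example_schedule))  # 9
```

The toy schedule reproduces the worked example in the text: three DTaP doses at three vaccines per dose give 9 vaccine doses, while the booster at 18 months and the high-risk-only dose are not counted.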
Analysis
All of the software and files used in this manuscript, including the code for generating images, are saved in our public GitHub repository https://github.com/PayneLab/vaccine_reevaluation. Data and code used to create Figure 1 and the associated correlation metrics can be found in ~/code/Make_Figure_1.R. Data and code used to create Figure 2 can be found in ~/code/Make_Figure_2.R. For Figure 2, we drew random samples of 30 countries from the full dataset and repeated the sampling 50,000 times to generate a distribution of potential correlation values. We calculated the simple mean, median, standard deviation, IQR and z-score using the base R functions (see the code). For the vaccination rate and IMR analyses accompanying Figure 3, we calculated simple linear regressions with the data downloaded from UNICEF as noted above. All implementation details are available in our GitHub repository in the script ~/code/Make_Figure_3.R. We calculated the effect of Hepatitis B vaccination rate with a Spearman correlation, using the data from the sources above. Data and code used to create Figure 4 can be found in ~/code/Make_Figure_4.R. IMR values were separated by sex because Hepatitis B is more prevalent in males(33,34) and the IMR is characteristically distinct by sex.
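The resampling procedure described above can be sketched as follows. This is an illustrative Python translation under stated assumptions, not the R script in the repository: the function names `r_squared` and `resample_r_squared` are hypothetical, and the input values are synthetic placeholders rather than the actual dose counts and IMR data.

```python
import random
import statistics


def r_squared(xs, ys):
    """Coefficient of determination for a simple linear regression of ys on xs."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate sample with no variation in x or y
    return (sxy * sxy) / (sxx * syy)


def resample_r_squared(doses, imr, sample_size=30, n_iter=50_000, seed=0):
    """Draw random subsets of countries (without replacement) and record R^2 for each."""
    rng = random.Random(seed)
    indices = range(len(doses))
    results = []
    for _ in range(n_iter):
        picked = rng.sample(indices, sample_size)
        results.append(
            r_squared([doses[i] for i in picked], [imr[i] for i in picked])
        )
    return results


# Synthetic placeholder data for 185 "countries"; NOT the real dataset.
data_rng = random.Random(1)
synthetic_doses = [data_rng.randint(9, 26) for _ in range(185)]
synthetic_imr = [data_rng.uniform(2.0, 100.0) for _ in range(185)]

# A small number of iterations keeps the demo fast; the Methods use 50,000.
r2_values = resample_r_squared(synthetic_doses, synthetic_imr, n_iter=2000)
print("mean R^2:", statistics.fmean(r2_values))
print("sd R^2:", statistics.stdev(r2_values))
```

Sampling is done without replacement within each draw, so every sample contains 30 distinct countries, mirroring the size of the original sub-sample. Fixing the seed makes the distribution reproducible.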
Results
A primary conclusion of the manuscript by Miller and Goldman(25) is that “nations that require more vaccine doses tend to have higher infant mortality rates.” At the time of publication, a corrigendum was published to notify readers of unreported affiliations and conflicts of interest for the authors(35). However, as we show herein, the most important problem with the manuscript is not the authors’ conflicts of interest. Rather, it appears that their conclusion could only be reached by omitting >80% of the available data. Moreover, a re-analysis of the full dataset does not support the original conclusion.
Limitations of the Miller and Goldman Study
One of the major errors of Miller and Goldman’s analysis was unexplained data exclusion(25). In their paper, data from only 30 nations were used, despite the fact that data for 185 countries were available in their original data source (Figure 1). Within the text they state that they included “the immunization schedules for the United States and all 33 nations with better IMRs than the United States.” However, no scientific reason is given for the exclusion of nations with an IMR higher than that of the United States. In fact, the vast majority of the countries excluded from analysis had both fewer vaccinations than the US and a significantly higher IMR than the US. Strikingly, the manuscript itself discusses IMR data for Gambia and Mongolia, both of which were excluded from the statistical analysis, demonstrating that the authors were aware of these data.
Data from four additional nations were excluded with a tenuous explanation. Liechtenstein, San Marino, Andorra and Monaco are small European countries with relatively low IMR and a high number of vaccinations. Miller and Goldman removed these countries “because they each had fewer than five infant deaths.” Given their small populations, these countries likely have very few annual infant deaths. However, it is unclear why this criterion should lead to their exclusion. Miller and Goldman stated that including these nations would produce “extremely wide confidence intervals and IMR instability”, suggesting that nations could have been included or excluded based on their effect on a statistical metric. Excluding data inappropriately can lead to selection bias, contributing to misinformation in the scientific community(36).
Reanalysis Including Data for Previously Excluded Countries
To re-evaluate the hypothesis that vaccines are associated with infant mortality, we repeated the linear regression analysis using vaccine schedule and IMR data for all countries. This is possible because the original data used by Miller and Goldman are publicly available (see Methods). When all of the data were included, the positive correlation between IMR and immunization schedules disappeared (R² = 0.026 for all countries vs. R² = 0.493 for the original 30 countries). This indicates that there is no relationship between increasing vaccination schedules and infant mortality.
To better visualize how extreme Miller and Goldman’s result was even within their own dataset, we randomly sampled 30 countries from the full dataset of 185 countries and computed the linear regression. This sampling was done 50,000 times, and the distribution of regression results was plotted (Figure 2). We then determined the degree to which Miller and Goldman’s result (R² = 0.493) may be considered an outlier. Within this distribution of random samples, the mean R² was 0.049 with a standard deviation of 0.053. We calculated the z-score of 0.493 against our distribution to be 8.3, meaning there is approximately a 1 in 100,000 chance that this result was achieved with a random sample of the dataset. To verify this, we performed 1 million random samplings; the most extreme R² observed was 0.577, and only 10 samples had an R² exceeding 0.493. Therefore, we conclude that the sample of 30 countries from the Miller and Goldman analysis is not representative of the true dataset.
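The arithmetic behind these two figures can be checked with a short sketch, assuming only the rounded summary values quoted above; the variable names are hypothetical. Because the inputs are rounded, the z-score computed here (about 8.4) differs slightly from the 8.3 reported above.

```python
# Rounded summary values quoted in the text.
observed_r2 = 0.493  # R^2 for the original 30-country sub-sample
mean_r2 = 0.049      # mean R^2 across random 30-country samples
sd_r2 = 0.053        # standard deviation of R^2 across random samples

# z-score: distance of the observed value from the mean, in standard deviations.
z_score = (observed_r2 - mean_r2) / sd_r2
print(round(z_score, 2))  # 8.38

# Empirical frequency from the 1 million resamples described in the text.
n_samples = 1_000_000
n_exceeding = 10
print(f"1 in {n_samples // n_exceeding:,}")  # 1 in 100,000
```

The empirical count is the more direct estimate of how often a random sample would match the original result, since it does not assume that the R² values are normally distributed.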
Vaccination Rate, Not Just Schedule
In their original analysis, Miller and Goldman used the vaccine schedule and not the actual data on vaccine doses administered, claiming that vaccination rates were high enough that this choice would not affect the results. However, in countries/locales with poor access to health care, the recommended set of vaccinations might not be available to a significant fraction of the population. Therefore, to more clearly answer the question of whether vaccination is related to infant mortality, we compared the vaccination rate for each country against the infant mortality rate. Data from UNICEF include 2019 global statistics on vaccination rate and IMR for 8 different types of vaccines (see Methods). In agreement with previous literature demonstrating the benefit of vaccines, we show that higher vaccination rates are associated with lower infant mortality rates for 7 of the 8 vaccinations tested (Figure 3).
Vaccine Impact Over Time
It is curious that the original Miller and Goldman study did not examine longitudinal data to evaluate their hypotheses. If vaccines were really affecting infant mortality, then the introduction of new vaccines should be correlated with a rise in infant mortality. Therefore, we propose a different test to evaluate the impact of increasing the number of vaccinations. Specifically, we want to evaluate the infant mortality over a time period when a new vaccination becomes common and the impact of that specific addition can be assessed. The Hepatitis B vaccine was introduced in 1981(37) and became common in the United States in the 1990s. We identified a dataset for vaccine doses administered and infant mortality which covers the timeframe of HepB vaccine adoption (see Methods).
We examined the relationship between Hepatitis B vaccination rate (percent of one-year-old children vaccinated) and IMR for males and females in the United States from 1993 to 2019 (Figure 4). For both males and females there is a modest decrease in IMR from 1993 to 1996 as the percent of one-year-old children vaccinated for HepB approaches approximately 85%. Between 1996 and 2019, when Hepatitis B vaccination remains consistently high, infant mortality drops significantly, from approximately 8.5 to 6 deaths per 1,000 live births in males and from 7 to 5 deaths per 1,000 live births in females. Although there are potential confounding factors, these data suggest that if there is any relationship between Hepatitis B vaccination and IMR, it is a lowering of the infant mortality rate. Similar conclusions have been drawn by other studies analyzing vaccine effectiveness(38,39).
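As noted in the Methods, this comparison uses a Spearman correlation, which is a Pearson correlation computed on ranks. The sketch below illustrates the calculation in Python under stated assumptions: the function names are hypothetical, and the toy values are made up for illustration and are not taken from the WHO or UN IGME data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up toy values: vaccination rate (%) rising while IMR falls.
toy_vaccination_rate = [10, 30, 50, 70, 90]
toy_imr = [9.0, 8.0, 7.5, 7.0, 6.0]

print(spearman(toy_vaccination_rate, toy_imr))  # -1.0
```

A value of -1.0 means the two toy series are perfectly inversely ordered. Because the statistic depends only on ranks, it captures monotonic relationships without assuming that they are linear.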
Discussion
Our findings indicate that the conclusions previously suggested by Miller and Goldman(25) are false. More recommended vaccine doses are not associated with an increase in infant mortality. Their conclusion could only be reached by extensive omission of available data (see Figures 1 and 2). When we repeated the analysis with previously excluded data, the positive correlation between vaccination scheduling and IMR disappeared.
While Miller and Goldman claimed that using vaccination rates rather than the schedule would be unlikely to alter their results, we found that using vaccination rates did challenge their 2011 conclusions. When we examined the association between actual vaccination rates and IMR, we found a consistent and strong negative correlation, with higher vaccination rates predicting less infant death (Figure 3). This result better aligns with the scientific consensus about the benefits of vaccination, even with many vaccines given together (26–29). Finally, we presented a case study with longitudinal data demonstrating a lowering of infant mortality in the United States coincident with the widespread adoption of the Hepatitis B vaccine over time (Figure 4). If synergistic toxicity truly exists, as proposed by Miller and Goldman, adding new vaccines to the schedule would have had the opposite effect over time.
The Miller and Goldman study had other limitations not directly addressed in our study. For example, it only looks at the initial effect of vaccines on infants. Vaccines are developed for diseases that affect the entire age spectrum of the population, e.g. infants, adolescents and adults. Therefore, to correctly evaluate the public health impact of vaccination, one would need to include the lives saved when vaccinated individuals no longer acquire these diseases. Such analyses are part of current published literature, e.g. measles cases in the US which dropped dramatically in the 1960s coincident with the introduction of a measles vaccine(40). Another major flaw of the paper’s analysis was simplistic implementation of statistical methods. When studying a question as complicated as infant mortality, including only one variable (vaccination schedule) in regression analyses is not considered best practice. Furthermore, linear regression and correlation coefficients are heavily influenced by outliers(41), which were removed as noted above (see Figure 1). The Miller and Goldman manuscript mentions potential covariates and confounding factors, including many socio-economic factors known to play a critical role in infant mortality(42–45). Unfortunately, even the ones available in their dataset were not included in their analysis.
In the context of the current vaccine debate, it is important that accurate information about vaccine safety is accessible. A vast literature exists for the development and clinical testing of individual vaccines (e.g. refs(46–49)), evaluation of vaccination schedules (e.g. refs(26,50–54)) and public health studies testing their efficacy within society (e.g. refs(55–59)). Unfortunately, many individuals get their information from social media, which is not a curated or validated source, and many of these social media users lack the scientific training to evaluate the validity of what they see. In this setting, a single manuscript can have an inordinate impact on public discourse, even when it is demonstrably false. While corrections and retractions are not always successful at preventing the original misinformation from impacting public debate, repeated corrections and retractions can help alleviate the effects of misinformation(60).
Data Availability
All data and analyses are available on GitHub at https://github.com/PayneLab/vaccine_reevaluation.
Acknowledgements
This work was performed in the BYU Bioinformatics Capstone course, and was not supported by an external funding agency.