Abstract
There is some evidence that tuberculosis vaccine bacillus Calmette-Guérin (BCG) has non-specific beneficial effects against non-related infections. Here, we examined the possible association between BCG vaccination with prevalence and mortality by COVID-19 by using publicly available data of COVID-19 in 199 countries/regions and the BCG World Atlas. By using linear regression modeling, we found that the number of total cases and deaths per one million population were significantly associated with the country’s policy concerning BCG vaccine administration. The amount of variance in cases and deaths explained by BCG vaccination policy ranged between 12.5% and 38%. Importantly, this effect remained significant after controlling for the country’s life expectancy and the average temperature in February and March 2020, which themselves are significantly correlated with the cases and deaths indices, respectively. By contrast, the ratio between deaths and cases was weakly affected. This latter outcome suggested that BCG vaccination may have hindered the overall spread of the virus or progression of the disease rather than reducing mortality rates (i.e., deaths/cases ratio). Finally, by roughly dividing countries into three categories showing high, middle, or low growth rate of the cases, we found a highly significant difference between the slope categories among the BCG groups, suggesting that the time since the onset of the spread of the virus was not a major confounding factor. While this retrospective epidemiological study potentially suffers from a number of unknown confounding factors, these associations support the idea that BCG vaccination may provide protection against SARS-CoV-2, which, together with its proven safety, encourages consideration of further detailed epidemiological studies, large-scale clinical trials on the efficacy of this vaccine on COVID-19, and/or re-introduction of BCG vaccination practice in the countries which are currently devoid of the practice.
Introduction
The recent outbreak of coronavirus disease 2019 (COVID-19) caused by the 2019 novel coronavirus (SARS-CoV-2) is one of the most serious threats to the human race in this millennium. There appears to be a large variability among countries in the risk of infection and mortality by this virus1. For example, Japan, which has close physical and social distance from China, is one of the first countries that recorded the first case in the earliest phase of this pandemic. As of March 30th, while many governments have ‘locked-down’ numerous cities and even the entire country, the Japanese government still keeps applying mild measures to mitigate the spread of the virus. The citizens have simply been ‘advised’ to stay home. Nevertheless, most people in Japan, with the notable exception of Tokyo metropolitan, are currently working as usual. Surprisingly, despite such a liberal measure taken and relatively low number of PCR tests conducted, Japan is ranked at 117th and 69th among 199 countries and regions1, as of March 28th, regarding the number of cases per 1 million population and the number of deaths per 1 million population, respectively. Also, there seem to be visible differences in COVID-19 prevalence and mortality between countries in Western Europe and Eastern Europe. There are many potential geographical, social, and biological factors, such as temperature, humidity, life expectancy, average income, social norms, and ethnical genetic background, that could potentially explain such variabilities among countries, but it remained fully explored. In case they are successfully identified, such information could potentially contribute to the mitigation of the spread of the virus.
Some vaccines (e.g., measles and Bacillus Calmette–Guérin (BCG) vaccines) are associated with a lower risk of illness and death from other disorders2–4. A systematic review of epidemiological studies provided evidence for the non-specific beneficial effects of BCG vaccine on all-cause mortality5,6. Induction of cytokines associated with trained immunity is proposed to underly such protection against non-related viral infections7. Researchers in four countries will soon start a clinical trial of BCG vaccine on this disease, based on the proposed beneficial effect of BCG4. Interestingly, a blogger noticed that there might be a correlation between the administration of BCG vaccine in each country and the extent to which the spread of COVID-19 occurs8. In this study, we evaluated the hypothesis that BCG vaccination has some protective effect against COVID-19 by utilizing publicly available datasets through linear regression modeling. Among the potentially confounding factors, we assessed two factors, life expectancy and the average temperature of the countries. We also tried to evaluate the potential effect of BCG vaccination policy on the growth rate of COVID-19, controlled by the onset of its propagation in each country.
Methods and Results
The data on COVID-19 and life expectancy were retrieved on March the 28th from “COVID-19 CORONAVIRUS PANDEMIC” page1 and “Life Expectancy of the World Population” page9 in worldmeter website. The average temperature data were retrieved from “List of cities by average temperature” page10 in Wikipedia. Average temperature of each country was obtained by averaging those of the cities in each country. The raw data on which our analyses are based is attached as Supplementary Table 1. Also, the STROBE Statement for cohort studies is in the supplementary materials11.
Three sets of linear regressions were run. The dependent variables (DVs) were the rate of total cases per one million population (Model 1), the rate of death per one million population (Model 2), and the ratio between these two variables (Model 3). The country’s BCG policy was the independent variable (IV) of interest. In accordance with the BCG world atlas12, this variable had three levels: Group A, the country currently administers the BCG; Group B, the country used to administer the BCG and currently does not; and Group C, the country has never administered the vaccine. The country’s life expectancy and mean temperature (°C) in February and March 2020 were added as control variables in all the models in order to rule out potentially confounding effects. In fact, infection rates and mortality rates may have been influenced by the subject’s age (i.e., the older, the more vulnerable) and climate conditions (i.e., the higher the temperature, the less contagious the virus). Finally, the Order Quantile (ORQ) normalization transformation was used to standardize the continuous variables. Beyond meeting the statistical assumption of linear modeling (normalization), this procedure is also useful to mitigate potentially confounding effects of outliers.
This study included 136 countries or regions for which complete data about total cases per one million population, life expectancy, and temperature were available (Model 1). The data regarding total deaths per one million population and deaths/cases ratio were available for 91 countries (Models 2 and 3). All the analyses were run with the R software13, and the graphs were produced by using the package ggplot214. The ORQ transformation was performed with the bestNormalize R package15. Finally, the rsq R package was employed to calculate partial R2 statistics (i.e., unique variances)16. Descriptive statistics of each variable (e.g., means, medians, and quantiles) are retrievable from the R codes provided online.
Regarding total cases per one million population, the linear model showed a statistically significant effect for both Group B (b = 0.6483, p = .0002) and Group C (b = 0.8666, p = .0025; as compared to Group A). Temperature was barely significant (Figure 1a; b = –0.1232, p = .0424). As anticipated, life expectancy was significant (Figure 1c; b = 0.5544, p = 6.33×10−14). The model’s adjusted R2 was .6409 (i.e., approximatively 64% of the observed variance was explained). The amount of unique variance – i.e., the variance explained by a certain variable by controlling for all the other variables in the model – accounted for by the BCG variables was 12.50%. The amount of unique variance accounted for by life expectancy and the country’s mean temperature was 34.52% and 2.37%, respectively.
Since life expectancy was the strongest predictor of cases, we reran the analyses only with those countries with a life expectancy higher than 78 (n = 45). This way, the mean life expectancy was homogeneous across BCG groups (A: 80.92±2.31; B: 82.41±1.27; C: 81.33±2.44). The pattern of results was essentially the same. There was a statistically significant effect for both Group B (Figure 2a; b = 0.6122, p = .0024) and Group C (Figure 2a; b = 0.6511, p = .0326). The temperature had no significant effect. The amount of unique variance accounted for by the BCG variables was 19.78%.
Regarding total deaths per one million population, the linear model again showed a statistically significant effect for both Group B (b = 0.7262, p = .0007) and Group C (b = 1.495, p = 7.11×10−5). While life expectancy was significant (Figure 1d; b = 0.4251, p = 6.20×10−6), the country’s mean temperature was not (Figure 1b; b = – 0.0761, p = .3453). The model’s adjusted R2 was .5473. The amount of unique variance accounted for by the BCG variables was 20.37%. The amount of unique variance accounted for by life expectancy and the country’s mean temperature was 20.33% and 0%, respectively.
Again, since life expectancy was a strong predictor in the model, we reran the analyses only with those countries with a life expectancy higher than 78 (n = 40). The pattern of results was the same. There was a statistically significant effect for both Group B (b = 0.7918, p = .0011) and Group C (b = 1.498, p = 8.05×10−5). The temperature had no significant effect. The amount of unique variance accounted for by the BCG variables was 38.39%.
Regarding the ratio between deaths and cases, the linear model showed a statistically significant effect for Group C (b = 1.140, p = .0267) but not Group B (b = 0.1356, p = .6442). This time, life expectancy was not significant (b = –0.1294, p = .3019). By contrast, the country’s mean temperature was significant (b = 0.2629, p = .0226). The model’s adjusted R2 was .1019. The amount of variance accounted for by the BCG variables was 3.39%.
Finally, a potential issue of the above analyses is that we used data that were not corrected by the onset of the spread of the virus. We are now in the middle of the rapid development of a pandemic and the time since the onset significantly differs across different countries, which may serve as a potentially confounding factor. To address this issue, we utilized the curves of the cumulative number of confirmed cases by the number of days since 100th case (Figure 3a), which was reported on a website17. Since we did not obtain the numerical data behind the figure in the website, we classified the 70 countries into three groups, that is, high (h), middle(m), and low(l) growth rate (Figure 3a; Boundary between h and m was approximately 2500 cases and the one between m and l was approximately 1400 cases at Day 15 since 100 cases; The countries in which 15 days was not reached, the value of Day 15 was estimated by projecting the growth curve to 15th day). While based on a rough classification, this analysis was useful to explore the potential impact of the country’s onset of the virus on our findings. The Group A countries significantly tend to be classified in lower growth group compared to Groups B and C countries (as a single group) with all the 68 countries with known BCG policy (Figure 3b; Chi-squared test: p = 0.0002; B and C groups were combined for this test). As in the linear regression models, given that life expectancy may affect growth rate, we also conducted a Chi-squared test after excluding the countries that had a shorter life expectancy (≤78 years old, n = 40), and found that the difference remained significant after the exclusion (Figure 3c; Chi-squared test: p < 0.0001; B and C groups were combined for this test), where p-value became even smaller adjusting life expectancy despite the decrease of sample number. Both the analyses thus showed that Group A countries tended to exhibit a lower growth rate compared to Groups B and C countries regardless of their life expectancy.
Discussion
Associations of BCG policy with COVID-19 after controlling two major confounding factors
In this study, we have shown that the countries that currently adopt universal BCG vaccination programs (Group A) have, compared to the other countries, a lower number of cases and deaths per one million people. The countries that no longer recommend BCG vaccination for everyone (Group B) and those that have never had universal BCG vaccination programs (Group C) reports more cases and deaths. We have also evaluated the impact of two potentially confounding factors, the country’s life expectancy and the average temperature in February and March 2020. Crucially, the effects of BCG groups on the two dependent variables remain significant, even when the country’s life expectancy and temperature are controlled for. The amount of variance explained by BCG vaccination is about 12.50% and 20% for cases and deaths per one million population, respectively. The percentage of explained variance is greatly increased when only countries with a life expectancy above 78 years are considered (about 20% and 38% for cases and deaths, respectively). Also, only Group C appears to play a role in the deaths/cases ratio and explains little more than 3% of the observed variance. This latter result may suggest that, unlike cases and deaths per one million population, the death rate is weakly affected by the type of BCG vaccination policy adopted by the country. We thus hypothesize that the protective effect of the vaccine, if any, may consist of a significant reduction of the spread of the virus rather than a reduced mortality rate. It is also possible that the vaccine prevents progression of the disease after infection, since only the persons with severe symptoms tend to be able to receive PCR tests in many of the countries at present. This hypothesis requires additional empirical corroboration. Finally, in line with the results of the linear regression analysis, the Chi-squared analyses highlight that BCG vaccination policy appears to affect how quickly this virus spreads. Taken together, these results suggest that the overall impact of BCG vaccination on the COVID-19-related cases and deaths is, at the very least, non-negligible and worth of further empirical investigation.
We have reported that a large part of the variance in the two dependent variables is explained by life expectancy, while the effect of temperature seems to be marginal at best. This is not surprising, considering the fact that age is a major risk factor in COVID-1918,19. Life expectancy is highly correlated with the income level of individuals20 and countries21, which would also be related to the quality of medical care that people can receive. If life expectancy is controlled, higher income and quality of medical care may decrease the risk and mortality of diseases in general, which is expected to decrease the positive effect of life expectancy on the risks in COVID-19. Considering the highly significant contribution of life expectancy on the risks, we could consider that the income level or the quality of medical care cannot be the confounding factor that explains the association between BCG vaccination and the risks in COVID-19. Regarding the tight relationships between age and mortality in COVID-1918,19, BCG protection may wane with time since vaccination22, and it could be potentially contributing to the higher mortality in aged populations to some extent. These possibilities need to be examined by further analyses.
As seen, the average temperature in countries exhibits a small and sometimes even non-significant negative correlation with the risks of COVID-19 infection and mortality. That said, we have not assessed the effect of humidity, which we plan to do in the near future, since absolute humidity is considered to be associated with influenza infection23,24. However, it is unlikely that absolute humidity significantly affects the results. In fact, absolute humidity does not seem to play any major role in the transmission rates of the COVID-1925. Moreover, absolute humidity and temperature are usually highly correlated, and it is thus unlikely that humidity may play a major role in explaining the spread of the COVID-19 if the temperature does not.
BCG vaccination policy is associated with a lower COVID-19 growth rate
We are still in the midst of the pandemic, during which the virus keeps spreading in most of the countries (except for China, where the rise has nearly been stopped). Also, the role of the observed between-country differences concerning the date of the spread of the COVID-19 is still unclear. In brief, the current state of affairs might change quickly and, therefore, our findings should be interpreted with caution. That being said, we have found a highly significant difference among the BCG groups of the slope of increase of the cases in log scale, that is, adjusted by the day when 100th cases have been achieved. While this analysis is based on a somewhat subjective classification of the slope steepness in each country, and the number of countries assessed is only 68 here, the result indicates that the difference of the onset cannot be a major contributing factor underlying the association between COVID-19 indices and BCG vaccination.
The Group B countries are heterogenous regarding 1) the years in which the vaccination started and stopped12 and 2) the strain of the vaccine26. Also, in the A groups, there are differences in coverage levels among countries27. For example, many of A countries reside in Africa, where BCG vaccination is mostly mandatory12, but coverage is not necessarily high27, which could potentially have distorted the results. However, in this regard, analyzing those countries with a life expectancy above 78 years, which excluded most of the African countries, did not substantially affect the results. Since the initiation of the spread of COVID-19 in African countries were relatively late compared to the ones in other continents, it might be necessary to conduct similar analyses by taking the coverage rate into consideration after further spreading of the disease.
Potential mechanisms
BCG, originally developed against tuberculosis, is hypothesized to develop ‘frontline’ immunity, training it to respond non-specifically to certain viruses with greater intensity7,3,4. This idea is supported by clinical and epidemiological studies, which showed that BCG appeared to lower overall mortality in children5,2,6. BCG, which can remain alive in the human skin for several months, triggers not only specific memory B and T cells, but also stimulates the innate blood cells for a prolonged period7. In a randomized placebo-controlled study, it was shown that BCG vaccination protects against experimental infection with a weakened form of the yellow fever virus7,4. By showing a strong association of BCG vaccination program with COVID-19-related risks, our results are in line with the idea that BCG vaccination provides non-specific protection against COVID-19.
Recommended future research
The retrospective nature of the study requires extra caution in interpreting the results presented here because they may be affected by unknown confounding factors. Examples include the country’s overall hygienic conditions, number of PCR testing per population, ethnical genetic backgrounds, prevalence of SARS-CoV-2 strains, strength of the mitigation/suppression policy, people’s compliance to such policies, and culture/life style (e.g., shaking hands and hugging, wearing masks, and washing hands). Some of these elements might affect the observed relationships between COVID-19 indices and BCG vaccination policy by affecting both the susceptibility to SARS-CoV-2 and BCG vaccination policy in an indirect manner. These and other factors need to be examined in future studies (or even before formal publication of this manuscript in a journal with peer review; see “production note” below).
As outlined above, we have employed a rough classification for the statistical test for the relationship between BCG groups and the growth curve of COVID-19 in each country with a limited number of observations (Figure 3a). Obviously, it would be preferable to use continuous data to produce a more accurate estimation of the impact of BCG vaccination policy on how quickly the virus spreads. Also, examining variables such as BCG strain26 and the timing and the number of vaccinations (including the ones for other diseases) would be worth some attention. These variables might be easier to be examined in those countries where mandatory BCG vaccination was terminated by investigating the history of vaccination of COVID-19 patients and their symptoms and condition severity. In our study, we used the data that are publicly available from several websites (see Method and results section), which have not been validated by any official institutions (to the best of our knowledge). Therefore, further analyses are recommendable once the data’s reliability has been certified.
It may also be worth looking into the relationship between the age of BCG vaccination and the risk of infection and severity /mortality in COVID-19 patients18,19. An SNP in IL-1b (IL1B; rs16944) was found to affect the trained immunity response induced by BCG7, and it is of interest to evaluate the association of this SNP and the protective effect of BCG vaccine against SARS-CoV-2.
Conclusion
The present study reports significant associations of BCG vaccination with prevalence and mortality from COVID-19 in 136 countries, even when the country’s life expectancy and average temperature are controlled for. Together with the previous studies showing potential non-specific protection of BCG, we believe that further studies examining this hypothesis are warranted. In countries with no mandatory BCG vaccination at present, policymakers involved in health service may consider, with careful evaluation of its safety, increasing/implementing BCG vaccination during this outbreak of COVID-19, at least for the categories of people who are currently receiving the vaccination in other countries.
Data Availability
The data are available in the supplemental materials. The sources of the data are retrievable from the reference list.
Conflict of Interest
The authors declare that there is no conflict of interest regarding the publication of this article.
Production note
We welcome the contributions from readers of any data related to variables of interest (including potentially confounding factors), which might be considered as a collaboration. It is preferable that data contributed be sorted by country name and in excel format (as in our dataset; See Supplementary Table 1). Using the same naming convention for the name of the countries as those in supplementary table 1 is also helpful for us to carry out analyses quickly.
The data we’d like to add to the excel file includes, but not limited to,
Humidity in February and March for each country,
Average income for each country,
Any data related to social distance (population density, culture of shaking hands, etc.) for each country
Administration policy of other vaccines for each country, and
Prevalence of infectious diseases (such as H1N1 influenza) in each country.
When you provide us data, specify where you obtained data so that we can confirm the data.
We look forward to collaborating with you in this war against COVID-19.
Acknowledgement
We thank Yoko Kagami, and Harumi Mitsuya for their assistance in inputting the data and making Figures, Mamoru Tejima for letting us know the existence of potential correlation, and Hideo Hagihara, Tomoyuki Murano, Hironori Funabiki and Shinichi Nakagawa for useful discussions. We also thank Tomoko Tatsumi for helping us drafting the R codes.