Abstract
Identifying biomedical and socioeconomic predictors of the number of deaths caused by COVID-19 can help the development of effective interventions. In this study, we used a hypothesis-driven regression approach to test the hypothesis that the mask wearing rate, along with age and obesity, can largely predict the cumulative number of deaths across countries. Our regression models explained 69% of the variation in the cumulative number of deaths per million (March to June 2020) among 22 countries, identifying the face mask wearing rate in March as an important predictor. The number of deaths per million predicted by our elastic net regression model showed a high correlation (r = 0.86) with the observed numbers. These findings emphasize the importance of face masks in combating the ongoing COVID-19 pandemic.
One Sentence Summary
Face mask wearing rate in March is a strong predictor of the cumulative number of deaths per million caused by COVID-19 among 22 countries.
Main Text
There have been considerable differences in the number of deaths caused by COVID-19 across countries. Western countries, in particular, have recorded a high number of deaths compared to Eastern countries. Several regression studies have identified predictors of these differences, such as age, obesity, and previous BCG vaccination (1-3). However, to our knowledge, no cross-country regression study has used the face mask wearing rate as a predictor, although face masks have received increasing attention as an effective means to prevent transmission of COVID-19 (4, 5). This is possibly because the mask wearing rate has been available for only 22 countries. Since the number of predictors should not generally exceed 10% of the sample size in traditional regression analysis, the mask wearing rate may have been excluded from the pool of potential predictors in previous studies that used a large number of samples and predictors to achieve high prediction rates, which may in turn have limited public awareness of the importance of face masks in protecting health.
In this study, we employed a hypothesis-driven regression approach to identify the association between the mask wearing rate and the cumulative number of deaths caused by COVID-19 across countries, based on the hypothesis that the mask wearing rate can largely predict the cumulative number of deaths along with age and obesity, which are the other relatively independent risk factors for hospitalized COVID-19 patients (6-8). This approach uses a limited number of predictors determined a priori from a specific hypothesis and is especially useful in providing biological insights into an association when the number of samples is small (9-11), although its implications are restricted to the hypothesis tested.
We first calculated Spearman’s correlations between the cumulative number of deaths per million on various dates and predictors related to transmission (mask non-wearing rate and rate of avoiding public spaces), age, and obesity in order to determine which variables to include in the regression (Fig. 1). The correlation coefficients between the cumulative number of deaths per million and mask non-wearing rates (both Mar and Apr-May) showed a time-dependent increase, reaching an apparent plateau in early May. The mask non-wearing rate in Mar generally showed higher correlation coefficients than that of Apr-May, with the highest correlation of 0.79 on June 6, 2020 (P = 6.865e-06). The rate of avoiding public spaces showed low correlations with the cumulative number of deaths per million.
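The rank-based correlation used above can be sketched as follows. The authors ran their analysis in R with the “cor” and “cor.test” functions; this Python version is only an illustration of the technique, and the two input lists are hypothetical values, not the study data.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson's correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical illustration only (NOT the study data):
# mask non-wearing rate (%) and cumulative deaths per million for 7 countries.
mask_non_wearing = [95, 90, 85, 60, 30, 20, 10]
deaths_per_million = [400, 550, 300, 80, 5, 9, 3]

rho = spearman(mask_non_wearing, deaths_per_million)
print(f"Spearman's rho = {rho:.3f}")
```

Because only the ordering of the values matters, the coefficient is insensitive to the strongly skewed scale of deaths per million, which is one reason a rank correlation suits this kind of screening step.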
Age-related predictors were highly associated with the cumulative number of deaths per million, consistent with a previous report (7). The correlation coefficients were lower in the younger age group (age 65-69 years) than in the older age groups in both sexes (highest rho = 0.68). In contrast to the mask non-wearing rate, correlations between the cumulative number of deaths and age-related predictors were highest in mid-May and decreased thereafter.
Body mass index (BMI) generally showed lower correlations with the cumulative number of deaths per million, with a clear sex-dependent difference. Male BMI was more closely associated with the cumulative number of deaths per million than female BMI, consistent with the fact that males are more susceptible to severe outcomes from COVID-19 than females (12, 13). The highest correlation of 0.59 was observed between male BMI and the cumulative number of deaths per million on May 26.
Correlation analyses using the weekly number of deaths per million showed a similar tendency, in which the mask non-wearing rate in Mar showed the highest association (Fig. S1). Time course profiles for Spearman’s correlation coefficients of several highly correlated parameters further confirmed the unique feature of age ≥ 80 (male) among the parameters used (Fig. S2). Spearman’s correlation coefficients for this parameter remained almost constant throughout the study period, whereas those for the other three parameters showed a time-dependent increase and reached an apparent plateau around day 50 (mid-May).
Next, we created scatter plots representing the relationship between the cumulative number of deaths per million and several potential predictors showing high Spearman’s correlations (Fig. 2A, top panels). The number of deaths per million on May 13 was used because it showed high correlations with the selected predictors. The cumulative number of deaths per million showed an exponential association with most predictors, with clear separation between the Western and Eastern countries, especially in the plot with the mask non-wearing rate (Mar). We therefore applied a logarithmic transformation to the cumulative number of deaths per million (Fig. 2A, bottom panels), which resulted in significant linear correlations of the transformed value with the mask non-wearing rate in Mar (r = 0.796, P = 9.356e-06), mask non-wearing rate from Apr-May (r = 0.496, P = 0.01897), age ≥ 80 years (male) (r = 0.658, P = 0.0008717), and male BMI (r = 0.682, P = 0.0004762).
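The effect of the logarithmic transformation can be illustrated with a small sketch. The data below are synthetic values generated from an exact exponential curve (an assumption made purely for illustration, not the study data), and the natural logarithm is assumed because the text does not state the base used.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Synthetic illustration only: deaths grow exponentially with the predictor.
predictor = [10, 25, 40, 55, 70, 85, 95]              # e.g. a rate in percent
deaths = [2.0 * math.exp(0.06 * v) for v in predictor]

r_raw = pearson(predictor, deaths)                    # curved relationship
r_log = pearson(predictor, [math.log(d) for d in deaths])  # linear after log

print(f"r on raw scale = {r_raw:.3f}, r on log scale = {r_log:.3f}")
```

On the raw scale the exponential curvature lowers Pearson's r, whereas after the transformation the relationship is a straight line, which is the situation a linear regression assumes.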
Interestingly, in the Western countries there was a tendency toward a negative association between the number of deaths per million and the mask non-wearing rate in April-May (Spearman’s correlation; r = −0.4126551, P = 0.1611). This is attributed to the marked reduction in the mask non-wearing rate in some countries with high numbers of deaths, possibly caused by fear of the disease. Likewise, within the United States, the mask non-wearing rate in April across states also showed a negative correlation with the number of deaths (Spearman’s correlation; r = −0.477556, P = 0.0003946) (Fig. S3).
We subsequently attempted to predict the log-transformed cumulative number of deaths per million using multiple linear regression. The mask non-wearing rate (Mar) along with age ≥ 80 (male) explained 68.6% of the variation in the response variable (model 1). Male BMI was a relatively weak predictor compared to age ≥ 80 (male) but was significantly associated with the response variable when the mask non-wearing rate (Mar) was excluded from the model (models 2 and 3). On the other hand, the mask non-wearing rate in Apr-May was not significantly associated with the response variable (models 4 and 5), suggesting the importance of face masks in the early phase of the pandemic. The mask non-wearing rate (Mar) by itself explained 61.6% of the variation in the log-transformed cumulative number of deaths per million in a single regression analysis (beta = 0.06146, P = 9.36e-06).
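As a rough consistency check on the single-regression figure, the explained variance can be recomputed from the reported Pearson's r. In a regression with one predictor, R² equals r², and the adjusted R² additionally penalizes the number of predictors. The sketch below uses only values reported in the text (r = 0.796, 22 countries); whether the reported percentages are adjusted R² values is not stated in the source, so that reading is an assumption.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (plus an intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

r = 0.796  # Pearson's r reported for the mask non-wearing rate (Mar)
n = 22     # number of countries in the analysis

r2 = r ** 2                    # R^2 of a one-predictor regression equals r^2
adj = adjusted_r2(r2, n, p=1)

print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj:.3f}")
# prints "R^2 = 0.634, adjusted R^2 = 0.615"
```

The adjusted value of about 0.615 is close to the reported 61.6% (the small gap is within the rounding of r), which would be consistent with the percentages being adjusted R² values.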
We note that weak but significant correlations (Pearson’s correlation ∼ 0.5) were found among the mask non-wearing rate (Mar), age ≥ 80 (male), and male BMI (Fig. S4), although the degree of correlation was too low to introduce multicollinearity in our regression (Table 1). These correlations may be attributable to breathing difficulties, because both obese (14) and aged (15) individuals show impaired lung function.
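One common way to quantify multicollinearity is the variance inflation factor (VIF), which for predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The source does not state which diagnostic underlies Table 1, so the following is a sketch under the assumption that all three pairwise correlations are exactly 0.5.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]  # right half is now the inverse

def vifs(corr):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(corr)
    return [inverse[j][j] for j in range(len(corr))]

# Assumed correlation matrix: mask non-wearing rate (Mar), age >= 80 (male),
# male BMI, with every pairwise Pearson correlation set to 0.5.
rho = 0.5
corr = [[1.0, rho, rho],
        [rho, 1.0, rho],
        [rho, rho, 1.0]]

vif_values = vifs(corr)
print([round(v, 3) for v in vif_values])
```

Under this assumption each VIF is 1.5, well below the rule-of-thumb thresholds of 5 or 10 that are often used to flag problematic multicollinearity.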
Lastly, we predicted the log-transformed number of deaths per million from the mask non-wearing rate (Mar), age ≥ 80 (male), and male BMI using elastic net regression. Elastic net regression is a machine learning method that estimates the regression coefficients with penalty terms, enabling us to include a larger number of predictors than in traditional multiple regression. The observed cumulative numbers of deaths per million were significantly correlated with those predicted by the elastic net regression model (Fig. 3).
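The penalized estimation can be sketched with a minimal coordinate-descent solver. The authors used the R package glmnet; the code below is not that implementation, only an illustration of the same kind of objective, (1/2n)·RSS + λ[α‖β‖₁ + (1 − α)/2·‖β‖₂²], on standardized predictors. The data and the values of λ (`lam`) and α (`alpha`) are hypothetical, since the source does not report the tuning parameters.

```python
from statistics import mean

def soft_threshold(z, g):
    """Shrink z toward zero by g (the lasso part of the penalty)."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def standardize(col):
    """Center a column and scale it to unit (population) variance."""
    m = mean(col)
    sd = (sum((v - m) ** 2 for v in col) / len(col)) ** 0.5
    return [(v - m) / sd for v in col]

def elastic_net(columns, y, lam, alpha, n_iter=500):
    """Return (coefficients on the standardized scale, intercept)."""
    n = len(y)
    X = [standardize(c) for c in columns]
    intercept = mean(y)
    resid = [v - intercept for v in y]  # residuals while all betas are zero
    beta = [0.0] * len(X)
    for _ in range(n_iter):
        for j, xj in enumerate(X):
            # Covariance of x_j with the partial residual, i.e. the residual
            # with x_j's own current contribution added back in.
            z = sum(x * (r + beta[j] * x) for x, r in zip(xj, resid)) / n
            new = soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * x for x, r in zip(xj, resid)]
                beta[j] = new
    return beta, intercept

# Hypothetical data (NOT the study data): y depends linearly on a and b.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0]
y = [3.0 + 2.0 * ai - bi for ai, bi in zip(a, b)]

beta_ols, intercept = elastic_net([a, b], y, lam=0.0, alpha=0.5)    # no penalty
beta_en, _ = elastic_net([a, b], y, lam=0.5, alpha=0.5)             # shrunken
beta_sparse, _ = elastic_net([a, b], y, lam=1000.0, alpha=0.5)      # all zero
print(beta_ols, beta_en, beta_sparse)
```

With no penalty the solver reproduces the ordinary least-squares fit, a moderate penalty shrinks the coefficients toward zero, and a very large penalty removes every predictor, which shows how the penalty trades fit against model complexity when the sample is small.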
Face masks are considered effective in preventing the transmission of COVID-19 (16, 17). In line with these previous studies, the mask non-wearing rate in March was found to be the strongest predictor of the cumulative number of deaths per million in this study, where even the single regression explained 61.6% of the variation in the response variable. We also observed a considerable difference in face mask wearing rates between the Western and Eastern countries. A country’s policy on wearing face masks cannot by itself explain this large difference because, for example, face masks have never been mandated in Japan, despite the high face mask wearing rate observed in the country (18). We speculate that cultural factors could be the major reason for the difference in mask wearing rate, as Jack et al. (19) described that “whereas Western Caucasian internal representations predominantly featured the eyebrows and mouth, East Asian internal representations showed a preference for expressive information in the eye region.” This tendency could explain why wearing sunglasses is considered rude among East Asians and wearing face masks is considered suspicious in Western countries (20).
A limitation of this study is that we were able to obtain the mask wearing rate from only 22 countries. While we demonstrated a strong association between this predictor and the cumulative number of deaths caused by COVID-19 by taking advantage of hypothesis-driven regression, this finding should be verified by building a model with a sufficient number of samples and predictors. Another limitation is that the regression models used in the present study do not directly test causal relationships among the observed variables. Appropriate study designs, such as cohort sampling or the introduction of interventions, are needed to address causality.
Taken together, the present study demonstrated a close association between the face mask wearing rate and the number of deaths caused by COVID-19, identifying age and obesity as relatively weak predictors. These findings have implications for introducing mandatory face mask usage in line with the precautionary principle.
Data Availability
All data were collected from publicly available secondary sources.
Face mask wearing rates, defined as the percentage of people in each country who say they are wearing a face mask when in public spaces, in mid-March (March 9 to 18) 2020 and late April to early May (April 26 to May 1) 2020 for the countries listed below were obtained from:
Smith M. International COVID-19 Tracker Update: 2 May. 2 May 2020, yougov.co.uk/topics/international/articles-reports/2020/05/01/international-covid-19-tracker-update-2-may
Smith M. International COVID-19 Tracker Update: 18 May. 18 May 2020, yougov.co.uk/topics/international/articles-reports/2020/05/18/international-covid-19-tracker-update-18-may
Total COVID-19 deaths per million for the countries listed below, except for Taiwan, were obtained from: Ritchie, Hannah. Coronavirus Source Data. 16 June 2020, ourworldindata.org/coronavirus-source-data
BMI data for the countries listed below, except for Taiwan, were obtained from: Global Status Report on Noncommunicable Diseases 2014. 5 Oct. 2015, www.who.int/nmh/publications/ncd-status-report-2014/en/.
Population percentages by age for the countries listed below, except for Taiwan, were obtained from: Kose, Ayhan, et al. World Bank Open Data. Data, 15 June 2020, data.worldbank.org/.
Author contributions
Daisuke Miyazawa: Conceptualization, Data curation, Project administration, Resources, Writing - Original Draft. Gen Kaneko: Methodology, Software, Formal analysis, Validation, Visualization, Writing - Review and Editing.
Competing interests
None.
Data and materials availability
All data and code are available in the main text or the supplementary materials.
Supplementary Materials
Materials and Methods
All data were collected from publicly available secondary sources. Analyses in the present study included 13 Western countries (UK, France, Italy, USA, Spain, Mexico, Germany, Canada, Sweden, Norway, Finland, Denmark, and Australia) and 9 Asian countries (Malaysia, China, Saudi Arabia, India, Indonesia, Philippines, Japan, Singapore, and Thailand). These countries were chosen because March mask-wearing data were available for them. The face mask wearing rates in March (March 9 to March 18) 2020 and late April to early May (April 26 to May 1) 2020 across countries were derived from the “percentage of people in each country who answered that they are wearing a face mask when in public spaces” in the YouGov database (https://yougov.co.uk/topics/international/articles-reports/2020/05/01/international-covid-19-tracker-update-2-may). This database is produced in collaboration with the Institute of Global Health Innovation at Imperial College London and summarizes interviews conducted with nationally representative samples (sample sizes of 150-2000 per week, depending on the country). Total COVID-19 deaths per million were obtained from https://ourworldindata.org/coronavirus-source-data. The BMI data were obtained from the Global Status Report on Non-communicable Diseases 2014 (October 5, 2015; https://www.who.int/nmh/publications/ncd-status-report-2014/en/). Data on population percentage by age were obtained from https://data.worldbank.org on June 15, 2020.
Data analysis
Statistical analyses were conducted using R version 3.6.2. The R packages used in this study include fields, ggplot2, glmnet, diverse, broom, car, and ggcorrplot. Pearson’s and Spearman’s correlations were calculated using the “cor” and “cor.test” functions. Multiple and single linear regression models were built using the “lm” function.
Acknowledgments
The authors express sincere thanks to the journalist Mr. Junya Iwai for his help in data collection and constructive discussion.