Abstract
Identifying biomedical and socioeconomic predictors of the number of COVID-19 deaths across countries can inform the development of effective interventions. While previous multiple regression studies have identified several predictors, little is known about the effect of the mask non-wearing rate on the number of COVID-19-related deaths, possibly because the data are available for only a limited number of countries, which restricts the use of the traditional multiple regression approach to screen a large number of potential predictors. In this study, we used hypothesis-driven regression to test the effect of a limited number of predictors, based on the hypothesis that the mask non-wearing rate can predict the number of deaths to a large extent together with age and BMI, other relatively independent risk factors for hospitalized COVID-19 patients. The mask non-wearing rate, the percentage of the population aged ≥ 80 (male), and male BMI showed Spearman’s correlations of up to about 0.8, 0.7, and 0.6, respectively, with the number of deaths per million in 22 countries from mid-March to mid-June. The observed numbers of deaths per million were significantly correlated with the numbers predicted by the lasso regression model including four predictors: age ≥ 80 (male), male BMI, and the mask non-wearing rates in mid-March and in late April to early May (Pearson’s coefficient = 0.918). The multiple linear regression models including the mask non-wearing rates, age, and obesity-related predictors explained up to 79% of the variation in the number of deaths per million. Furthermore, 56.8% of the variation in the mask non-wearing rate in mid-March, the strongest predictor of the number of deaths per million, was predicted by age ≥ 80 (male) and male BMI, suggesting a confounding role of these predictors.
Although further verification is needed to identify the causes of national differences in COVID-19 mortality rates, these results highlight the importance of masks, age, and BMI in predicting COVID-19-related deaths, and they provide a useful strategy for future regression analyses that aim to contribute to the mechanistic understanding of COVID-19.
Introduction
There have been considerable differences in the number of COVID-19 deaths among countries. Western countries (Europe and the United States), in particular, have suffered from a high number of deaths and a high death rate compared to East Asian countries. Several regression studies have identified predictors, or candidates for true risk factors, for the difference among countries, such as age, obesity, and previous BCG vaccination (Jeet et al., 2020; Bellali, Chtioui, & Chahed, 2020; Leung, Bulterys, & Bulterys, 2020; Squalli, 2020; Knittel & Ozaltun, 2020). However, it is important to continue characterizing novel and known predictors under various biomedical and socioeconomic contexts to support potential future interventions against the ongoing COVID-19 pandemic.
Leffler et al. (2020) used the dates on which each government issued mandates or recommendations for face mask wearing as a predictor in their analysis. In their multivariable linear regression, duration of infection in the country, duration of mask wearing, and percentage of the population over age 60 significantly predicted each country’s COVID-19 mortality rate. Also, a systematic review and meta-analysis identifying 172 observational studies indicated that face mask use could result in a large reduction in the risk of COVID-19 (Chu et al., 2020). These findings, along with the large difference in face mask wearing rates between Western and East Asian countries, suggest that the mask wearing rate can be a significant predictor of the number of COVID-19 deaths. However, to our knowledge, there have been no analytical studies that include the face mask wearing rate in regression models. This is possibly because the common regression approach uses a large number of samples and predictors to achieve high prediction rates, whereas the mask wearing rate is available for only a limited number of countries.
Other studies identified age, BMI, and male sex as provisional and relatively independent risk factors for hospitalized COVID-19 patients (Palaiodimos et al., 2020). Among 5279 patients with laboratory-confirmed COVID-19 in New York City, age was the strongest risk factor associated with hospital admission, while critical illness after admission was associated with age, heart failure, BMI >40, and male sex (Petrilli et al., 2020). Using data from 20133 patients admitted to hospital with COVID-19 in the United Kingdom, Docherty et al. (2020) reported that the independent risk factors were increasing age, male sex, and chronic comorbidity, including obesity.
In this study, we employed a hypothesis-driven regression approach to identify predictors of the number of COVID-19 deaths across countries, based on the hypothesis that the mask non-wearing rate can predict the number of deaths to a large extent together with age and BMI, other relatively independent risk factors for hospitalized COVID-19 patients. This approach uses a limited number of predictors that are determined a priori from a specific hypothesis. Hypothesis-driven regression is especially useful for providing biological insights into associations when the number of samples is small, although its implications are restricted to the hypothesis, and it has been used in various fields of research (Klein et al., 2013; Sabuncu et al., 2016; Carriedo et al., 2020). Based on the results, we discuss cultural factors behind the observed difference among countries, which may have lowered the number of COVID-19-related deaths in East Asian countries.
Methods
Data collection
All data were collected from publicly available secondary sources. Analyses in the present study included UK, France, Italy, USA, Spain, Mexico, Germany, Malaysia, Canada, Sweden, China, Saudi Arabia, India, Indonesia, Philippines, Norway, Finland, Japan, Denmark, Australia, Singapore, and Thailand. These countries were chosen because mid-March mask-wearing data were available for them. The face mask wearing rates in mid-March (3/9-18) 2020 and late April to early May (4/26-5/1) 2020 across countries were derived from “% of people in each country who say they are: Wearing a face mask when in public spaces” in Smith (2020). Total COVID-19 deaths per million were obtained from Ritchie (2020). BMI data were obtained from Global Status Report on Noncommunicable Diseases 2014. 5 Oct. 2015, www.who.int/nmh/publications/ncd-status-report-2014/en/. Data on population percentage by age were obtained from Kose et al. (2020).
Data analysis
R version 3.6.2 was used for all statistical analyses. Pearson’s and Spearman’s correlations were calculated using the cor and cor.test functions. Lasso regression was used to find potential effects of predictors on the number of deaths per million (Tibshirani, 2011). Multiple linear regression models were built using the lm function.
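As an illustration only, the following sketch shows what a Spearman rank correlation computes: the Pearson correlation of the ranks of the two variables. The analyses in this study were run in R; this Python version, the function names, and the five toy values are assumptions made for the example and are not the study data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values, NOT the study data.
mask_non_wearing = [10.0, 35.0, 60.0, 80.0, 95.0]
deaths_per_million = [2.0, 8.0, 5.0, 150.0, 400.0]
rho = spearman(mask_non_wearing, deaths_per_million)
print(round(rho, 3))
```

Because only ranks enter the calculation, the coefficient is unaffected by any monotonic transformation of either variable, such as taking logarithms of the death counts.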
Results
Spearman’s correlations
We first calculated Spearman’s correlations between the number of deaths per million on various dates and parameters related to the mask non-wearing rate, age, and obesity (Fig. 1). The correlation coefficients between the number of deaths per million and the mask non-wearing rates (both mid-March and late April to early May) increased over time, reaching an apparent plateau in early May. The mask non-wearing rate in mid-March generally showed higher correlation coefficients than the rate determined in late April to early May. The highest correlation of 0.79 was observed between the mid-March mask non-wearing rate and the number of deaths per million on June 6, 2020 (P = 6.865e-06).
Age-related predictors were also highly correlated with the number of deaths per million with Spearman’s correlation coefficients of up to ∼0.68. In both sexes, correlation coefficients were lower in younger age groups (age 65–69) compared to those of higher age groups, and the highest correlations were found in age ≥ 80 groups. In contrast to the mask non-wearing rate, correlations between the number of deaths and age-related predictors were highest in mid-May and decreased thereafter.
In general, BMI showed lower correlations with the number of deaths per million, with a clear sex-dependent difference. Male BMI was more closely associated with the number of deaths per million than female BMI.
Time course profiles of Spearman’s correlation coefficients for several highly correlated parameters demonstrated a unique feature of age ≥ 80 (male) among the parameters used (Fig. 2). Spearman’s correlation coefficients for this parameter remained almost constant throughout the study period, whereas those for the other three parameters increased over time and reached an apparent plateau around day 50 (mid-May).
Scatter plots
We subsequently made scatter plots showing the relationship between the number of deaths per million and the predictors that showed high Spearman’s correlations (Fig. 3). The number of deaths per million on May 13 was used because it showed the highest correlations with the most important predictors, including the mask non-wearing rate (mid-March), age ≥ 80 (male), and male BMI. As expected, the number of deaths per million did not show a clear correlation with the mask non-wearing rate (late April to early May). The number of deaths per million showed exponential-type increases in most plots, in particular with the mask non-wearing rate (mid-March).
We therefore applied a logarithmic transformation to the number of deaths per million. The transformed number of deaths per million on May 13 showed significant linear correlations with the mask non-wearing rate in mid-March (r = 0.865, P = 9.739e-08), the mask non-wearing rate in late April to early May (r = 0.530, P = 0.009296), age ≥ 80 (male) (r = 0.658, P = 0.0008717), and BMI (male) (r = 0.682, P = 0.0004762). The relationship between the number of deaths per million (May 13) and age ≥ 80 (female) appeared similar in the scatter plot (data not shown), but the linear correlation did not reach statistical significance and had a lower Pearson correlation coefficient than that of age ≥ 80 (male) (r = 0.416, P = 0.05381).
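The reasoning behind the logarithmic transformation can be sketched as follows: if a response grows exponentially with a predictor, its logarithm grows linearly, so the Pearson correlation rises after the transformation. The sketch also computes the t statistic that underlies the significance test of a Pearson correlation, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The toy data, the growth rate of 0.05, and the function names are assumptions for illustration; the final line assumes that all 22 countries entered the reported correlation of 0.865.

```python
from math import exp, log, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing a zero Pearson correlation (n - 2 d.f.)."""
    return r * sqrt((n - 2) / (1.0 - r * r))


# Hypothetical predictor values and a perfectly exponential toy response.
x = [10.0, 30.0, 50.0, 70.0, 90.0]
y = [exp(0.05 * v) for v in x]

r_raw = pearson(x, y)                    # below 1: relationship is curved
r_log = pearson(x, [log(v) for v in y])  # close to 1 after the transform

# Assumes n = 22 countries for the reported r = 0.865.
t_reported = t_statistic(0.865, 22)
print(round(r_raw, 3), round(r_log, 3), round(t_reported, 2))
```

Turning the t statistic into a P value requires the cumulative distribution of Student's t, which is omitted here to keep the sketch within the standard library.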
Lasso regression
Next, we used the least absolute shrinkage and selection operator (lasso) regression to find the potential effects of these predictors on the log-transformed number of deaths per million. Lasso regression estimates the regression coefficients under the lasso penalty, retaining important explanatory variables while removing less important variables from the regression (Tibshirani, 2011). When all parameters in Fig. 2 were included together with age ≥ 80 (female), the lasso regression selected BMI (male) (beta = 0.21610188), age ≥ 80 (male) (beta = 0.34906818), the mask non-wearing rate in mid-March (beta = 0.06019861), and the mask non-wearing rate in late April to early May (beta = −0.02008978), with an intercept of −6.65720405. Age ≥ 80 (female) was omitted from the regression. The observed number of deaths per million showed a high correlation with the number predicted by the lasso regression model (Fig. 4; Pearson’s coefficient = 0.918, P = 1.635e-09).
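To make the fitted lasso model concrete, the sketch below applies the reported coefficients and intercept as a linear predictor of the log-transformed number of deaths per million. Several points are assumptions rather than statements from the text: that the coefficients are on the original scale of the predictors, that BMI is in kg/m² and the other predictors are percentages, and that the natural logarithm was used (the default of R's log function). The dictionary keys and the example country are hypothetical.

```python
from math import exp

# Coefficients and intercept as reported for the lasso model.
LASSO_INTERCEPT = -6.65720405
LASSO_COEFFICIENTS = {
    "bmi_male": 0.21610188,
    "age80_male_pct": 0.34906818,
    "mask_non_wearing_mid_march_pct": 0.06019861,
    "mask_non_wearing_late_april_pct": -0.02008978,
}


def predict_log_deaths(predictors):
    """Linear predictor: intercept plus the sum of coefficient * value."""
    return LASSO_INTERCEPT + sum(
        coefficient * predictors[name]
        for name, coefficient in LASSO_COEFFICIENTS.items()
    )


# Hypothetical country, NOT one of the 22 countries in the study.
example_country = {
    "bmi_male": 26.0,
    "age80_male_pct": 3.0,
    "mask_non_wearing_mid_march_pct": 90.0,
    "mask_non_wearing_late_april_pct": 70.0,
}

log_deaths = predict_log_deaths(example_country)
# Back-transform under the assumption of a natural-log response.
deaths_per_million = exp(log_deaths)
print(round(log_deaths, 3), round(deaths_per_million, 1))
```

If the response had instead been transformed with a base-10 logarithm, the back-transformation would be 10 ** log_deaths; the linear predictor itself does not depend on this choice.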
Multiple linear regression
We performed hypothesis-driven linear regressions using two or three of the predictors selected by the lasso regression. Because the mask non-wearing rate is available for only 22 countries (Supplementary File 1), we first selected BMI (male), age ≥ 80 (male), and the mask non-wearing rate in mid-March for this regression (Table 1, model 1). This multiple regression model explained 76.0% of the variation in the number of deaths per million on May 13 (P = 2.075e-06), identifying significant effects of the mask non-wearing rate in mid-March and age ≥ 80 (male). BMI (male) also showed a trend toward association, but the association was not statistically significant.
Other combinations of three predictors also resulted in significant associations between the number of deaths per million on May 13 and the selected predictors, with adjusted R square values of ∼0.7 (Table 1). We found several important features in the multiple regressions. First, the effect of the mask non-wearing rate in mid-March was significant in all regressions tested, with low P values, and exclusion of this predictor markedly decreased the adjusted R square (model 2). Second, BMI (male) showed a relatively weak association with the number of deaths on May 13, being significant only when the strongest predictor, the mask non-wearing rate in mid-March, was not included in the regression (model 3). Third, the highest adjusted R square of 0.7934 was obtained when we included age ≥ 80 (male) and the mask non-wearing rates of mid-March and late April to early May as predictors (model 4). All regression coefficients were statistically significant in this model.
Regressions including two predictors also underscored the importance of the mask non-wearing rate in mid-March as the predictor for the number of deaths per million on May 13. Namely, the mask non-wearing rate in mid-March was always a significant predictor (models 5–7) with adjusted R square of up to ∼0.75. While BMI (male) and age ≥ 80 (male) were also significantly associated with the number of deaths per million on May 13, exclusion of the mask non-wearing rate in mid-March dropped the adjusted R square to ∼0.65.
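The adjusted R square values compared above correct the ordinary R square for the number of predictors, so that models with two and three predictors can be compared fairly. A minimal sketch of the standard adjustment follows. The sample size of 22 matches the number of countries in this study, whereas the R square of 0.80 is a hypothetical input chosen for illustration and is not a value from Table 1.

```python
def adjusted_r_squared(r_squared, n_samples, n_predictors):
    """Adjusted R square: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_samples - 1) / (
        n_samples - n_predictors - 1
    )


# Same hypothetical R square, different numbers of predictors, n = 22.
adj_three_predictors = adjusted_r_squared(0.80, 22, 3)
adj_two_predictors = adjusted_r_squared(0.80, 22, 2)
print(round(adj_three_predictors, 4), round(adj_two_predictors, 4))
```

With only 22 samples the penalty per predictor is noticeable: at the same R square, the three-predictor model receives a lower adjusted value than the two-predictor model.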
Single regression
Because the mask non-wearing rate in mid-March showed a strong association with the number of deaths on May 13, we also built several single linear regression models for predicting the number of deaths on other dates using this predictor (Table 2). Changes in the adjusted R square showed a tendency similar to that of the Spearman’s correlation coefficients (Fig. 1), in which the value gradually increased during March and April, reaching a plateau in mid-May. Overall, the mask non-wearing rate in mid-March explained up to 72% of the variation in the number of deaths per million.
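A single linear regression of this kind can be sketched with closed-form least squares. The function below returns the slope, the intercept, and the unadjusted R square (the fraction of variation explained). It is an illustrative Python counterpart to a one-predictor lm fit, not the code used in this study, and the four data points are hypothetical.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R square = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical toy data: mask non-wearing rate (%) versus
# log-transformed deaths per million. NOT the study data.
mask_non_wearing = [20.0, 40.0, 60.0, 80.0]
log_deaths = [1.0, 2.0, 2.5, 4.5]

slope, intercept, r_squared = fit_simple_regression(mask_non_wearing, log_deaths)
print(round(slope, 3), round(intercept, 2), round(r_squared, 3))
```

Repeating the fit with the response taken from different dates, while keeping the predictor fixed, yields a time course of R square values that can be compared across dates.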
Prediction of mask non-wearing rates
Lastly, we examined the relationship between the mask non-wearing rate, age ≥ 80 (male), and male BMI (Fig. 5). The mask non-wearing rate in mid-March showed significant correlations with both age ≥ 80 (male) and male BMI. Multiple linear regression identified both age ≥ 80 (male) and male BMI as significant predictors of the mask non-wearing rate, with regression coefficients of 5.224 (P = 0.042559) and 8.377 (P = 0.000682), respectively, and an intercept of -160.020 (P = 0.006040). This multiple regression model explained 56.8% of the variation in the mask non-wearing rate in mid-March (P = 0.0001325).
Compared to the mask non-wearing rate in mid-March, the rate in late April to early May showed a weak association with age ≥ 80 (male) and male BMI (Fig. 5). Accordingly, in the corresponding multiple linear regression analysis, age ≥ 80 (male) and male BMI explained only 29.4% of the variation in the mask non-wearing rate, although the regression was still statistically significant (P = 0.01419). The regression coefficients of male BMI and age ≥ 80 (male) were 8.422 and 3.270, respectively (P = 0.0141 and 0.3776, respectively).
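The fitted model for the mid-March mask non-wearing rate can be read as a simple linear formula. The sketch below plugs in the reported intercept and coefficients. It assumes that age ≥ 80 (male) is expressed as a percentage and male BMI in kg/m², which the text does not state explicitly; the function name and both input profiles are hypothetical.

```python
# Intercept and coefficients as reported for the mid-March model.
INTERCEPT = -160.020
COEF_AGE80_MALE = 5.224
COEF_BMI_MALE = 8.377


def predict_mask_non_wearing_rate(age80_male, bmi_male):
    """Predicted mid-March mask non-wearing rate from the linear model."""
    return INTERCEPT + COEF_AGE80_MALE * age80_male + COEF_BMI_MALE * bmi_male


# Two hypothetical country profiles, NOT taken from the study data.
lower_profile = predict_mask_non_wearing_rate(age80_male=2.0, bmi_male=23.0)
higher_profile = predict_mask_non_wearing_rate(age80_male=3.0, bmi_male=26.0)
print(round(lower_profile, 3), round(higher_profile, 3))
```

Because the model is linear and unbounded, predictions for inputs far from the observed range can fall outside 0 to 100%, so the formula is meaningful only near the values covered by the 22 countries.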
Discussion
The number of deaths per million on a given date depends strongly on when the epidemic began in the country, particularly in the early stages of the pandemic. In this study, we tentatively set March 20 as the start date of the global pandemic. This assumption is likely a reasonable approximation for our hypothesis because all predictors employed (mask non-wearing rates, age, and BMI) tended to show weak correlations with the number of deaths per million at the onset of the pandemic, followed by marked increases as the pandemic progressed. These features enabled us to select a specific date (May 13) on which most parameters showed high correlations with the number of deaths per million, and to build regression models that explain the variation in the response variable well with a limited number of predictors. On the other hand, it is noteworthy that the age-related predictors were highly correlated with the number of deaths per million from the very beginning of the pandemic and then remained constant or gradually declined. Below we discuss the potential biomedical and socioeconomic mechanisms by which each predictor may affect the number of COVID-19 deaths.
Obesity
We have proposed possible mechanisms by which obesity could be a risk factor for the severity of COVID-19 and H1N1 influenza infections (Miyazawa, 2020). Namely, respiratory failure is the most important pathology contributing to the severity of both COVID-19 and H1N1 influenza infections. Since obese patients generally show a restrictive breathing pattern and reduced lung volumes, obesity-hypoventilation syndrome can lead to respiratory failure in COVID-19 patients and is a risk factor especially for patients with severe symptoms. Also, obesity has been reported to be a risk factor for the development of acute respiratory distress syndrome (Zhi et al., 2016), which is a serious clinical manifestation of COVID-19 (Simonnet et al., 2020). According to Moriconi et al. (2020), among COVID-19 patients, inflammatory markers at admission were higher in the obese group than in the non-obese group, and the obese group showed a worse pulmonary clinical picture, with lower PaO2. Thus, obesity could be a risk factor for already hospitalized COVID-19 patients or even for those with serious symptoms.
These potential mechanisms may explain why the association between BMI (male) and the number of deaths increased from May to June 2020. Typical COVID-19 patients are hospitalized a few days to a week after infection, and in some cases the symptoms become severe a few weeks after hospitalization. If obesity affects the late phase of disease progression, the increase in the correlations between BMI and the number of deaths per million should lag behind the spread of COVID-19. However, we cannot rule out the possibility that obese people have a higher chance of being infected, because obese adults are known to inhale on average 50% more air per day than non-obese adults (Brochu, 2014), which may increase the chance of inhaling the virus.
Age
In contrast to BMI, age ≥ 80 (male) was correlated with the number of deaths from the onset of the global pandemic. This may reflect the common etiological feature that old people are more susceptible to infection and have higher risk of death after infection compared to young people because of the dysfunction of immunity.
The precise reason why older age contributes to the number of COVID-19-related deaths remains to be elucidated, although immune dysfunction has been proposed as a potential mechanism (Mueller, McNamara, & Sinclair, 2020). Meanwhile, it is controversial whether reduced immunity simply contributes to higher mortality after infection. Excessive immunity could cause the cytokine storm and ARDS in COVID-19 in some cases (Ye, Wang, & Mao, 2020). Furthermore, two patients with X-linked agammaglobulinemia have recovered from COVID-19, suggesting that the B-cell response may not be necessary for recovery from this disease (Soresina et al., 2020). While other forms of immune dysfunction, such as age-related T-cell dysfunction, may be involved in the severity of COVID-19 (Minato, Hattori, & Hamazaki, 2020), the mechanism should be carefully investigated.
Along with immune dysfunction, older individuals tend to develop unwanted inflammation, which may contribute to the severity of COVID-19 (Mueller, McNamara, & Sinclair, 2020). This tendency can be seen in SARS-CoV-infected aged nonhuman primates (Smits et al., 2010). Also, To et al. (2020) reported that older age is correlated with a higher salivary viral load, which is highest during the first week after symptom onset. Furthermore, as in the case of obese patients, the decreased respiratory function of the elderly could also be a reason. In elderly individuals, both forced expiratory volume in one second (FEV1) and forced vital capacity (FVC) decrease dramatically (Falaschetti et al., 2004), to almost half of their lifetime maximum values, especially in males (Leem et al., 2019).
Sex differences
We identified a clear sex-dependent difference in the Spearman’s correlation coefficients between the number of deaths per million and obesity-related parameters. This is in line with the fact that males are more susceptible to severe outcomes from COVID-19 than females (Montopoli et al., 2020; Grandi, Facchinetti, & Bitzer, 2020). It has been proposed that estrogen may be protective against COVID-19 in females (Grandi, Facchinetti, & Bitzer, 2020), while androgens may worsen COVID-19 in males (Montopoli et al., 2020). It is also possible that smoking, which is more common in males globally, plays a role in worsening COVID-19 (Grandi, Facchinetti, & Bitzer, 2020). Also, females generally have more fear of COVID-19 and tend to adopt more preventive behaviors than males (Yıldırım, 2020).
Face mask
Face masks are considered to be effective in preventing transmission of COVID-19 to others, and also have a protective effect in preventing transmission from others. A recent simulation study showed that universal masking at 80% adoption suppresses COVID-19 deaths significantly more than maintaining a lockdown. Comparison between the validated modeling results and empirical data from Asian regions showed an almost perfect correlation between early universal masking and successful suppression of COVID-19 outbreaks (Kai et al., 2020). Also, Zhang et al. (2020) reported that face mask alone could have reduced the number of infections by over 78,000 in Italy from April 6 to May 9 and over 66,000 in New York City from April 17 to May 9, claiming that the failure in containing the propagation of COVID-19 pandemic worldwide is largely attributed to the unrecognized importance of airborne virus transmission.
Likewise, the mask non-wearing rate in mid-March was found to be the strongest predictor of the number of deaths per million in this study; a single regression on this predictor alone explained more than 70% of the variation in the response variable.
There was a large difference in face mask wearing rates between Western countries and Asian countries (especially East Asian countries). A country’s policy on face mask wearing alone cannot explain this difference because, for example, face masks have never been mandated in Japan, despite its high face mask wearing rate. We speculate that cultural factors may be the major reason for the difference. Many Japanese wear surgical masks on a daily basis not only for protection against infections or pollen, but also to achieve anonymity, much as Westerners wear sunglasses; in excessive cases this is referred to as “mask dependency” (Li, 2017). However, while people may hope to achieve anonymity, most of them want to avoid making others uncomfortable, and there is a regional difference in what makes people feel uncomfortable. Jack, Caldara, & Schyns (2012) mentioned that “whereas Western Caucasian internal representations predominantly featured the eyebrows and mouth, East Asian internal representations showed a preference for expressive information in the eye region”. This tendency may be the major reason why wearing sunglasses is more often considered rude in East Asia (Gesteland, 2020), whereas wearing face masks is considered more suspicious in Western countries. Yamanaka (2020) wonders if Japan has an “X-factor” that led to the low rates of COVID-19 deaths. Although the present study is not intended to show a causal role of the face mask wearing rate, future intervention efforts may consider the face mask wearing rate as a major candidate for the “X-factor”.
Additionally, most Japanese keep silent while using public transportation, because loud chatter on public transportation is considered rude in Japan (Baseel, 2020). This may also reduce the number of COVID-19 cases, because asymptomatic individuals exhale more aerosols while speaking than while breathing (Buonanno, Stabile, & Morawska, 2020), and such aerosols are considered to contribute largely to the spread of COVID-19 (Prather, Wang, & Schooley, 2020). The viral density of aerosols is also expected to differ between speaking and breathing, because the aerosols originate mainly from different sites, the mouth or the lungs. Aerosols from the mouth may contain more virus than those from the lungs in asymptomatic individuals. Long et al. (2020) noted that although virus shedding does not equate with viral infectivity, the asymptomatic group had a significantly longer duration of viral shedding than the symptomatic group.
Correlation between variables
There was a significant positive correlation between male BMI and the mask non-wearing rate in mid-March. Although this does not simply indicate that obese people tend not to wear masks, it is possible to speculate that as people become more obese, they find wearing masks more uncomfortable, since obese adults inhale on average 50% more air per day than non-obese adults (Brochu, 2014). Also, body temperature is positively associated with obesity (Bastardot, 2019), and face masks could raise body temperature (Yip, 2005; Hayashi, 2004). Therefore, obese people may feel more heat and discomfort (Li, 2005) when wearing face masks, depending on the temperature and humidity. The small size of universal face masks may also be a simple reason for the correlation.
There was a significant positive correlation between age ≥ 80 (male) and the mask non-wearing rate in mid-March. Due to the lack of data on mask wearing rates by age, this cannot be fully discussed either. It has been reported that age was correlated with fear and was related to preventive behaviors (Yıldırım, 2020). Face mask wearing in public correlates with the proportion of people who are afraid of COVID-19, except in Scandinavian countries and the United Kingdom (Smith, 2020). These reports are inconsistent with the correlation found in this study. Further data are required for detailed investigation.
Conclusion
The mask non-wearing rate in mid-March alone explained up to 72% of the variation in the number of deaths per million. The face mask non-wearing rate in mid-March, the early phase of the pandemic, predicted the total number of deaths per population over the entire three-month course of the pandemic more strongly than the face mask non-wearing rate in late April to early May. This may suggest that face mask wearing from the early phase of the pandemic is very important for suppressing COVID-19-related deaths, if mask wearing is assumed to be an independent risk factor.
In the lasso regression, four factors were chosen: BMI (male), age ≥ 80 (male), the mask non-wearing rate in mid-March, and the mask non-wearing rate in late April to early May. The observed and predicted numbers of deaths per million showed a high correlation (Pearson’s coefficient = 0.918, P = 1.635e-09). Although these factors have not been proven at this time to be the cause of the differences in COVID-19 mortality by country, this approach may be a useful method for predicting COVID-19 mortality.
The death rate of the country in the early stages of a pandemic could be further predicted if factors such as the timing and number of infected people entering the country from abroad are taken into account.
Data Availability
All data were collected from publicly available secondary sources.
Face mask wearing rates in mid-March (3/9-18) 2020 and late April to early May (4/26-5/1) 2020 for the countries listed below, taken from “% of people in each country who say they are: Wearing a face mask when in public spaces”, were obtained from:
Smith M. International COVID-19 Tracker Update: 2 May. 2 May 2020, yougov.co.uk/topics/international/articles-reports/2020/05/01/international-covid-19-tracker-update-2-may
Smith M. International COVID-19 Tracker Update: 18 May. 18 May 2020, yougov.co.uk/topics/international/articles-reports/2020/05/18/international-covid-19-tracker-update-18-may
Total COVID deaths per million for the countries listed below, except for Taiwan, were obtained from:
Ritchie, Hannah. Coronavirus Source Data. 16 June 2020, ourworldindata.org/coronavirus-source-data
BMI data for the countries listed below, except for Taiwan, were obtained from:
Global Status Report on Noncommunicable Diseases 2014. 5 Oct. 2015, www.who.int/nmh/publications/ncd-status-report-2014/en/.
Population percentage by age for the countries listed below, except for Taiwan, was obtained from:
Kose, Ayhan, et al. World Bank Open Data. Data, 15 June 2020, data.worldbank.org/.