Abstract
Identifying biomedical and socioeconomic predictors of the number of COVID-19 deaths across countries can support the development of effective interventions. While previous multiple regression studies have identified several predictors of the number of COVID-19-related deaths, little is known about the association with the mask non-wearing rate, possibly because the data are available for only a limited number of countries, which restricts the use of the traditional multiple regression approach to screen a large number of potential predictors. In this study, we used a hypothesis-driven regression approach to test the association with a limited number of predictors, based on the hypothesis that the mask non-wearing rate can predict the number of deaths to a large extent together with age and BMI, which are other relatively independent risk factors for hospitalized COVID-19 patients. The mask non-wearing rate, percentage of age ≥ 80 (male), and male BMI showed Spearman’s correlations of up to about 0.8, 0.7, and 0.6, respectively, with the numbers of deaths per million in 22 countries from mid-March to mid-June, 2020. The observed numbers of deaths per million were significantly correlated with those predicted by the lasso regression model including four predictors: age ≥ 80 (male), male BMI, and the mask non-wearing rates in mid-March and in late April to early May (Pearson’s coefficient = 0.919). The multiple linear regression models including the mask non-wearing rates, age, and obesity-related predictors explained up to 75% of the variation in the number of deaths per million in the 22 countries with little concern about multicollinearity. Furthermore, linear regressions using the mask non-wearing rate in mid-March as the sole predictor still explained up to 72% of the variation in the numbers of deaths from March to mid-June, emphasizing the importance of the strongest predictor.
Although further verification is needed to identify causes of the national differences in COVID-19 mortality rates, these findings highlight the importance of the mask, age, and BMI in predicting the COVID-19-related deaths, providing a useful strategy for future regression analyses that attempt to contribute to the mechanistic understanding of COVID-19.
Introduction
There have been considerable differences in the numbers of deaths by COVID-19 among countries. Western countries (Europe and the United States), in particular, have suffered from a high number of deaths and a high death rate compared to East Asian countries. Several regression studies have identified predictors, or candidates for true risk factors, for the difference among countries, such as age, obesity, and previous BCG vaccination (Bellali, Chtioui, & Chahed, 2020; Leung, Bulterys, & Bulterys, 2020; Squalli, 2020; Knittel & Ozaltun, 2020). However, it is important to continue characterizing novel and known predictors under various biomedical and socioeconomic contexts for potential future interventions to counteract the ongoing COVID-19 pandemic.
Face masks have been the subject of increasing attention throughout the pandemic period. Leffler et al. (2020) used the issue dates of each government’s mandates or recommendations for face mask wearing as a predictor in their regression analyses, demonstrating that the duration of infection in the country, duration of wearing masks, and percentage of the population over age 60 significantly predicted each country’s COVID-19 mortality rate. Also, a systematic review and meta-analysis identifying 172 observational studies indicated that face mask use could result in a large reduction in risk of COVID-19 (Chu et al., 2020). These findings, along with the large difference in face mask wearing rates between Western and East Asian countries, suggest the importance of the mask wearing rate as a predictor for the number of deaths by COVID-19. However, to our knowledge, there have been no analytical studies including the face mask wearing rate in regression models. This is possibly because the common regression approach uses a large number of samples and predictors to achieve high prediction rates, whereas the mask wearing rate is available for only a limited number of countries.
Other studies identified age, BMI, and male sex as relatively independent risk factors for hospitalized patients of COVID-19. For example, in a cohort of hospitalized patients in a minority-predominant population, severe obesity, increasing age, and male sex were associated with higher in-hospital mortality (Palaiodimos et al., 2020). Critical illness after admission was strongly associated with age, heart failure, BMI > 40, and male sex in 5279 patients with laboratory-confirmed COVID-19 in New York City (Petrilli et al., 2020). Using data from 20133 patients admitted to hospital with COVID-19 in the UK, Docherty et al. (2020) reported that independent risk factors for mortality are increasing age, male sex, and chronic comorbidity including obesity. Furthermore, several mechanistic implications have been explored for these factors (Miyazawa, 2020; Mueller, McNamara, & Sinclair, 2020; Grandi, Facchinetti, & Bitzer, 2020), which supports their independent nature.
In this study, we employed a hypothesis-driven regression approach to identify predictors for the number of deaths by COVID-19 across countries, based on the hypothesis that the mask non-wearing rate can predict the number of deaths to a large extent together with age, BMI, and sex, which are other relatively independent risk factors for hospitalized patients of COVID-19. This approach uses a limited number of a priori determined predictors based on a certain hypothesis. Hypothesis-driven regression is especially useful in providing biological insights into an association when the number of samples is small, although its implications are restricted to the hypothesis, and it has been used in various fields of research (Klein et al., 2013; Sabuncu et al., 2016; Carriedo et al., 2020). Based on the results, we discuss the cultural factors behind the observed difference among countries, which may have lowered the number of COVID-19-related deaths in East Asian countries.
Methods
Data collection
World
All data were collected from publicly available secondary sources. Analyses in the present study included 13 Western (UK, France, Italy, USA, Spain, Mexico, Germany, Canada, Sweden, Norway, Finland, Denmark, and Australia) and 9 Asian (Malaysia, China, Saudi Arabia, India, Indonesia, Philippines, Japan, Singapore, and Thailand) countries. These countries were chosen because of the availability of mid-March mask-wearing data. The face mask wearing rates in mid-March (3/9-18) 2020 and late April to early May (4/26-5/1) 2020 across countries were derived from “% of people in each country who answered they are: Wearing a face mask when in public spaces” in Smith (2020). This database (YouGov) has partnered with the Institute of Global Health Innovation at Imperial College London and summarizes interviews with nationally representative samples (150–2000 per week, depending on the country). Total COVID deaths per million were obtained from Ritchie (2020). BMI data were obtained from Global Status Report on Noncommunicable Diseases 2014. 5 Oct. 2015, www.who.int/nmh/publications/ncd-status-report-2014/en/. Population percent by age data were obtained from https://data.worldbank.org on June 15, 2020.
United States
The face mask wearing rates in late March to late April (3/26-4/29) 2020 across states in the United States were derived from “State-by-state: Face mask adoption across the US; Which, if any, of the following measures have you taken in the past 2 weeks to protect yourself from the Coronavirus (COVID-19)? (% of US adults in each state who say they wear a face mask when in public)” in Nguyen (2020). Total COVID deaths per million by state in the United States on July 3 were obtained from CDC COVID Data Tracker July 3, 2020, https://www.cdc.gov/covid-data-tracker/index.html. (Accessed July 3, 2020).
Data analysis
R version 3.6.2 was used for all statistical analyses. R packages used in this study include fields, ggplot2, glmnet, tidyverse, broom, car, and ggcorrplot. Pearson’s and Spearman’s correlations were calculated by cor and cor.test functions. Lasso regression was used to find potential association of predictors with the number of deaths per million (Tibshirani, 2011). Multiple and single linear regression models were built using the lm function.
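As an illustration of the two correlation measures named above, the following is a minimal Python sketch (the study itself used the R functions cor and cor.test). Spearman’s coefficient is computed here as Pearson’s coefficient applied to ranks, with tied values given their average rank. The function names and the six data points are hypothetical and are not taken from the study data; no P values are computed.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical illustration values, NOT data from the study
mask_non_wearing = [10.0, 25.0, 40.0, 60.0, 80.0, 95.0]   # percent
deaths_per_million = [1.0, 3.0, 2.0, 50.0, 200.0, 400.0]

rho = spearman(mask_non_wearing, deaths_per_million)
r = pearson(mask_non_wearing, deaths_per_million)
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```

Because Spearman’s coefficient depends only on ranks, it is insensitive to the strongly skewed scale of deaths per million, which is one reason to report it alongside Pearson’s coefficient.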
Results
Spearman’s correlations
We first calculated Spearman’s correlations between the number of deaths per million on various dates and predictors related to the mask non-wearing rate, age, and obesity (Fig. 1). The correlation coefficients between the number of deaths per million and mask non-wearing rates (both mid-March and late April to early May) showed time-dependent increases, reaching an apparent plateau in early May. The mask non-wearing rate in mid-March generally showed higher correlation coefficients than that in late April to early May. The highest correlation of 0.79 was observed between the mid-March mask non-wearing rate and the number of deaths per million on June 6, 2020 (P = 6.865e-06).
Age-related predictors were also highly correlated with the number of deaths per million with Spearman’s correlation coefficients of up to ∼0.68. In both sexes, correlation coefficients were lower in younger age groups (age 65–69) compared to those of higher age groups, and the highest correlations were found in age ≥ 80 groups. In contrast to the mask non-wearing rate, correlations between the number of deaths and age-related predictors were highest in mid-May and decreased thereafter.
In general BMI showed lower correlations with the number of deaths per million with a clear sex-dependent difference. Male BMI was more closely associated with the number of deaths per million than female BMI. The highest correlation of 0.59 was observed between male BMI and the number of deaths per million on May 26.
Time course profiles of Spearman’s correlation coefficients for several highly correlated parameters demonstrated the unique behavior of age ≥ 80 (male) among the parameters used (Fig. 2). Spearman’s correlation coefficients for this parameter remained almost constant throughout the study period, whereas those for the other three parameters showed a time-dependent increase and reached an apparent plateau around day 50 (mid-May).
Scatter plots
We subsequently made scatter plots showing the relationship between the number of deaths per million and predictors that showed high Spearman’s correlations (Fig. 3). The number of deaths per million on May 13 was used because it showed the highest correlations with the most important predictors, including the mask non-wearing rate (mid-March), age ≥ 80 (male), and male BMI. The number of deaths per million showed exponential-type increases in most plots, particularly in the plot with the mask non-wearing rate (mid-March), with clear separations between Western and Asian countries. As expected, the number of deaths per million did not show a clear positive correlation with the mask non-wearing rate (late April to early May). Rather, in the Western countries there was a tendency toward a negative association between the number of deaths per million and the mask non-wearing rate in April - May (Spearman’s correlation; r = −0.4126551, P = 0.1611). This is attributed to the marked reduction in the mask non-wearing rate in some countries with high numbers of deaths, possibly caused by fear of the disease. Likewise, within the United States, the mask non-wearing rate in late March to late April across states also showed a negative correlation with the total number of deaths per million on July 3 (Spearman’s correlation; r = −0.477556, P = 0.0003946) (Fig. S1).
Based on the cross-country scatter plots, we applied a logarithmic transformation to the number of deaths per million. The transformed number of deaths per million on May 13 showed a significant linear correlation with the mask non-wearing rate in mid-March (r = 0.842, P = 8.754e-07), the mask non-wearing rate in late April to early May (r = 0.496, P = 0.01897), age ≥ 80 (male) (r = 0.658, P = 0.0008717), and BMI (male) (r = 0.682, P = 0.0004762). The relationship between the number of deaths per million (May 13) and age ≥ 80 (female) appeared to be similar in the scatter plot (data not shown), but the linear correlation did not reach statistical significance and had a lower Pearson correlation coefficient than that of age ≥ 80 (male) (r = 0.416, P = 0.05381).
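The rationale for the logarithmic transformation can be illustrated with a short Python sketch: when a response grows exponentially with a predictor, Pearson’s coefficient on the raw scale understates the association, while the log-transformed response is exactly linear in the predictor. The synthetic values below and the choice of the natural logarithm are assumptions made only for illustration; the base of the logarithm used in the analysis is not stated here.

```python
import math

def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Synthetic, noise-free exponential relationship (hypothetical values)
predictor = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
response = [math.exp(0.05 * v) for v in predictor]

r_raw = pearson(predictor, response)
# Natural log is an assumption; any base gives the same correlation
r_log = pearson(predictor, [math.log(v) for v in response])

print(f"raw scale: r = {r_raw:.3f}; log scale: r = {r_log:.3f}")
```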
Lasso regression
Next, we used the least absolute shrinkage and selection operator (lasso) regression to find the potential association of these predictors with the log-transformed number of deaths per million. The lasso regression is a machine learning method that estimates the regression coefficients with the lasso penalty terms, retaining important explanatory variables, while less important variables are removed from the regression (Tibshirani, 2011). When all parameters in Fig. 2 were included together with age ≥ 80 (female), the lasso regression selected BMI (male) (beta = 0.21830178), age ≥ 80 (male) (0.35009932), the mask non-wearing rate (mid-March) (0.06088824), and the mask non-wearing rate in late April to early May (−0.02080248) with an intercept of −6.73379877. Age ≥ 80 (female) was omitted from the regression. The observed number of deaths per million showed high correlation with the number predicted by the lasso regression model (Fig. 4; Pearson’s coefficient = 0.919, P = 1.601e-09).
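To make the fitted model concrete, the sketch below applies the coefficients reported above to one hypothetical country in Python. Only the intercept and the four coefficients come from the text. The feature names, the example input values, the assumption that predictors enter in raw units (percentages and kg/m²), and the assumption of a natural logarithm when back-transforming are all illustrative and should be checked against the original analysis.

```python
import math

# Intercept and coefficients as reported in the text for the lasso model
INTERCEPT = -6.73379877
COEFFICIENTS = {
    "bmi_male": 0.21830178,
    "age80_male_pct": 0.35009932,
    "mask_non_wearing_mid_march_pct": 0.06088824,
    "mask_non_wearing_apr_may_pct": -0.02080248,
}

def predict_log_deaths(features):
    """Linear predictor: intercept plus the sum of coefficient * feature value."""
    return INTERCEPT + sum(COEFFICIENTS[name] * features[name]
                           for name in COEFFICIENTS)

# Hypothetical country; these values are NOT taken from the study data
example = {
    "bmi_male": 26.0,
    "age80_male_pct": 2.0,
    "mask_non_wearing_mid_march_pct": 90.0,
    "mask_non_wearing_apr_may_pct": 70.0,
}

log_deaths = predict_log_deaths(example)
# Back-transformation assumes a natural logarithm (an assumption)
print(f"predicted log deaths per million: {log_deaths:.3f}")
print(f"back-transformed (if natural log): {math.exp(log_deaths):.1f}")
```

Under these assumptions, a 10-point higher mid-March mask non-wearing rate raises the predicted log number of deaths by about 0.61, holding the other predictors fixed.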
Multiple linear regression
We subsequently attempted to predict the number of deaths per million by the traditional multiple linear regression approach. In contrast to the machine learning-based lasso regression, which is designed primarily for prediction, multiple linear regression provides statistical inference, with which the role of each predictor can be discussed in detail. Multiple linear regression, however, has two major constraints: 1) The number of predictors should be conservative, not exceeding ∼10% of the sample size (Steyerberg & Harrell, 2004); 2) Multiple linear regression is vulnerable to multicollinearity, a severe distortion of the model caused by high correlations among predictors. The definition of high correlation depends on the field and model, but in general predictors with |r| > 0.7 (Dormann et al., 2013) or |r| > 0.8–0.9 (Franke, 2010) are considered to be highly correlated. To deal with the first constraint, we employed hypothesis-driven regression using two or three predictors based on the a priori determined hypothesis. Predictors selected by the lasso regression were used. For the second constraint, we calculated the correlations between the predictors prior to conducting the multiple regression.
The mask non-wearing rate in mid-March showed significant correlations with that in late April to early May, age ≥ 80 (male), and male BMI with Spearman’s correlation coefficients of 0.87, 0.52, and 0.54, respectively. Compared to the mask non-wearing rate in mid-March, that in late April to early May showed a weak association with male BMI, and its correlation with age ≥ 80 (male) was not statistically significant. Age ≥ 80 (male) and male BMI were not significantly correlated. Therefore, we concluded that our regression model may have a problem of multicollinearity when it contains the two mask non-wearing rates, but other combinations of the predictors would not severely affect the accuracy of the regression. For further verification, we calculated the variance inflation factor (VIF) in all regression models.
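The link between pairwise correlation and the VIF can be sketched in Python as follows. For predictor j, VIF_j = 1 / (1 − R_j²), where R_j² is the coefficient of determination from regressing predictor j on the remaining predictors; in a model with exactly two predictors, R_j² reduces to the squared Pearson correlation between them. The correlations reported above are Spearman’s coefficients, so plugging them in, as done below, is only a rough illustration and not a reproduction of the VIF values in Table 1. The function names are hypothetical.

```python
def vif_from_r_squared(r_squared):
    """VIF for one predictor, given R^2 from regressing it on the others."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)

def vif_two_predictors(r):
    """In a two-predictor model, R^2 equals the squared pairwise correlation."""
    return vif_from_r_squared(r * r)

STRICT_THRESHOLD = 2.5  # strict VIF threshold mentioned in the text

# Illustrative inputs: correlation magnitudes of the size reported above
for label, r in [("two mask non-wearing rates", 0.87),
                 ("mask (mid-March) vs age >= 80 (male)", 0.52)]:
    vif = vif_two_predictors(r)
    flag = "exceeds" if vif > STRICT_THRESHOLD else "is below"
    print(f"{label}: r = {r:.2f}, VIF = {vif:.2f} ({flag} {STRICT_THRESHOLD})")
```

This is consistent with the expectation stated above that only the combination of the two mask non-wearing rates is likely to raise a multicollinearity concern.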
We first selected BMI (male), age ≥ 80 (male), and the mask non-wearing rate in mid-March for this regression (Table 1, model 1). This multiple regression model explained 76.0% of the variation in the number of deaths per million on May 13 (P = 2.075e-06), identifying significant associations with the mask non-wearing rate in mid-March and age ≥ 80 (male). BMI (male) also showed a trend toward association, but it was not statistically significant. This regression model may have a weak multicollinearity problem because the VIF of the mask non-wearing rate exceeded 2.5, a strict threshold for evaluating the presence of multicollinearity.
Other combinations of three predictors also resulted in significant associations between the number of deaths per million on May 13 and the selected predictors, with adjusted R square values of ∼0.7 (Table 1). We found several important features in these multiple regressions. First, the association with the mask non-wearing rate in mid-March was significant in all regressions tested, with low P values, and exclusion of this predictor markedly decreased the adjusted R square (model 2). Second, BMI (male) showed a relatively weak association with the number of deaths on May 13, being significant only when the strongest predictor, the mask non-wearing rate in mid-March, was not included in the regression (model 3). Third, the highest adjusted R square of 0.7934 was obtained when we included age ≥ 80 (male) and the mask non-wearing rates of mid-March and late April to early May as predictors (model 4). All regression coefficients were statistically significant in this model. Fourth, VIF values were generally low unless the two mask non-wearing rates were included together in the model.
Regressions including two predictors also underscored the importance of the mask non-wearing rate in mid-March as a predictor for the number of deaths per million on May 13. Namely, the mask non-wearing rate in mid-March was always a significant predictor (models 5–7), with adjusted R square of up to ∼0.75. While BMI (male) and age ≥ 80 (male) were also significantly associated with the number of deaths per million on May 13, exclusion of the mask non-wearing rate in mid-March dropped the adjusted R square to ∼0.65. It is also noted that the regression coefficient of the mask non-wearing rate from late April to early May became negative when this predictor was included in the model together with that of mid-March (models 4 and 7). This is probably related to the weak association of this predictor with the number of deaths (Fig. 3) and its correlation with the mask non-wearing rate in mid-March (Fig. 5; see Discussion for details). Altogether, we concluded that model 5 would be the best regression model, predicting about 75% of the variation in the number of deaths per million while satisfying all constraints of linear regression.
Single regression
Lastly, since the mask non-wearing rate in mid-March showed a strong association with the number of deaths on May 13, we built several single linear regression models using this predictor to predict the number of deaths on other dates (Table 2). Changes in adjusted R square showed a tendency similar to that of the Spearman’s correlation coefficients (Fig. 1): the value gradually increased during March and April and reached a plateau in mid-May. Overall, the mask non-wearing rate in mid-March explained up to 72% of the variation in the number of deaths per million.
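For readers who want to see how a single-predictor model and its adjusted R square are obtained, a minimal Python sketch of ordinary least squares with one predictor follows (the study used the R function lm). The adjusted R square is computed as 1 − (1 − R²)(n − 1)/(n − p − 1) with p = 1. The function name and the five data points are hypothetical and are not values from the study.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, intercept, r_squared, adjusted_r_squared).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    r_squared = 1.0 - ss_res / ss_tot
    # One predictor, so the residual degrees of freedom are n - 2
    adjusted = 1.0 - (1.0 - r_squared) * (n - 1) / (n - 2)
    return slope, intercept, r_squared, adjusted

# Hypothetical illustration values, NOT data from the study
mask_non_wearing_pct = [10.0, 30.0, 50.0, 70.0, 90.0]
log_deaths_per_million = [0.5, 1.1, 2.0, 2.4, 3.5]

slope, intercept, r2, adj_r2 = simple_ols(mask_non_wearing_pct,
                                          log_deaths_per_million)
print(f"slope = {slope:.4f}, intercept = {intercept:.3f}")
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

The adjusted value is never larger than the unadjusted one, and with a sample as small as 22 countries the penalty for each added predictor is noticeable, which is why adjusted R square is the quantity compared across models.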
Discussion
The number of deaths per million on a certain date depends strongly on when the epidemic began in the country, particularly in the early stages of the pandemic. In this study, we tentatively set March 20 as the start date of the global pandemic. This assumption is likely a reasonable approximation for our hypothesis because all employed predictors (mask non-wearing rates, age, and BMI) tended to show weak correlations with the number of deaths per million at the onset of the pandemic, followed by marked increases as the pandemic progressed. These features enabled us to select a specific date (May 13) on which most parameters showed high correlations with the number of deaths per million, and to build regression models that explain the variation of the response variable well with a limited number of predictors. On the other hand, it is noteworthy that the age-related predictors were highly correlated with the number of deaths per million from the very beginning of the pandemic and then remained constant or gradually declined. Below we discuss the potential biomedical and socioeconomic mechanisms by which each predictor affects the number of deaths by COVID-19.
Obesity
We have proposed possible mechanisms by which obesity could be a risk factor for the severity of COVID-19 and H1N1 influenza infections (Miyazawa, 2020). Namely, respiratory failure is the most important pathology that contributes to the severity of both COVID-19 and H1N1 influenza infections. Since obese patients generally show a restrictive breathing pattern and reduced lung volumes, the obesity-hypoventilation syndrome can lead to respiratory failure in COVID-19 patients, being a risk factor especially for patients with severe symptoms. Also, obesity has been reported to be a risk factor for the development of acute respiratory distress syndrome (Zhi et al, 2016), which is a serious clinical manifestation of COVID-19 (Simonnet et al., 2020). According to Moriconi et al. (2020), in patients of COVID-19, inflammatory markers are higher in obese group than non-obese group at admission, and obese group show a worse pulmonary clinical picture with lower PaO2. Thus, obesity could be a risk factor for already hospitalized COVID-19 patients or even for those with serious symptoms.
These potential mechanisms may explain why the association between BMI (male) and the number of deaths increased from May to June 2020. Typical COVID-19 patients are hospitalized a few days to a week after infection, and in some cases the symptoms become severe a few weeks after hospitalization. If obesity affects the last phase of disease progression, there must be some delay in the increase in correlations between BMI and the number of deaths per million relative to the spread of COVID-19. However, we cannot rule out the possibility that obese people have a higher chance of being infected, because obese adults are known to inhale on average 50% more air per day than non-obese adults (Brochu, 2014), which may increase the chance of inhaling the virus.
Age
In contrast to BMI, age ≥ 80 (male) was correlated with the number of deaths from the onset of the global pandemic. This may reflect the common etiological feature that old people are more susceptible to infection and have higher risk of death after infection compared to young people because of the dysfunction of immunity.
The precise reason why older age contributes to the number of COVID-19-related deaths remains to be elucidated, while the immune dysfunction has been proposed as a potential mechanism (Mueller, McNamara, & Sinclair, 2020). Meanwhile, it is controversial whether the reduced immunity simply contributes to the higher mortality after infection. Excessive immunity could cause the cytokine storm and ARDS in COVID-19 in some cases (Ye, Wang, & Mao, 2020). Furthermore, two patients with X-linked agammaglobulinemia have recovered from COVID-19, suggesting that the B-cell response may not be necessary for the recovery from this disease (Soresina et al., 2020). While other dysfunction of immunity, such as age-related T-cell dysfunction, may be involved in the severity of COVID-19 (Minato, Hattori, & Hamazaki, 2020), the mechanism should be carefully investigated.
Along with immune dysfunction, older individuals tend to develop unwanted inflammation, which may contribute to the severity of COVID-19 (Mueller, McNamara, & Sinclair, 2020). This tendency has been observed in SARS-CoV-infected aged nonhuman primates (Smits et al., 2010). Also, To et al. (2020) reported that older age is correlated with higher salivary viral load, which is highest during the first week after symptom onset. Furthermore, as in the case of obese patients, decreased respiratory function in the elderly could also be a reason. In elderly individuals, both forced expiratory volume in one second (FEV1) and forced vital capacity (FVC) decrease dramatically (Falaschetti et al., 2004) to almost half of their lifetime maximum values, especially in males (Leem et al., 2019).
Sex differences
We identified a clear sex-dependent difference in the Spearman’s correlation coefficients between the number of deaths per million and obesity-related parameters. This is in line with the fact that males are more susceptible to severe outcomes from COVID-19 than females (Montopoli et al., 2020; Grandi, Facchinetti, & Bitzer, 2020). Likewise, it has been proposed that estrogen may be protective against COVID-19 in females (Grandi, Facchinetti, & Bitzer, 2020), while androgens may worsen COVID-19 in males (Montopoli et al., 2020). It is also possible that smoking, which is more common in males globally, may play a role in worsening COVID-19 (Grandi, Facchinetti, & Bitzer, 2020). Also, females generally have greater fear of COVID-19 and tend to engage in more preventive behaviors than males (Yildirim, 2020).
Face mask
Face masks are considered to be effective in preventing transmission of COVID-19 to others, and also have a protective effect in preventing transmission from others. A recent simulation study showed that universal masking at 80% adoption suppresses COVID-19 deaths significantly more than maintaining a lockdown. Comparison between the validated modeling results and empirical data from Asian regions showed an almost perfect correlation between early universal masking and successful suppression of COVID-19 outbreaks (Kai et al., 2020). Also, Zhang et al. (2020) reported that face mask alone could have reduced the number of infections by over 78,000 in Italy from April 6 to May 9 and over 66,000 in New York City from April 17 to May 9, claiming that the failure in containing the propagation of COVID-19 pandemic worldwide is largely attributed to the unrecognized importance of airborne virus transmission.
In line with these previous studies, the mask non-wearing rate in mid-March was found to be the strongest predictor for the number of deaths per million in this study, where even the single regression explained more than 70% of the variation of the response variable. We also showed that there is a large difference in face mask wearing rates between Western countries and Asian countries. A country’s policy on wearing face masks alone cannot explain this large difference because, for example, face masks have never been mandated in Japan, despite its high face mask wearing rate. We speculate that cultural factors may be the major reason for the difference. Jack, Caldara, & Schyns (2012) mentioned that “whereas Western Caucasian internal representations predominantly featured the eyebrows and mouth, East Asian internal representations showed a preference for expressive information in the eye region.” This tendency may be the major reason why wearing sunglasses is more often considered rude in East Asia (Gesteland, 2020), whereas wearing face masks is considered more suspicious in Western countries.
Correlation between variables
There was a significant positive correlation between male BMI and the mask non-wearing rate in mid-March, although the degree of correlation was far too low to introduce severe multicollinearity in our regression. While this does not simply indicate that obese people tend not to wear masks, it is possible to speculate that as people become more obese, they find wearing masks more uncomfortable, since obese adults inhale on average 50% more air per day than non-obese adults (Brochu, 2014). Also, body temperature is positively associated with obesity (Bastardot, 2019), and face masks could raise body temperature (Yip et al., 2005; Hayashi & Tokura, 2004). Therefore, obese people may feel more heat and discomfort when wearing face masks, depending on the temperature and humidity (Li et al., 2005). The small size of universal face masks may be another simple reason for the correlation.
There was a significant positive correlation between age ≥ 80 (male) and mask non-wearing rate in mid-March. Due to the lack of data on mask wearing rates by age, this cannot be fully discussed either. It has been reported that age is correlated with fear and preventive behaviors (Yildirim, 2020). Face mask wearing in public correlates with the proportion of people who are afraid of COVID-19, except for Scandinavian countries and United Kingdom (Smith, 2020). These facts are inconsistent with the correlation found in this study. Further data is required for detailed investigation.
Conclusion
The mask non-wearing rate in mid-March alone explained up to 72% of the variation in the number of deaths per million, whereas together with age ≥ 80 (male) it explained 75% of the variation. The face mask non-wearing rate in mid-March, the early phase of the pandemic, predicted the total number of deaths per population over the entire three-month course of the pandemic more strongly than the face mask non-wearing rate in late April to early May. This suggests that face mask wearing from the early phase of the pandemic may be very important for suppressing COVID-19-related deaths, if mask wearing is assumed to be an independent risk factor.
In the lasso regression, four factors were chosen, namely BMI (male), age ≥ 80 (male), the mask non-wearing rate (mid-March), and the mask non-wearing rate in late April to early May, and the observed and predicted numbers of deaths per million showed a high correlation (Pearson’s coefficient = 0.919, P = 1.635e-09). Although these factors have not been proven at this time to be the cause of the differences in COVID-19 mortality by country, the model may be a useful method for predicting COVID-19 mortality.
The death rate of the country in the early stages of a pandemic could be further predicted if factors such as the timing and number of infected people entering the country from abroad are taken into account.
Data Availability
All data were collected from publicly available secondary sources.
Face mask wearing rates (% of people in each country who say they are: Wearing a face mask when in public spaces) in mid-March (3/9-18) 2020 and late April to early May (4/26-5/1) 2020 for the countries listed below were obtained from:
Smith M. International COVID-19 Tracker Update: 2 May. 2 May 2020, yougov.co.uk/topics/international/articles-reports/2020/05/01/international-covid-19-tracker-update-2-may
Smith M. International COVID-19 Tracker Update: 18 May. 18 May 2020, yougov.co.uk/topics/international/articles-reports/2020/05/18/international-covid-19-tracker-update-18-may
Total COVID deaths per million for the countries listed below, except for Taiwan, were obtained from:
Ritchie, Hannah. Coronavirus Source Data. 16 June 2020, ourworldindata.org/coronavirus-source-data
BMI data for the countries listed below, except for Taiwan, were obtained from:
Global Status Report on Noncommunicable Diseases 2014. 5 Oct. 2015, www.who.int/nmh/publications/ncd-status-report-2014/en/.
Population % by age for the countries listed below, except for Taiwan, was obtained from:
Kose, Ayhan, et al. World Bank Open Data. Data, 15 June 2020, data.worldbank.org/.
Acknowledgements
The authors express their sincere thanks to the journalist Mr. Junya Iwai for his help with data collection and for constructive discussion.