ABSTRACT
Background The burden of the coronavirus disease 2019 (COVID-19) pandemic has been geographically disproportionate. Certain weather factors and population characteristics are thought to drive transmission, but studies examining these factors are limited. We aimed to identify weather, sociodemographic, and geographic drivers of COVID-19 at the global scale using a comprehensive collection of country/territory-level data, and to use discovered associations to estimate the timing of community transmission.
Methods We examined COVID-19 cases and deaths reported up to May 2, 2020 across 205 countries and territories in relation to weather data collected from capital cities for the eight weeks prior to and four weeks after the date of the first reported case, as well as country/territory-level population, geographic, and planetary data. We performed univariable and multivariable regression modeling and odds ratio analyses to investigate associations with COVID-19 cases, deaths, and epidemic growth rate. We also conducted maximum likelihood analysis to estimate the timing of initial community spread.
Findings Lower temperature (p<0.0001), lower humidity (p=0.006), higher altitude (p=0.0080), higher percentage of urban population (p<0.0001), increased air travelers (p=0.00019), and higher prevalence of obesity (p<0.0001) were strong independent predictors of national COVID-19 incidence, mortality, and epidemic growth rate. Temperature at 5–7 weeks before the first reported case best predicted epidemic growth, suggesting that significant community transmission was occurring on average 1–2 months prior to detection.
Conclusions The results of this ecologic analysis demonstrate that global COVID-19 burden and timing of country-level epidemic growth can be predicted by weather and population factors. In particular, we find that cool, dry, and higher altitude environments, as well as more urban and obese populations, may be conducive to more rapid epidemic spread.
BACKGROUND
Coronavirus disease 2019 (COVID-19) is a newly emerged infectious disease that has caused unprecedented global suffering and mortality since its emergence in late 2019 in Wuhan, China. COVID-19 is caused by the severe acute respiratory syndrome virus coronavirus 2 (SARS-CoV-2), an enveloped single stranded RNA virus that is closely related to SARS coronavirus 1 (SARS-Cov-1) and middle east respiratory syndrome coronavirus (MERS-CoV) (1). SARS-CoV-2 is a zoonotic virus that likely originated in horseshoe bats (Rhinolophus affinis), with recent genomic evidence suggesting subsequent passage through an intermediate mammalian host, possibly the heavily trafficked and critically endangered pangolin (mammalian order Pholidota) (2, 3). Following spillover into humans, SARS-CoV-2 displayed remarkably efficient human-to-human spread, with transmission likely mediated primarily by aerosols and respiratory droplets (4, 5). The exponential spread of the virus quickly overwhelmed initial containment efforts, leading to a global pandemic of historic proportions.
Many aspects of COVID-19 epidemiology have yet to be elucidated. For reasons that are not immediately apparent, some countries have experienced significantly higher incidence, mortality, and epidemic growth rate than others. Many factors have been proposed to explain the marked discrepancies in national incidence and mortality following introduction of COVID-19, including weather, demographic, health, and geographic factors (6–8). For instance, it has been proposed that COVID-19, like other droplet-borne respiratory viruses, may spread more efficiently in cold and dry climates (9). Studies in support of this ‘low temperature-low humidity’ theory have focused primarily on incidence and weather data from cities across China, with contradictory findings (10, 11). COVID-19 severity has also been linked to chronic medical conditions, male gender, older age, air pollution, and cigarette smoking (12–15). National case and death counts are also expected to be affected by differences in COVID-19 containment and mitigation strategies, testing capabilities and coverage, health access, and reporting of cases and deaths. Identification of factors that are associated with COVID-19 incidence and mortality and rate of spread on the global scale may shed light on COVID-19 risk factors, transmission, and epidemic dynamics. Therefore, our objective was to test hypotheses for observed differences in global incidence, mortality, and epidemic growth rate by examining country-level COVID-19 case and death data in relation to a suite of publicly available weather, demographic, health, geographic, and planetary data.
METHODS
Study design
In this global country-level analysis, we included all cases and deaths reported across 205 countries and territories that have reported at least one COVID-19 case. Daily COVID-19 case and death counts in each country were collected up to May 2, 2020 from the WHO and European Union Centers for Disease Control database (16). We collected the following demographic data for the corresponding countries and territories: median age, population age 0–14 years (%), population age 15–64 years (%), population over 65 years (%), sex ratio, population size (number), population density (pop/km2), urban population (%), Gross Domestic Product (GDP) per capita, Human Development Index (HDI), number of airports, and number of air travelers per annum. Sources of demographic data included the CIA World Factbook (for median age, sex ratio, urban population, GDP, number of airports) (17), the World Bank (for age structure, number of airline passengers carried) (18), the International Monetary Fund (for GDP) (19), and the United Nations Human Development Program (for HDI) (20).
We collected the following population-level health data: average body mass index (BMI), obesity (%), diabetic (%), cigarette consumption (annual average per capita), hospital beds (per 100,000 population), physicians (per 100,000 pop), health expenditure (US$ per capita), PM2.5 (µg/m3), ozone exposure (ppb), household air pollution (%), Climate Risk Index (2018), and number of COVID-19 tests performed (per 100,000 population). The sources of health data include the World Health Organization (for BMI, health expenditure) (21), the CIA World Factbook (for obesity) (17), the World Bank (for diabetes, hospital beds, physicians) (18), the Tobacco Atlas (for cigarette consumption) (22), the Health Effects Institute (for PM2.5, ozone exposure, and household air pollution) (23), and Germanwatch (for Climate Risk Index) (24). We collected the following climate data from weather stations in each capital city from a global climate data repository, TuTiempo.net (25), which has been used in previous epidemiological studies (26, 27): average temperature (°C), maximum temperature (°C), minimum temperature (°C), atmospheric pressure at sea level (hPA), average relative humidity (%), total rainfall and/or snowmelt (mm), average visibility (km), average wind speed (km/h), total days with snow, total days with thunderstorm, and total days with fog. If the capital city weather station had no data or incomplete data, we selected the largest city closest to the capital city. We collected daily weather data for up to 8 weeks before and four weeks after the date of the first reported case. For countries with a land mass > 1,000,000 km2 we collected climate data from the city where the first confirmed case occurred for sensitivity analyses. We also gathered data on population size, population density, COVID-19 testing, and number of days of sunshine from sources that compiled the latest censuses and official estimates or projections.
Statistical analysis
We used total number of COVID-19 cases, total number of deaths, and cumulative number of cases at 28 days after the first reported case as our main outcomes. The number of cases at four weeks (28 days) was used as a measure of the epidemic growth rate, and enabled normalization of differences in epidemic duration across countries. We averaged (temperature, humidity, atmospheric pressure, visibility, wind speed) or summed (precipitation, days of snow, days of fog, days of thunderstorm) weather variables by week (for sufficient resolution to generate an estimate of the timing of early epidemic spread) and by month prior to the first reported case (for regression modeling). We performed univariable and multivariable negative binomial regression analyses to examine COVID-19 outcomes in relation to weather, demographic, health, and geographic variables as continuous variables. We included variables significant in univariate analyses, as well as those selected a priori based on hypotheses regarding increased risk of infection, in multivariable analyses. We also calculated the odds of all outcomes (total cases, total deaths, and cases at 28 days) as categorical variables (total cases <650 or ≥650, total deaths <15 or ≥15, cases at 28 days <100 or ≥100) by stratifying on categories of weather, demographic, health, and geographic variables. Variable cutoffs were set to obtain roughly equal counts across low, middle, and high categories for each variable, and outcome cutoffs were set to obtain roughly equal counts in the low and high outcome categories. Temperature was stratified into five categories, for higher resolution, while variables with high ‘0’ or ‘1’ count data were stratified into two categories instead of three.
We fit negative binomial regression models to examine the hypothetical timing of COVID-19 exposure. This was achieved by subtracting a range of exposure-to-presentation delays from the day of first reported case and examining likelihood scores for models fit with the different delay durations. Maximum likelihood estimates were generated by calculating the negative log-likelihood (NLL) from the Akaike information criterion (AIC) of a given model, as previously described (26). All log-likelihood scores within 1.92 units of the maximum score were considered to be within the 95% confidence interval (28). All statistical analyses were performed using R (version 4.0.0). Regressions were performed using the glm.nb function in the MASS package with a log-link function. Given overdispersion of count data, a likelihood ratio test was used to determine that the negative binomial regression model was required instead of a standard Poisson model. The Mann-Whitney test was used for country profile comparisons.
Results
COVID-19 burden by country and continent
As of May 2, 2020, 3,306,882 COVID-19 cases and 238,424 COVID-19 deaths were reported globally (overall case fatality rate: 7.2%). We examined cumulative cases, deaths, and epidemic growth rates (cumulative cases in the first 28 days after the first reported case) reported in 205 countries and territories reporting at least one COVID-19 case (Supplementary Table 1). The epidemic growth rate varied greatly by country/territory: in the first 28 days after the first reported case, the cumulative number of cases ranged from 1 to 34,109 cases (median: 91 cases) (Figure 1A). Of total cases in the first 28 days, 45.7% occurred in Asia, 36.1% in Europe, 7.7% in South America, 5.3% in Africa, 4.9% in North America., and 0.4% in Oceania (Figure 1B). The number of deaths in the first 28 days was also highly variable and largely reflected case distributions (Figures 1C and 1D). The total number of cases and deaths up to May 2, 2020 also varied greatly (see Supplementary Figures 1A and 1C), with cases ranging from 3 to 1,103,781 (median: 644) and deaths ranging from 0 to 65,068 (median: 13). Of total cases, 36.9% occurred in Europe, 36.3% in North America, 19.6% in Asia, 5.7% in South America, 1.2% in Africa, and 0.3% in Oceania (Supplementary Figure 1B). Of total deaths, 57.0% occurred in Europe, 29.9% in North America, 8.5% in Asia, 3.9% in South America, 0.7% of deaths in Africa, and 0.1% in Oceania (Supplementary Figure 1D). Countries/territories reported their first case in the months of December (n=1), January (n=21), February (n=36), March (n=138), and April (n=9). Epidemic duration ranged from 23 to 124 days (median: 54 days), although countries did not have the same amount of time in observation.
COVID-19 and country weather factors
Daily weather data for a 12-week period surrounding the date of first reported case (8 weeks prior to four weeks after) were extracted for the capital city of each of 188 countries/territories with available weather data and analyzed in relation to COVID-19 outcomes (total cases, total deaths, and epidemic growth rate). In univariable regression analyses, all examined outcomes were strongly associated with lower temperature (average, minimum, maximum) and lower relative humidity (Table 1 and Figure 2A-C), as well as lower number of days with thunderstorm and lower number of sunshine days (Table 1 and Supplementary Figure 2). In multivariable analysis adjusting for temperature, median age, and average BMI, the only weather variables that remained significantly associated with epidemic growth rate were lower average temperature (p<0.0001), lower minimum temperature (p<0.0001), lower maximum temperature (p<0.0001) and lower relative humidity (p = 0.0055) (Table 1). When examined by temperature, more rapid epidemic growth was evident in countries with average temperatures <15°C (Figure 2D). Countries with temperatures between 6–15°C showed the highest average epidemic growth rate (Figure 2D). Figures 3A and 3B provide a cartographic representation of these data, showing the average temperature in the four weeks preceding the first case in each country, and the case counts at 28 days and total case counts, respectively. Weather variable findings were corroborated when examined in odds ratio (OR) analyses. Countries with low temperature (≤15°C) were at markedly increased odds of having increased total cases, increased total deaths, and rapid epidemic growth relative to countries with high temperature (≥27°C) (Table 2). The odds of rapid epidemic growth were greatest for countries with temperatures between 6 and 15°C (OR: 8.0, 95% CI: 2.9–22.3), followed by countries with temperatures <6°C (OR: 3.4, 95% CI: 1.4–8.6). Countries that experienced at least one day of snow or fog were at significantly increased risk of COVID-19 cases and deaths, while countries with more hours of sunshine and greater visibility were relatively protected (Supplementary Table 2). Rain, humidity, wind, and thunderstorms did not alter the odds of having high case and death counts. For countries with a landmass greater than 1,000,000 km2 (n=29), weather data were also collected from the city where the first COVID-19 case was reported, for sensitivity analysis. For seven countries, this city was different from the capital city, and all univariable and multivariable regression results remained robust to this change.
COVID-19 and demographic characteristics at the country level
In univariable regression analyses (Table 1), COVID-19 total cases, total deaths, and epidemic growth rate were strongly associated with older overall median age, lower percentage of population between 0–14 years of age, higher percentage of population over 65 years of age, increased population size, increased population density, increased urban population, increased GDP, increased HDI, and increased number of annual airline passengers. Sex ratio was not associated with cases at 28 days or total cases, but was associated with total deaths (p<0.0001) in univariable regression analysis. In multivariable models (which adjusted simultaneously for temperature, median age, and BMI), increased population size, increased urban population, and higher number of air travelers remained significantly associated with epidemic growth rate (Table 1). Median age was no longer significant when adjusting for temperature and BMI. Countries with increased population size, urban percentage, higher HDI, and annual airline passengers also had greater risk of cases and deaths (Table 2). Sex ratio was not associated with increased deaths in the odds ratio analysis (OR: 0.83, 95% CI: 0.41–1.7) (Supplementary Table 2).
COVID-19 and health factors at the country level
In univariable regression analyses, epidemic growth rate, cases, and deaths were strongly associated with BMI, high prevalence of obesity, diabetes, cigarette consumption, number of physicians, and health expenditure (Table 1 and Supplementary Figure 2). Number of COVID-19 tests performed was positively associated with total cases (p<0.0001) and total deaths (p<0.0001), but not with epidemic growth rate (p = 0.31). After adjusting for temperature and median age, BMI (p = 0.005) and obesity (p<0.0001) remained positively associated with epidemic growth rate. When adjusting for temperature, median age, and BMI, the number of hospital beds was negatively associated with epidemic growth rate. In odds ratio analyses (Table 2 and Supplementary Table 2), countries with high BMI and high prevalence of obesity, high cigarette consumption, high number of physicians, and high health expenditure were at significantly greater odds of having increased epidemic growth rate. COVID-19 mortality was associated with increased BMI (OR: 4.4, 95% CI: 2.0–9.6), obesity (OR: 2.7, 95% CI: 1.3–5.7), cigarette consumption (OR: 9.5, 95% CI: 3.8–23.8), number of hospital beds (OR: 2.7, 95% CI: 1.3–5.6), number of physicians (OR: 11.0, 95% CI: 4.6–26.3), health expenditure (OR: 8.2, 95% CI: 3.5–19.0), and number of COVID-19 tests performed (OR: 11.0, 95% CI: 2.8–43.1) (Table 2 and Supplementary Table 2).
COVID-19 and geographic/planetary factors at the country level
Latitude, longitude, and altitude data were collected for each country/territory and analyzed in relation to COVID-19 incidence and mortality. More northern latitude (relative to the equator) was significantly positively associated with total cases (p<0.0001), total deaths (p<0.0001), and epidemic growth rate (p<0.0001) in univariable analysis; however, when examined together with temperature, BMI, and median age in a multivariable model, it was no longer significantly associated (p = 0.49) (Table 1 and Supplementary Figure 3). More eastern longitude (relative to the Prime Meridian) was significantly associated with total cases (p = 0.001) and total deaths (p<0.0001), but not epidemic growth rate (p = 0.23), in univariable analyses. In multivariable analysis, the association between longitude and epidemic growth rate was borderline significant (p = 0.047). Higher altitude was significantly associated with increased epidemic growth rate (p<0.0001) and total deaths (0.0003) in univariable analysis, and remained significantly associated in multivariable analysis (p = 0.0008). Among all planetary factors examined, ozone exposure (p<0.0001) was found to be positively associated with cases at 28 days, total cases, and total deaths, while household air pollution and climate risk index were both negatively associated with cases, in univariable analysis. In multivariable analysis, household air pollution (p<0.0001) and climate risk index (p = 0.046) remained significantly negatively associated with epidemic growth rate, while ozone exposure did not (p = 0.11). In odds ratio analyses, high household air pollution (OR: 0.2, 95% CI: 0.09–0.43) and high climate risk index (OR: 0.34, 95% CI: 0.16–0.73) were associated with reduced odds of rapid epidemic growth, while high ozone exposure was associated with increased odds of high epidemic growth rate (OR: 2.8, 95% CI 1.3–6.0) (Table 2 and Supplementary Table 2).
Profile of highly COVID-19 susceptible and non-susceptible countries
Table 3 compares the profiles of countries with the highest epidemic growth rates (top 20%; n=40) and lowest epidemic growth rates (bottom 20%; n=42). Countries in the top 20% had a median of 1241 cases while countries in the bottom 20% had a median of 9.5 cases at 28 days after the first case. Compared to countries in the bottom 20%, countries in the top 20% had lower temperature (median: 10.4°C vs 26.0°C), lower relative humidity (69.3% vs 72.2%), higher altitude (106m vs 37m), older median age (35.5 vs. 33.3), increased urban population (74.4% vs 61.3%), higher GDP ($29,441 vs. $15,124), higher BMI (26.3 vs. 25.9), and increased cigarette consumption (1204.3 vs. 635.1 annual per capita).
Maximum likelihood estimation of epidemic start date
The strong association between temperature and COVID-19 cases allowed us to generate a conditional estimate of the average epidemic start date, relative to the date of first reported case. Temperature was selected because it was the strongest weather predictor of COVID-19 cases in regression analyses. Comparison of the countries with highest vs. lowest epidemic growth rate with the countries with lowest epidemic growth rate revealed significantly lower temperatures among the countries with high case counts (Figure 4A). In addition, for countries with highest epidemic growth rate in the first 28 days, temperatures were lowest 5–7 weeks prior to the first reported case. We used a maximum likelihood approach to generate an estimate of the epidemic start date, conditional on the validity of the association with temperature. We incorporated hypothetical exposure-to-presentation delays in our regression model linking case counts at an intermediate epidemic time point (six weeks) to temperature and compared the goodness-of-fit of these models for delays of 0–8 weeks. The likelihood scores incorporating these delays indicate that the temperature 5–7 weeks prior to the first reported case best predicted the epidemic size at six weeks (Figure 4B), reflecting average temperature patterns (Figure 4A). This suggests that community transmission was occurring on average 1–2 months before the first laboratory-confirmed case.
DISCUSSION
The goal of this study was to identify weather, demographic, health, geographic, and planetary factors associated with national COVID-19 epidemic growth rate, incidence, and mortality. We included all cases and deaths reported globally from the date of the first reported case up until May 2, 2020, and examined global weather data contemporaneous to the first reported case in each country, as well as a comprehensive suite of demographic, health, geographic, and planetary factors. We identified lower temperature, lower relative humidity, higher altitude, high prevalence of obesity, and higher number of air travelers as key predictors of COVID-19 incidence, mortality, and epidemic growth rate at the country level. In addition, we used the strong association with temperature to generate a maximum likelihood estimate of the true epidemic start date, revealing that community transmission likely began on average 1–2 months prior to detection of the first reported case.
Several studies have examined weather factors (29, 30), demographic factors (12), health factors (13), geographic factors (31), and planetary factors (14, 32) in relation to COVID-19 incidence and mortality. To our knowledge, this is the first study to do so at high spatial and temporal resolution using global incidence, death, and weather data, and to simultaneously examine a panel of demographic, health, and geographic factors. Our findings strongly link epidemic growth rate, incidence, and mortality with lower temperatures (6–15°C) and lower humidity at the global scale, corroborating the findings of two studies in China (6, 30). Our results suggest that temperatures <15°C, but especially between 6 and 15°C, may be particularly conducive to COVID-19 spread, and argue for especially vigilant surveillance in countries anticipating average daily temperatures in this range over the coming year (Figure 5). Other weather variables did not offer any additional explanatory power in our analyses. Our findings are consistent with the proposed mechanism of COVID-19 transmission, and what is known about other viral infections transmitted primarily by respiratory droplet, such as influenza (33). Cold and dry climates are thought to be conducive to viral survival in the environment and host susceptibility to infection (34). Relative humidity and temperature have been shown to determine respiratory droplet evaporation kinetics and ultimate equilibrium droplet size, which in turn determines how long the droplet will remain suspended in air and where it will deposit in the respiratory system once inhaled (35). Increased temperature also promotes more rapid viral degradation (36). Although UV light has been shown to promote viral degradation in laboratory settings (37), including in SARS-CoV-1 (38), our observed association between COVID-19 and reduced sunshine appears to be entirely explained by covariation in temperature. Our findings regarding temperature and humidity may provide an explanation for outbreaks related to indoor air-conditioning systems (39), and highlight the importance of further research on indoor transmission risk factors. Our estimation of the true epidemic start date (5–7 weeks prior to first reported case) is in line with findings from retrospective screening studies (40, 41). Our analysis suggests 1–2 months of undetected COVID-19 community transmission as a ubiquitous feature of COVID-19 epidemics, a feature that almost certainly promoted rapid regional and global spread. We speculate that the 5–7 week delay in detection that we observe was likely due to combination of factors: high degree of pre-symptomatic COVID-19 transmission (42), delayed recognition of community spread (40), and delayed roll-out of testing. Our analysis demonstrates the feasibility of using a weather variable to estimate and predict the timing of community transmission in a fast-moving pandemic, when widespread testing and serologic data may not yet be available.
A number of studies have linked older age, obesity, and chronic medical conditions such as diabetes and hypertension to increased COVID-19 severity (13, 43), and it is therefore not surprising that BMI and obesity were significantly associated with COVID-19 incidence and mortality at the country level. Although we observed a strong association between age and COVID-19 outcomes in univariable analyses, age was not independently predictive in multivariable models. Prior studies have reported mixed findings on the effect of cigarette smoking on COVID-19 risk and severity (44–46). Our analysis points to a clear and significant association between increased cigarette consumption and increased national COVID-19 outcomes in univariate and odds ratio analyses, which aligns with well-established literature on increased severity of respiratory infections due to smoking (47). In multivariable modeling, however, this association did not reach statistical significance. This may be due, in part, to the fact that per capita cigarette consumption is a poor proxy for actual exposure. Our observed association with number of air travelers likely reflects a potentiating effect of repeated ‘seeding’ of COVID-19 on epidemic initiation and growth. Higher altitude was strongly associated with epidemic growth rate, even when adjusting for temperature and humidity. It is possible that the air at higher altitudes is more conducive to COVID-19 transmission for unknown reasons, and this may explain lay reports of especially explosive epidemics in ski resorts across the world. Our observed positive association with increased HDI is surprising and appears to contradict our negative association with number of hospital beds. These associations are not explained by differences in testing, as they remain significant when adjusting for total tests performed.
Our study is limited by its ecologic study design, precluding unbiased inference regarding causality or individual risk (48). In addition, although we attempted to include an exhaustive list of potentially confounding variables in this study, our analysis did not include data on all potential confounders. For instance, we could not control for differences in mitigation efforts (such as social distancing and national lockdown orders), accuracy of case and death reporting, number of superspreaders, or genetic susceptibility to infection. Use of capital city weather data also limited the resolution of our analyses, as these may not necessarily represent the weather of the entire country, particularly for countries with larger land masses. This was necessary, however, given the unavailability of accurate city-level incidence and death data for most countries. In addition, many of the demographic variables included in this analysis are measured at the population level and may therefore fail to accurately reflect risks faced at the individual level. Finally, it should be noted that the factors included in this analysis are unable to explain all of the variability in COVID-19 burden. Some generally warm and humid countries, such as Brazil and Indonesia, are experiencing severe COVID-19 epidemics. This suggests that epidemic intensity cannot be fully explained by weather or population characteristics, and mitigation strategies should be enforced. This study was strengthened by inclusion and simultaneous examination of data on a wide range of weather, demographic, health, geographic, and planetary factors. Our findings may help guide prevention and mitigation efforts by predicting future epidemic hotspots and the likelihood and timing of seasonal peaks in incidence. They also highlight the need for widespread and early testing to detect ongoing community transmission, particularly in areas anticipating the arrival of lower temperatures. Further study of the weather, sociodemographic, and geographic factors governing the rate of COVID-19 transmission will be critical to combatting this unprecedented pandemic.
Data Availability
Data referred to in this manuscript will be made publicly available at the time of peer-reviewed publication.
AUTHOR CONTRIBUTIONS
N.Y.L and P.L.B designed research. N.Y.L, M.A.B., and P.L.B performed research. N.Y.L and P.L.B. analyzed data. N.Y.L, M.A.B., and P.L.B wrote the paper.
CONFLICT OF INTEREST
The authors declare no conflict of interest.
ACKNOWLEDGEMENTS
We thank Drs. Ann Chao and Marc Bulterys for their careful reading of the manuscript.