Abstract
As new cases of SARS-CoV-2 (aka 2019-nCoV) Coronavirus are confirmed throughout the world and millions of people are being put into quarantine, few doubt the virus will reach pandemic state. Some worry it could badly hit the developing world, such as sub-Saharan Africa, potentially leading to a global human calamity. It is still early days, but using existing data we develop a large ensemble of ecological niche models that project monthly variation in climate suitability for SARS-CoV-2 Coronavirus throughout a typical climatological year. The current spread suggests a degree of climate determination, with the Coronavirus displaying a preference for cool and dry conditions. The predecessor SARS-CoV was linked to similar climate conditions. Should the spread of SARS-CoV-2 continue to follow current trends, a worst-case scenario of synchronous global pandemic is improbable. More probable is the emergence of asynchronous seasonal global outbreaks much like other respiratory diseases. People in temperate warm and cold climates are more vulnerable. Those in arid climates follow next in vulnerability, while the disease will likely marginally affect the tropics. Our projections minimize uncertainties related to the spread of SARS-CoV-2, providing critical information for anticipating adequate social, economic and political responses.
Introduction
Biogeography studies the patterns and processes underlying the distribution of life on Earth. One generalization emerging from hundreds of years of natural history observations is that all organisms have a degree of environmental specialization. That is, while biomes on the planet have a range of different types of organisms1, individual types of organisms cannot occur in every biome even when, distant as they might be, they converge into playing the same ecological roles within ecosystems2. Biogeographers and ecologists alike resort to the concept of ecological niche3-5 to examine the relationship between the distributions of organisms and other biotic or abiotic factors controlling them. An organism is said to be within its ecological niche if death rates of the organism are lower than birth rates6,7. That is, an organism cannot persist beyond its ecological niche, in a sink, unless there is a regular influx of individuals from source populations. Even if organisms are regularly reaching a sink area, as one might expect with an easily dispersed pathogen, the spread and establishment of the organism will be limited by ecological constraints. Although biogeographic concepts, such as species' ecological niches, are commonly applied to multicellular organisms (eukaryotes), there is an increasing number of studies utilizing ecological niche concepts and associated analytical tools to investigate relationships between the distributions of unicellular organisms (prokaryotes), or viruses, and a range of environmental factors8.
Building on the concept of ecological niche, we develop projections of monthly changes in the likelihood of SARS-CoV-2 Coronavirus outbreaks. Projections are obtained from an ensemble of 10 familiar machine learning and statistical ecological niche models9, each with 20 copies generated with bootstrapping to account for, and enable the quantification of, intra-model variability due to the initial conditions10,11. Models were trained using the distribution of all SARS-CoV-2 Coronavirus cases recorded up to 10/03/2020, with data compiled and made available through the Johns Hopkins University Mapping 2019-nCoV portal12. Regions with fewer than 5 positive cases were not included in the models. Exclusion of such sites was based on the working assumption that small numbers of positive cases are likely imported from infected regions, thus failing to provide evidence that the SARS-CoV-2 Coronavirus is being transmitted locally within its ecological niche. Predictors were temperature and precipitation values expected between January and March using 1979–2013 as reference, with data downloaded from the high-resolution climatology database for the earth’s land surface13. Models were then projected monthly for the rest of the year.
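The case filter and the bootstrap replication described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names `filter_regions` and `bootstrap_splits` are hypothetical, and using the out-of-bag records as the test set is an assumption, since the text does not state how test data were drawn.

```python
import numpy as np

def filter_regions(case_counts, min_cases=5):
    """Return indices of regions with at least `min_cases` positive cases."""
    counts = np.asarray(case_counts)
    return np.flatnonzero(counts >= min_cases)

def bootstrap_splits(n_records, n_replicates=20, seed=0):
    """Yield (train, test) index arrays for each bootstrap replicate.

    Training indices are drawn with replacement; the test set is assumed
    here to be the out-of-bag records (those never drawn for training).
    """
    rng = np.random.default_rng(seed)
    all_indices = np.arange(n_records)
    for _ in range(n_replicates):
        train = rng.integers(0, n_records, size=n_records)
        test = np.setdiff1d(all_indices, train)
        yield train, test

# Example: keep regions with 5 or more cases, then build 20 replicates.
kept = filter_regions([1, 5, 12, 4, 7])
splits = list(bootstrap_splits(n_records=50))
```

With 10 model types and 20 replicates each, this kind of scheme yields the 200 fitted models that make up the ensemble.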
Results
Analysis of all positive cases of SARS-CoV-2 Coronavirus plotted against monthly temperature and precipitation values reveals that the interquartile range of average environmental temperatures associated with positive cases so far is −4.01ºC to 15.58ºC (99% range) and −2.04ºC to 9.49ºC (95% range). For precipitation, the interquartile range is 4.68 mm to 116.06 mm (99% range) and 19.75 mm to 94.43 mm (95% range). These values are estimated taking into account total numbers of positive cases, which are obviously strongly determined by contingent factors linked with the origin of the SARS-CoV-2 Coronavirus outbreak (the city of Wuhan in China) and the subsequent pattern of spread. While the pattern of spread is, as it seems based on our analysis, constrained by climate, the actual numbers of positive cases are affected by non-climatic factors14, some of which might be stochastic. Less sensitive measurements can be obtained by using presence/absence of positive cases. With such an approach, the estimated interquartile ranges for temperature are −18.10ºC to 28.64ºC (99% range) and −8.81ºC to 25.65ºC (95% range). For precipitation they are 1.00 mm to 345.55 mm (99% range) and 2.16 mm to 151.31 mm (95% range). Regardless of the approach used to quantify the climate envelope of the SARS-CoV-2 Coronavirus, we are not characterizing the exact local temperature and precipitation conditions constraining the virus spread but rather determining the type of macro-climate conditions in the places where spreading is occurring. Nevertheless, regardless of whether we calculate environmental preferences of the SARS-CoV-2 Coronavirus using the total numbers of incidences or their presence and absence, it appears the virus favors cool and dry conditions, being largely absent under extremely cold and very hot and wet conditions (Figure 1).
We summarized projections by ensembles of ecological niche models by climate zones15. The analysis reveals that SARS-CoV-2 thrives in warm temperate climates between October and May and cold temperate climates between April and September (Figure 2). Arid environments follow the temperate warm trend of seasonal probability of contracting the SARS-CoV-2 Coronavirus but with generally more moderate levels. Much of the tropics have low levels of climate suitability for spread of SARS-CoV-2 Coronavirus owing to their high temperatures and precipitation (used here as a surrogate for humidity), followed by polar climates, where conditions of extreme cold seem to be beyond the virus's critical minimum tolerance values. In most of such low climate suitability areas, human populations will likely be spared from outbreaks arising from local transmissions (Figure 2).
The analysis of risk provided at the climate zone level (Figure 2) masks the sharp seasonality and the fine-grained regional variation in risk that emerges when analyzing the patterns in geographical space (Figure 3). From June to September, much of the higher-latitude regions of the southern hemisphere, like Argentina, Australia, Brazil, Chile, New Zealand, and Southern Africa, will likely become exposed to new outbreaks of SARS-CoV-2. Models also project the highest-latitude regions of the northern hemisphere to be badly hit by the Coronavirus during this period, including Canada and Russia, but also the Scandinavian countries. High elevation areas in the Andes and the Himalayas share the same prospects. Concurrently, areas that, as we speak, are of extreme concern in the northern hemisphere (chiefly Italy, Spain, France, Germany, UK, and USA) should witness a reduction in the incidence of new positive cases of SARS-CoV-2 Coronavirus. Beyond September and until the end of May, conditions will be suitable for renewed outbreaks in much of the warmer temperate regions of Asia, Europe and North America.
Discussion
Not all viruses are climate determined. HIV/AIDS, for example, is not affected by external environmental factors. The virus is transmitted by sexual intercourse, blood transfusions, or from mother to child during pregnancy, delivery or breastfeeding, so it never leaves the host’s internal environmental conditions. In contrast, SARS-CoV-2, like other respiratory viruses, namely its predecessor SARS-CoV, involves aerial transmissions of respiratory droplets or fomites, exposing the virus to external environmental conditions.
SARS-CoV-2 Coronavirus has already set foot in most parts of the world, but virulent outbreaks with large numbers of local infections are still not global. Instead, outbreaks concentrate in the northern hemisphere, chiefly Asia, the Middle East, Central, Southern and Western Europe, and the USA. Our models support the view that the incidence of the virus will follow a seasonal pattern with outbreaks being favored by cool and dry weather, while being slowed down by extreme conditions of cold and heat as well as moisture. Prevalence of respiratory disease outbreaks, such as influenza, during winter conditions is common16,17. But the similarity of climate determination of SARS-CoV-2 with its predecessor SARS-CoV is noteworthy, giving hope that fundamental traits shared by the two coronaviruses might be conserved.
Analyses of SARS-CoV outbreaks in relation to meteorology reveal significant correlations between the incidence of positive cases and aspects of weather. For example, an initial investigation linking SARS outbreaks and temperature in Hong Kong, Guangzhou, Beijing, and Taiyuan18 revealed significant correlations between SARS-CoV incidences and temperature seven days (the known period of incubation of SARS-CoV) before the outbreak, with environmental temperatures associated with positive cases of SARS-CoV ranging between 16ºC and 28ºC. The same investigation also found that incidence of the Coronavirus was inversely related to humidity. Another study conducted between 11 March and 22 May 2003 in Hong Kong19 showed that SARS-CoV incidences sharply decreased as temperature increased from 15ºC to 29ºC, after which they practically disappeared. In this study, incidences under the cooler end of the gradient were 18-fold higher than under the opposite warmer end of the gradient.
The mechanism underlying these patterns of climate determination is likely linked with the ability of the virus to survive external environmental conditions prior to reaching a host. For example, a recent study examined survival of dried SARS-CoV Coronavirus on smooth surfaces and found that it would be viable for over 5 days at temperatures ranging between 11-25ºC and relative humidity of 40-50%, drastically losing viability as temperatures and humidity increased20. Heat intolerance of the coronaviruses is probably related to their being covered by a lipid bilayer21,22, which could break down easily as temperatures increase. Humidity in the air is also expected to affect the transmissibility of respiratory viruses. Once the pathogens have been expelled from the respiratory tract by sneezing, they literally float in the air and they do so for a longer period when the humidity is greater.
More detailed examination of SARS-CoV-2 outbreak relationships with weather events will only be possible once the spread of the virus has stabilized. The current macroecological-level analysis enables inferences that would otherwise not be possible with high-resolution data for specific case studies. That is, we substitute the familiar analysis of meteorological variation at site levels matched with specific SARS-CoV-2 cases with an examination of all known positive cases worldwide against large-scale climatological variation. It is, obviously, possible that, as the virus spreads and additional climate regions witness outbreaks of positive cases, inferences made herein are altered. We are skeptical this will happen for two reasons. Firstly, there is little reason to suspect that out-of-China contaminations would have occurred only, or mainly, with trade partners in the northern hemisphere14. China is a big world player, having key commercial partnerships with Africa and Latin America. Yet there is no indication that meaningful local infections have taken place in these areas despite the global reporting of Coronavirus cases generally attributed to travellers coming from infected regions. Secondly, the climatic discrimination of the outbreaks is such that it seems unlikely to be a consequence of random chance, trade preferences with China, or just the outcome of poorly developed public policies. On the contrary, the SARS-CoV-2 Coronavirus, although still expanding, seems to have followed closely the expected (given what we know of SARS-CoV) pattern of climate suitability. This suggests the Coronavirus might have reached equilibrium with climate23,24, which, if true, would have contributed to significantly reducing the unaccounted-for data biases and uncertainties entering our ecological niche models25.
Understanding the underlying factors involved in the successful spread of SARS-CoV-2 Coronavirus is critical to manage the timing and scale of the social, economic, and political reactions to it. While the Coronavirus is likely to spread much more widely than at present, owing to the seasonal changes of climate suitability, it is unlikely to do so with the same intensity, simultaneously. Our results will help anticipate the timing and the magnitude of the likely public interventions to mitigate the adverse consequences of the Coronavirus on public health. Only with adequate planning can unnecessary collateral damage to individuals and the global economy be avoided.
Data Availability
All data is in public repositories.
Methods
SARS-CoV-2 Coronavirus data
We downloaded the geo-referenced coordinates of SARS-CoV-2 Coronavirus cases from the data repository (https://github.com/CSSEGISandData/COVID-19/blob/master/README.md) operated by the Johns Hopkins University Center for Systems Science and Engineering with support from ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab. The data was downloaded on 08/03/2020.
Climate data
We downloaded temperature (mean, maximum, minimum) and precipitation (accumulated) from CHELSA (Climatologies at high resolution for the earth’s land surface areas; http://chelsa-climate.org)13. This is a high-resolution (30 arc sec) climate data set for the earth’s land surface hosted by the Swiss Federal Institute for Forest, Snow and Landscape Research WSL. It provides monthly summaries covering the period from January 1979 to December 2013. The time series data were then aggregated by month through averaging so as to provide expected values for a typical climatological month in the recent past.
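The monthly aggregation can be illustrated with a short sketch. It assumes the time series has been loaded as an array whose first axis runs chronologically over months; the function name `monthly_climatology` and the `start_month` parameter are hypothetical and not part of CHELSA or of the original analysis.

```python
import numpy as np

def monthly_climatology(series, start_month=1):
    """Average a chronological monthly series into 12 climatological months.

    series: array of shape (n_months, ...), e.g. (420, rows, cols) for
            January 1979 to December 2013.
    start_month: calendar month (1 = January) of the first time step.
    Returns an array of shape (12, ...), ordered January to December.
    """
    series = np.asarray(series, dtype=float)
    n_months = series.shape[0]
    # Calendar month (0 = January) of every time step.
    month_index = (np.arange(n_months) + start_month - 1) % 12
    return np.stack([series[month_index == m].mean(axis=0) for m in range(12)])

# Toy example: two years of a single-cell series.
toy = np.concatenate([np.arange(1, 13), np.arange(3, 15)])
clim = monthly_climatology(toy)
```

Each of the 12 output layers is then the expected value for that calendar month over the reference period.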
Ecological niche models
The spatial distribution of SARS-CoV-2 Coronavirus incidence records was linked to the corresponding monthly climate data. We used the SDM-R platform9 for ensemble ecological niche modeling10 (or species distributions modeling25) to characterize climate conditions associated with outbreaks of SARS-CoV-2 between January and March 2020. We used 10 commonly used statistical and machine learning methods including generalized linear model (GLM)26, generalized additive model (GAM)27, classification and regression trees (CART)28, boosted regression trees (BRT)29, random forests (RF)30, multiple discriminant analysis (MDA)31, support vector machine (SVM)32, multi-layer perceptron neural networks (MLP)33, maximum entropy (Maxent)34, and multivariate adaptive regression splines (MARS)35. We used a bootstrapping resampling procedure36, with 20 replications, to generate the training and test datasets. We then fitted the ecological niche models for each replication using the training dataset and evaluated their performance using the test dataset. We used the area under the curve (AUC) of the receiver operating characteristic (ROC) plot and the true skill statistic (TSS) to measure the predictive performance of models37. A ROC curve plots sensitivity values (true positive fraction) on the y-axis against ‘1 – specificity’ values (false positive fraction) on the x-axis, for all thresholds. AUC is a threshold-independent metric that varies from 0 to 1 and provides a single measure of model performance. AUC values under 0.5 indicate discrimination worse than chance; a score of 0.5 implies random predictive discrimination; and a score of 1 indicates perfect discrimination. TSS is calculated as “sensitivity + specificity − 1” and ranges from −1 to +1, where +1 indicates perfect agreement, a value of 0 implies agreement expected by chance, and a value of less than 0 indicates agreement worse than chance.
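The two evaluation metrics can be written out in a few lines. The sketch below is illustrative only and is not the SDM-R implementation: the function names are hypothetical, TSS is computed from binary predictions that have already been thresholded, and AUC is computed with the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve.

```python
import numpy as np

def tss(y_true, y_pred):
    """True skill statistic: sensitivity + specificity - 1."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # presences predicted present
    fn = np.sum(y_true & ~y_pred)   # presences predicted absent
    tn = np.sum(~y_true & ~y_pred)  # absences predicted absent
    fp = np.sum(~y_true & y_pred)   # absences predicted present
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0

def auc(y_true, scores):
    """Probability that a random presence scores higher than a random
    absence, with ties counted as one half."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true]
    neg = scores[~y_true]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

# Toy test set: two presences and two absences with predicted suitability.
observed = [1, 1, 0, 0]
suitability = [0.9, 0.8, 0.2, 0.1]
perfect_auc = auc(observed, suitability)
perfect_tss = tss(observed, np.asarray(suitability) > 0.5)
```

The pairwise comparison in `auc` uses memory proportional to presences times absences, which is acceptable for a sketch but not for very large test sets.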
We then used the ensemble of 200 models to calculate and project a consensus distribution of spreading risk for each month across the globe. Consensus was achieved through AUC-weighted mean across all models38.
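An AUC-weighted consensus of this kind can be sketched as a weighted average across models. The function name `auc_weighted_consensus` and the array layout (one row of predicted suitability per model) are assumptions made for illustration; the exact weighting scheme applied by the modeling platform may differ in detail.

```python
import numpy as np

def auc_weighted_consensus(predictions, auc_scores):
    """Weighted mean of model predictions, with each model's AUC as weight.

    predictions: array of shape (n_models, n_cells) with suitability values.
    auc_scores:  array of shape (n_models,) with each model's test AUC.
    Returns an array of shape (n_cells,).
    """
    predictions = np.asarray(predictions, dtype=float)
    weights = np.asarray(auc_scores, dtype=float)
    return np.average(predictions, axis=0, weights=weights)

# Toy example: two models, two grid cells.
consensus = auc_weighted_consensus(
    predictions=[[0.2, 0.8], [0.6, 0.4]],
    auc_scores=[0.9, 0.6],
)
```

Models with higher test AUC pull the consensus toward their own predictions, so poorly performing replicates contribute less to the monthly risk maps.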
Author contributions
MBA conceived the study and wrote the manuscript. BN conducted the analysis and prepared graphical material.
Author’s information
Reprints and permissions information is available at www.nature.com/reprints
Competing interests: None to declare.
Correspondence and requests for materials should be addressed to maraujo{at}mncn.csic.es
Acknowledgements
The authors thank and congratulate the Johns Hopkins University for real-time compilation and release of incidence data for SARS-CoV-2 Coronavirus, without which this investigation would not have been possible.