Diverse local epidemics reveal the distinct effects of population density, demographics, climate, depletion of susceptibles, and intervention in the first wave of COVID-19 in the United States ================================================================================================================================================================================================ * Niayesh Afshordi * Benjamin Holder * Mohammad Bahrami * Daniel Lichtblau ## Abstract The SARS-CoV-2 pandemic has caused significant mortality and morbidity worldwide, sparing almost no community. As the disease will likely remain a threat for years to come, an understanding of the precise influences of human demographics and settlement, as well as the dynamic factors of climate, susceptible depletion, and intervention, on the spread of localized epidemics will be vital for mounting an effective response. We consider the entire set of local epidemics in the United States; a broad selection of demographic, population density, and climate factors; and local mobility data, tracking social distancing interventions, to determine the key factors driving the spread and containment of the virus. Assuming first a linear model for the rate of exponential growth (or decay) in cases/mortality, we find that population-weighted density, humidity, and median age dominate the dynamics of growth and decline, once interventions are accounted for. A focus on distinct metropolitan areas suggests that some locales benefited from the timing of a nearly simultaneous nationwide shutdown, and/or the regional climate conditions in mid-March; while others suffered significant outbreaks prior to intervention. Using a first-principles model of the infection spread, we then develop predictions for the impact of the relaxation of social distancing and local climate conditions. A few regions, where a significant fraction of the population was infected, show evidence that the epidemic has partially resolved via depletion of the susceptible population (i.e., “herd immunity”), while most regions in the United States remain overwhelmingly susceptible. These results will be important for optimal management of intervention strategies, which can be facilitated using our online dashboard. ## Introduction The new human coronavirus SARS-CoV-2 emerged in Wuhan Province, China in December 2019 (Chen et al., 2020; Li et al., 2020), reaching 10,000 confirmed cases and 200 deaths due to the disease (known as COVID-19) by the end of January this year. Although travel from China was halted by late-January, dozens of known introductions of the virus to North America occurred prior to that (Holshue et al., 2020; Kucharski et al., 2020), and dozens more known cases were imported to the US and Canada during February from Europe, the Middle East, and elsewhere. Community transmission of unknown origin was first detected in California on February 26, followed quickly by Washington State (Chu et al., 2020b), Illinois and Florida, but only on March 7 in New York City. Retrospective genomic analyses have demonstrated that case-tracing and self-quarantine efforts were effective in preventing most known imported cases from propagating (Ladner et al., 2020; Gonzalez-Reiche et al., 2020; Worobey et al., 2020), but that the eventual outbreaks on the West Coast (Worobey et al., 2020; Chu et al., 2020b; Deng et al., 2020) and New York (Gonzalez-Reiche et al., 2020) were likely seeded by unknown imports in mid-February. By early March, cross-country spread was primarily due to interstate travel rather than international imports (Fauver et al., 2020). In mid-March 2020, nearly every region of the country saw a period of uniform exponential growth in daily confirmed cases — signifying robust community transmission — followed by a plateau in late March, likely due to social mobility reduction. The same qualitative dynamics were seen in COVID-19 mortality counts, delayed by approximately one week. Although the qualitative picture was similar across locales, the quantitative aspects of localized epidemics — including initial rate of growth, infections/deaths per capita, duration of plateau, and rapidity of resolution —were quite diverse across the country. Understanding the origins of this diversity will be key to predicting how the relaxation of social distancing, annual changes in weather, and static local demographic/population characteristics will affect the resolution of the first wave of cases, and will drive coming waves, prior to the availability of a vaccine. The exponential growth rate of a spreading epidemic is dependent on the biological features of the virus-host ecosystem — including the incubation time, Susceptibility of target cells to infection, and persistence of the virus particle outside of the host — but, through its dependence on the transmission rate between hosts, it is also a function of external factors such as population density, air humidity, and the fraction of hosts that are susceptible. Initial studies have shown that SARS-CoV-2 has a larger rate of exponential growth (or, alternatively, a lower doubling time of cases1) than many other circulating human viruses (Park et al., 2020). For comparison, the pandemic influenza of 2009, which also met a largely immunologically-naive population, had a doubling time of 5–10d (Yu et al., 2012; Storms et al., 2013), while that of SARS-CoV-2 has been estimated at 2–5d (Sanche et al., 2020; Oliveiros et al., 2020) (growth rates of ∼0.10d−1 vs. ∼0.25d−1). It is not yet understood which factors contribute to this high level of infectiousness. While the dynamics of an epidemic (e.g., cases over time) must be described by numerical solutions to nonlinear models, the exponential growth rate, *λ*, usually has a simpler dependence on external factors. Unlike case or mortality incidence numbers, the growth rate does not scale with population size. It is a directly measurable quantity from the available incidence data, unlike, e.g., the reproduction number, which requires knowledge of the serial interval distribution (Wallinga and Lipsitch, 2007; Roberts and Heesterbeek, 2007; Dushoff and Park, 2020), something that is difficult to determine empirically (Champredon and Dushoff, 2015; Nishiura, 2010). Yet, the growth rate contains the same threshold as the reproduction number (*λ* = 0 vs. *R* = 1), between a spreading epidemic (or an unstable uninfected equilibrium) and a contracting one (or an equilibrium that is resistant to flare-ups). Thus, the growth rate is an informative direct measure on that space of underlying parameters. In this work, we leverage the enormous data set of epidemics across the United States to evaluate the impact of demographics, population density and structure, weather, and non-pharmaceutical interventions (i.e., mobility restrictions) on the exponential rate of growth of COVID-19. Following a brief analysis of the initial spread in metropolitan regions, we expand the meaning of the exponential rate to encompass all aspects of a local epidemic — including growth, plateau and decline — and use it as a tracer of the dynamics, where its time dependence and geographic variation are dictated solely by these external variables and per capita cumulative mortality. Finally, we use the results of that linear analysis to calibrate a new nonlinear model — a renewal equation that utilizes the excursion probability of a random walk to determine the incubation period — from which we develop local predictions about the impact of social mobility relaxation, the level of herd immunity, and the potential of rebound epidemics in the Summer and Fall. ## Results ### Initial growth of cases in metropolitan regions is exponential with rate depending on mobility, population, demographics, and humidity As an initial look at COVID-19’s arrival in the United States, we considered the ∼100 most populous metropolitan regions — using maps of population density to select compact sets of counties representing each region (see Supplementary Material) — and estimated the initial exponential growth rate of cases in each region. We performed a linear regression to a large set of demographic (sex, age, race) and population variables, along with weather and social mobility (Fitzpatrick and Karen, 2020) preceding the period of growth (Figure 1). In the best fit model (*R*2 = 0.75, BIC = −183), the baseline value of the initial growth rate was *λ* = 0.21d−1 (doubling time of 3.3d), with average mobility two weeks prior to growth being the most significant factor (Figure 1B). Of all variables considered, only four others were significant: population density (including both *population-weighted density* (PWD) — also called the “lived population density” because it estimates the density for the average individual (Craig, 1985)— and *population sparsity, γ*, a measure of the difference between PWD and standard population density, see Methods), *p* < 0.001 and *p* = 0.006; specific humidity two weeks prior to growth, *p* = 0.001; and median age, *p* = 0.04. ![Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/02/2020.06.30.20143636/F1.medium.gif) [Figure 1:](http://medrxiv.org/content/early/2020/07/02/2020.06.30.20143636/F1) Figure 1: Mobility and COVID-19 incidence data examples, and the results of linear regression to extracted initial exponential growth rates, λexp, in the top 100 metropolitan regions. (A) Three example cities with different initial growth rates. Data for Google mobility (blue points), daily reported cases (black points), and weather (red and blue points, bottom) are shown with a logistic fit to cases (green line). Data at or below detection limit were excluded from fits (dates marked by red points). Thin grey bars at base of cases graphs indicate region considered “flat”, with right end indicating the last point used for logistic fitting; averaging over “flat” values generates the thick grey bars to guide the eye. [See Supp. Mat. for additional information and for complete data sets for all metropolitan regions.] (B) Weighted linear regression results in fit to λexp for all metropolitan regions. (C) Effect of each variable on growth rate (i.e., Δλ values) for those regions with well-estimated case and death rates; white/yellow indicates a negative effect on λ, red indicates positive. While mobility reduction certainly caused the “flattening” of case incidence in every region by late-March, our results show (Figure 1C) that it likely played a key role in reducing the *rate* of growth in Boston, Washington, DC, and Los Angeles, but was too late, with respect to the sudden appearance of the epidemic, to have such an effect in, e.g., Detroit and Cleveland. In the most extreme example, Grand Rapids, MI, seems to have benefited from a late arriving epidemic, such that its growth (with a long doubling time of 7d) occurred almost entirely post-lockdown. Specific humidity, a measure of absolute humidity, has been previously shown to be inversely correlated with respiratory virus transmission (Lowen et al., 2007; Shaman and Kohn, 2009; Shaman, Goldstein, and Lipsitch, 2011; Kudo et al., 2019). Here, we found it to be a significant factor, but weaker than population density and mobility (Figure 1C). It could be argued that Dallas, Los Angeles, and Atlanta saw a small benefit from higher humidity at the time of the epidemic’s arrival, while the dry late-winter conditions in the Midwest and Northeast were more favorable to rapid transmission of SARS-CoV-2. ### Exponential growth rate of mortality as a dynamical, pan-epidemic, measure In the remainder of this report, we consider the exponential rate of growth (or decay) in local confirmed deaths due to COVID-19. The statistics of mortality is poorer compared to reported cases, but it is much less dependent on unknown factors such as the criteria for testing, local policies, test kit availability, and asymptomatic individuals (Pearce et al., 2020). Although there is clear evidence that a large fraction of COVID-19 mortality is missed in the official counts (e.g., Leon et al., 2020; Modi et al., 2020), mortality is likely less susceptible to rapid changes in reporting, and, as long as the number of reported deaths is a monotonic function of the actual number of deaths (e.g., a constant fraction, say 50%), the sign of the exponential growth rate will be unchanged, which is the crucial measure of the success in pandemic management. To minimize the impact of weekly changes, such as weekend reporting lulls, data dumps, and mobility changes from working days to weekends, we calculate the regression of ln [Mortality] over a 14-day interval, and assign this value, *λ*14(*t*), and its standard error to the last day of the interval. Since only the data for distinct 2-week periods are independent, we multiply the regression errors by ![Graphic][1] to account for correlations between the daily estimates. Together with a “rolling average” of the mortality, this time-dependent measure of the exponential growth rate provides, at any day, the most up-to-date information on the progression of the epidemic (Figure 2). ![Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/02/2020.06.30.20143636/F2.medium.gif) [Figure 2:](http://medrxiv.org/content/early/2020/07/02/2020.06.30.20143636/F2) Figure 2: COVID-19 mortality incidence (7-day rolling average, left) and exponential growth rate (λ14, determined by regression of the logged mortality data over 14-day windows, right) for the four US counties with >2400 confirmed COVID-19 reported deaths (as of 8th June, 2020). In the following section, we consider a linear fit to *λ*14, to determine the statistically-significant external (non-biological) factors influencing the dynamics of local exponential growth and decline of the epidemic. We then develop a first-principles model for *λ*14 that allows for extrapolation of these dependencies to predict the impact of future changes in social mobility and climate. ### Epidemic mortality data explained by mobility, population, demographics, depletion of susceptible population and weather, throughout the first wave of COVID-19 We considered a spatio-temporal dataset containing 3933 estimates of the exponential growth measure, *λ*14, covering the three month period of 8 March 2020 – 8 June 2020 in the 187 US counties for which information on COVID-19 mortality and all potential driving factors, below, were available (the main barrier was social mobility information, which limited us to a set of counties that included 69% of US mortality). A joint, simultaneous, linear fit of these data to 12 potential driving factors (Table 1) revealed only 7 factors with *independent* statistical significance. Re-fitting only to these variables returned the optimal fit for the considered factors (BIC = −5951; *R*2 = 0.674). View this table: [Table 1:](http://medrxiv.org/content/early/2020/07/02/2020.06.30.20143636/T1) Table 1: Joint Linear Fit to λ14(t) data (Top). Any dependence with t-Statistic below 2.5σ is considered not statistically significant. Joint Linear Fit to λ14(t), including only statistically significant dependencies (Bottom). For all coefficients, the population-weighted baseline is subtracted from the linear variable. We found, not surprisingly, that higher population density, median age, and social mobility correlated with positive exponential growth, while population sparsity, specific humidity, and susceptible depletion correlated with exponentially declining mortality. Notably the coefficients for each of these quantities was in the 95% confidence intervals of those found in the analysis of metropolitan regions (and vice versa). Possibly the most surprising dependency was the negative correlation, at ≃ −3.7*σ* between *λ*14 and the *total* number of annual deaths in the county. In fact, this correlation was marginally more significant than a correlation with log(population), which was −3.3*σ*. One possible interpretation of this negative correlation is that the number of annual death is a proxy for the number of potential outbreak clusters. The larger the number of clusters, the longer it might take for the epidemic to spread across their network, which would (at least initially) slow down the onset of the epidemic. ### Nonlinear model To obtain more predictive results, we developed a mechanistic nonlinear model for infection (see Supplementary Material for details). We followed the standard analogy to chemical reaction kinetics (infection rate is proportional to the product of susceptible and infectious densities), but defined the generation interval (approximately the incubation period) through the excursion probability in a 1D random walk, modulated by an exponential rate of exit from the infected class. This approach resulted in a *renewal equation* (Heesterbeek and Dietz, 1996; Champredon and Dushoff, 2015; Champredon, Dushoff, and Earn, 2018), with a distribution of generation intervals that is more realistic than that of standard SIR/SEIR models, and which could be solved formally (in terms of the Lambert W function) for the growth rate in terms of the infection parameters: ![Formula][2] The model has four key dependencies, which we describe here, along with our assumptions about their own dependence on population, demographic, and climate variables. As mortality (on which our estimate of growth rate is based) lags infection (on which the renewal equation is based), we imposed a fixed time shift of Δ*t* for time-dependent variables: 1. We assumed that the susceptible population, which feeds new infections and drives the growth, is actually a sub-population of the community, consisting of highly-mobile and frequently interacting individuals, and that most deaths occurred in separate sub-population of largely immobile non-interacting individuals. Under these assumptions, we found (see Supp. Mat.) that the susceptible density, *S*(*t*), could be estimated from the cumulative per capita death fraction, *f**D*, as: ![Formula][3] where *D*tot is the cumulative mortality count, *N* is the initial population, and the initial density is *S*(0) = *k* PWD. 2. We assumed that the logarithm of the “rate constant” for infection, *β*, depended linearly on social mobility, *m*, specific humidity, *h*, population sparsity, *γ*, and total annual death, *A**D*, as: ![Formula][4] where a barred variable represents the (population-weighted) average value over all US counties, and where the mobility and humidity factors were time-shifted with respect to the growth rate estimation window: ℳ = *m* (*t* − Δ*t*) and ℋ = *h* (*t* − Δ*t*). 3. The characteristic time scale to infectiousness, *τ*, is intrinsic to the biology and therefore we assumed it would depend only on the median age of the population, *A*. We assumed a power law dependence: ![Formula][5] where we fixed the pivot age, *A*, to minimize the error in *τ*. 4. The exponential rate of exit from the infected class, *d*, was assumed constant, since we found no significant dependence for it on other factors in our analysis of US mortality. From the properties of the Lambert W function, when the infection rate or susceptibility density approach zero — through mobility restrictions or susceptible depletion — the growth rate will tend to *λ ≈* −*d*, its minimum value. With these parameterizations, we performed a nonlinear regression to *λ*14(*t*) using the entire set of US county mortality incidence time series (Table 2). Compared to the linear model of the previous section (Table 1b), the fit improved by 7.6*σ* (BIC = − 6008; *R*2 = 0.724), despite both having 9 free parameters. Through the estimated parameter values, the model makes predictions for an individual’s probability of becoming infectious, and the distributions of incubation period and generation interval, all as a function of the median age of the population (see Supplementary Material). View this table: [Table 2:](http://medrxiv.org/content/early/2020/07/02/2020.06.30.20143636/T2) Table 2: Best-fit parameters for the nonlinear model using parametrization defined in the text. The model was very well fit to the mortality growth rate measurements for counties with a high mortality (Figure 3). More quantitatively, the scatter of measured growth rates around the best-fit model predictions was (on average) only 13% larger than the measurement errors, independent of the population of the county 2. ![Figure 3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/02/2020.06.30.20143636/F3.medium.gif) [Figure 3:](http://medrxiv.org/content/early/2020/07/02/2020.06.30.20143636/F3) Figure 3: Nonlinear model prediction (Eqn. 1, red) for the actual (blue) mortality growth rate, in the six counties with highest reporteddeath. Bands show 1-σ confidence region for both the model mean and the λ14 value. Importantly, when the model was calibrated on only a subset of the data — e.g., all but the final month for which mobility data is available — its 68% confidence prediction for the remaining data was accurate (Figure 4) given the known mobility and weather data for that final month. This suggests that the model, once calibrated on the first wave of COVID-19 infections, can make reliable predictions about the ongoing epidemic, and future waves, in the United States. ![Figure 4:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/02/2020.06.30.20143636/F4.medium.gif) [Figure 4:](http://medrxiv.org/content/early/2020/07/02/2020.06.30.20143636/F4) Figure 4: Forecasts of COVID-19 mortality (orange) — based on the best-fit nonlinear model to data prior to May 16th, 2020 — versus actual reported mortality (blue) for 4 large US counties. The 68% confidence range (orange regions) were determined from 100 random 60-day long simulations (see Supplementary Methods). The vertical red lines indicate June 21st. Forecasts for most US counties can be found at our online dashboard: [https://wolfr.am/COVID19Dash](https://wolfr.am/COVID19Dash) ### Predictions for relaxed mobility restrictions, the onset of summer, and the potential second wave Possibly the most pressing question for the management of COVID-19 in a particular community is the combination of circumstances at which the virus fails to propagate, i.e., at which the growth rate, estimated here by *λ*14, becomes negative (or, equivalently, the reproduction number *R**t* falls below one). In the absence of mobility restrictions this is informally called the threshold for “herd immunity,” which is usually achieved by mass vaccination (e.g., John and Samuel, 2000; Fine, Eames, and Heymann, 2011). Without a vaccine, however, ongoing infections and death will deplete the susceptible population and thus decrease transmission. Varying the parameters of the nonlinear model individually about their Spring 2020 population-weighted mean values (Figure 5) suggests that this threshold will be very much dependent on the specific demographics, geography, and weather in the community, but it also shows that reductions in social mobility can significantly reduce transmission prior the onset of herd immunity. ![Figure 5:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/02/2020.06.30.20143636/F5.medium.gif) [Figure 5:](http://medrxiv.org/content/early/2020/07/02/2020.06.30.20143636/F5) Figure 5: Dependence and 68% confidence bands of the mortality growth rate — as specified by the nonlinear model (Eqn. 1) — on various parameters for an “average county.” All parameters not being varied are fixed at their population-weighted mean values (as of 8th June, 2020): log10[PWD / km−2)] = 3.58, population sparsity = 0.188, COVID death fraction = 5.1 × 10−4 (510 deaths/million population), Median Age = 37.5 yr, log(Annual Death) = 4.04, social mobility ![Graphic][6], and specific humidity ![Graphic][7]. To determine the threshold for herd immunity in the absence or presence of social mobility restrictions, we considered the “average US county” (i.e., a region with population-weighted average characteristics), and examined the dependence of the growth rate on the cumulative mortality. We found that in the absence of social distancing, a COVID-19 mortality rate of 0.13% (or 1300 per million population) would bring the growth rate to zero. However, changing the population density of this average county shows that the threshold can vary widely (Figure 5). Examination of specific counties showed that the mortality level corresponding to herd immunity varies from 10 to 2500 per million people (Figure 9). At the current levels of reported COVID-19 mortality, we found that, as of June 22nd, 2020, only 128 *±* 55 out of 3142 counties (inhabiting 9.4 *±* 2.1% of US population) have surpassed this threshold at 68% confidence level (Figure 8). Notably, New York City, with the highest reported per capita mortality (2700 per million) has achieved mobility-independent herd immunity at the 10*σ* confidence level, according to the model (Figure 6). A few other large-population counties in New England, New Jersey, Michigan, Louisiana, Georgia and Mississippi that have been hard hit by the pandemic also appear to be at or close to the herd immunity threshold. This is not the case for most of the United States, however (Figure 8). Nationwide, we predict that COVID-19 herd immunity would only occur after a death toll of 340, 000 *±* 61, 000, or 1058 *±* 190 per million of population. ![Figure 6:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/02/2020.06.30.20143636/F6.medium.gif) [Figure 6:](http://medrxiv.org/content/early/2020/07/02/2020.06.30.20143636/F6) Figure 6: Nonlinear model prediction of the exponential growth rate, λ14, vs. cumulative COVID-19 mortality (top panels), assuming baseline social mobility ![Graphic][8], in the “average US county” (see caption of Figure 5) on the left, and New York City, on the right. The curves show 68% predictions for the nonlinear model (Table 2), while the points with errorbars are linear fits to all the data within bins of death fraction. The threshold for “herd immunity” (λ14 = 0) is reached at a mortality of approximately 1300 (1700) per million for an average county (NYC), but this would be higher in counties with more unfavorable values of the drivers. The eventual mortality burden of the average county will be determined by its path through a “phase space” of Daily vs. Total Mortality (bottom panel). An epidemic without intervention (red curves, with the particular trajectory starting at zero death shown in bold) will pass the threshold for herd immunity (1300 deaths per million; note that at zero daily deaths this is a fixed point) and continue to three times that value due to ongoing infections. A modest 33% reduction in social mobility (blue curves), however, leads to mortality at “only” the herd immunity level (the green disk). The black curve on the bottom right panel shows the 7-day rolling average of reported mortality for NYC, which appears to have “overshot” the “herd immunity threshold”. ![Figure 7:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/02/2020.06.30.20143636/F7.medium.gif) [Figure 7:](http://medrxiv.org/content/early/2020/07/02/2020.06.30.20143636/F7) Figure 7: Epidemic Phase Portraits for the same four counties as in Figure (4), similar to the Phase portrait in Figure (6). The blue curves are for the county’s average Social Mobility during Feb. 15 through June 12, 2020, while red curves/arrows are at normal (pre-covid) social mobility. The thick black curve is the 7-day rolling average of the official reported mortality, while the green disk shows the threshold for “herd immunity”. ![Figure 8:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/02/2020.06.30.20143636/F8.medium.gif) [Figure 8:](http://medrxiv.org/content/early/2020/07/02/2020.06.30.20143636/F8) Figure 8: Top: United States counties that have passed (blue), or are within (cyan), the threshold for “herd immunity” at the 1-σ level, as predicted by the nonlinear model. Bottom: Predicted confidence in the growth of COVID-19 outbreak (defined as predicted daily growth rate divided by its uncertainty), for all counties should they return today to their baseline (pre-COVID) social mobility. Counties that have approached the threshold of herd immunity have lower growth rates due to the depletion of susceptible individuals. ![Figure 9:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/02/2020.06.30.20143636/F9.medium.gif) [Figure 9:](http://medrxiv.org/content/early/2020/07/02/2020.06.30.20143636/F9) Figure 9: Histogram of reported COVID-19 deaths per million for all US counties, showing the proportion that have passed “herd immunity” threshold, according to fit of the nonlinear model. We found that the approach to the herd immunity threshold is not direct, and that social mobility restrictions and other non-pharmaceutical interventions must be applied carefully to avoid excess mortality beyond the threshold. In the absence of social distancing interventions, a typical epidemic will “overshoot” the herd immunity limit (e.g., Handel, Longini Jr, and Antia, 2007; Fung, Antia, and Handel, 2012) by up to 300%, due to ongoing infections (Figure 6). At the other extreme, a very strict “shelter in place” order would simply delay the onset of the epidemic; but if lifted (see Figures 6 and 7), the epidemic would again overshoot the herd immunity threshold. A modest level of social distancing, however — e.g., a 33% mobility reduction for the average US county — could lead to fatalities “only” at the level of herd immunity. Naturally, communities with higher population density or other risk factors (see Figure 5), would require more extreme measures to achieve the same. Avoiding the level of mortality required for herd immunity will require long-lasting and effective non-pharmaceutical options, until a vaccine is available. The universal use of face masks has been suggested for reducing the transmission of SARS-CoV-2, with a recent meta-analysis (Chu et al., 2020a) suggesting that masks can suppress the rate of infection by a factor of 0.07–0.34 (95% CI), or equivalently Δ ln(transmission) = −1.9 *±* 0.4 (at 1*σ*). Using our model’s dependence of the infection rate constant on mobility, this would correspond to an equivalent social mobility reduction of ![Graphic][9]. Warmer, more humid weather has also considered a factor that could slow the epidemic (e.g., Wang et al., 2020a; Notari, 2020; Xu et al., 2020a). Annual changes in specific humidity are ![Graphic][10] (Figure 10b in Supplementary Material), which can be translated in our model to an effective mobility decrease of ![Graphic][11]. Combining these two effects could, in this simple analysis, yield a modestly effective defense for the summer months:![Graphic][12]. Therefore, this could be a reasonable strategy for most communities to manage the COVID-19 epidemic at the aforementioned −33% level of mobility needed to arrive at herd immunity with the least excess death. More stringent measures would be required to keep mortality below that level. Of course, this general prescription would need to be fine-tuned for the specific conditions of each community. ![Figure 10:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/02/2020.06.30.20143636/F10.medium.gif) [Figure 10:](http://medrxiv.org/content/early/2020/07/02/2020.06.30.20143636/F10) Figure 10: (Top)(a)The 14-day rolling average of (population-weighted) social mobility (Fitzpatrick and Karen, 2020) for NYC, as well as all US counties considered here. For our analysis, we only use “work places” as an indicator, as others do not appear to show any independent correlation with λ14(t). (Bottom)(b) 28-day moving average of historical annual specific humidity in the United States (weighted-averaged by population). ## Discussion and Conclusions By simultaneously considering the time series of mortality incidence in every US county, and controlling for the time-varying effects of local social distancing interventions, we demonstrated for the first time a dependence of the epidemic growth of COVID-19 on population density, as well as other climate, demographic, and population factors. We further constructed a realistic, but simple, first-principles model of infection transmission that allowed us to extend our heuristic linear model of the dataset into a predictive nonlinear model, which provided a better fit to the data (with the same number of parameters), and which also accurately predicted late-time data after training on only an earlier portion of the data set. This suggests that the model is well-calibrated to predict future incidence of COVID-19, given realistic predictions/assumptions of future intervention and climate factors. We summarized some of these predictions in the final section of Results, notably that only a small fraction of US counties (with less than ten percent of the population) seem to have reached the level of herd immunity, and that relaxation of mobility restrictions without counter-measures (e.g., universal mask usage) will likely lead to increased daily mortality rates, beyond that seen in the Spring of 2020. In any epidemiological model, the infection rate of a disease is assumed proportional to population density (Jong, Diekmann, and Heesterbeek, 1995), but, to our knowledge, its explicit effect in a real-world respiratory virus epidemic has not been demonstrated. The universal reach of the COVID-19 pandemic, and the diversity of communities affected have provided an opportunity to verify this dependence. Indeed, as we show here, it must be accounted for to see the effects of weaker drivers, such as weather and demographics. A recent study of COVID-19 in the United States, working with a similar dataset, saw no significant effect due to population density (Hamidi, Sabouri, and Ewing, 2020), but our analysis differs in a number of important ways. First, we have taken a dynamic approach, evaluating the time-dependence of the growth rate of mortality incidence, rather than a single static measure for each county, which allowed us to account for the changing effects of weather, mobility, and the density of susceptible individuals. Second, we have included an explicit and real-time measurement of social mobility, i.e., cell phone mobility data provided by Google (Fitzpatrick and Karen, 2020), allowing us to control for the dominant effect of intervention. Finally, and perhaps most importantly, we calculate for each county an estimate of the “lived” population density, called the population-weighted population density (PWD) (Craig, 1985), which is more meaningful than the standard population per political area. As with any population-scale measure, this serves as a proxy — here, for estimating the average rate of encounters between infectious and susceptible people — but we believe that PWD is a better proxy than standard population density, and it is becoming more prevalent, e.g., in census work (Dorling and Atkins, 1995; Wilson, 2012). We also found a significant dependence of the mortality growth rate on specific humidity (although since temperature and humidity were highly correlated, a replacement with temperature was approximately equivalent), indicating that the disease spread more rapidly in drier (cooler) regions. There is a large body of research on the effects of temperature and humidity on the transmission of other respiratory viruses (Moriyama, Hugen-tobler, and Iwasaki, 2020; Kudo et al., 2019), specifically influenza (Barreca and Shimshack, 2012). Influenza was found to transmit more efficiently between guinea pigs in low relative-humidity and temperature conditions (Lowen et al., 2007), although re-analysis of this work pointed to absolute humidity (e.g., specific humidity) as the ultimate controller of transmission (Shaman and Kohn, 2009). Although the mechanistic origin of humidity’s role has not been completely clarified, theory and experiments have suggested a snowballing effect on small respiratory droplets that cause them to drop more quickly in high-humidity conditions (Tellier, 2009; Noti et al., 2013; Marr et al., 2019), along with a role for evaporation and the environmental stability of virus particles (Morawska, 2005; Marr et al., 2019). It has also been shown that the onset of the influenza season (Shaman et al., 2010; Shaman, Goldstein, and Lipsitch, 2011) — which generally occurs between late-Fall and early-Spring, but is usually quite sharply peaked for a given strain (H1N1, H3N2, or Influenza B) — and its mortality (Barreca and Shimshack, 2012) are linked to drops in absolute humidity. It is thought that humidity or temperature could be the annual periodic driver in the resonance effect causing these acute seasonal out-breaks of influenza (Dushoff et al., 2004; Tamerius et al., 2011), although other influences, such as school openings/closings have also been implicated (Earn et al., 2012). While little is yet known about the transmission of SARS-CoV-2 specifically, other coronaviruses are known to be seasonal (Moriyama, Hugentobler, and Iwasaki, 2020; Neher et al., 2020), and there have been some preliminary reports of a dependence on weather factors (Xu et al., 2020b; Schell et al., 2020). We believe that our results represent the most definitive evidence yet for the role of weather, but emphasize that it is a weak, secondary driver, especially in the early stages of this pandemic where the susceptible fraction of the population remains large (Baker et al., 2020). Indeed, the current early-summer rebound of COVID-19 in the relatively dry and hot regions of the Southwest suggests that the disease spread will not soon be controlled by seasonality. We developed a new model of infection in the framework of a renewal equation (see, e.g., Champredon, Dushoff, and Earn, 2018 and references therein), which we could formally solve for the exponential growth rate. The incubation period in the model was determined by a random walk through the stages of infection, yielding a non-exponential distribution of the generation interval, thus imposing more realistic delays to infectiousness than, e.g., the standard SEIR model. In this formulation, we did not make the standard compartmental model assumption that the infection of an individual induces an autonomous, sequential passage from exposure, to infectiousness, to recovery or death; indeed, the model does not explicitly account for recovered or dead individuals. This freedom allows for, e.g., a back passage from infectious to noninfectious (via the underlying random walk) and a variable rate of recovery or death. We assumed only that the exponential growth in mortality incidence matched (with delay) that of the infected incidence — the primary dynamical quantity in the renewal approach — and we let the cumulative dead count predict susceptible density — the second dynamical variable in the renewal approach — under the assumption that deaths arise from a distinct subset of the population, with lower mobility behavior than those that drive infection (see Supplementary Material). Therefore, we fitted the model to the (rolling two-week estimates of the) COVID-19 mortality incidence growth rate values, *λ*14, for all counties and all times, and used the per capita mortality averaged over that period, *f**D*, to determine susceptible density. Regression to this nonlinear model was much improved over linear regression, and, once calibrated on an early portion of the county mortality incidence time series, the model accurately predicted the remaining incidence. Because we accounted for the precise effects of social mobility in fitting our model to the actual epidemic growth and decline, we were then able to, on a county-by-county basis, “turn off” mobility restrictions and estimate the level of cumulative mortality at which SARS-CoV-2 would fail to spread even without social distancing measures, i.e., we estimated the threshold for “herd immunity.” Meeting this threshold prior to the distribution of a vaccine should not be a goal of any community, because it implies substantial mortality, but the threshold is a useful benchmark to evaluate the potential for local outbreaks following the first wave of COVID-19 in Spring 2020. We found that a few counties in the United States have indeed reached herd immunity in this estimation — i.e., their predicted mortality growth rate, assuming baseline mobility, was negative — including counties in the immediate vicinity of New York City, Detroit, New Orleans, and Albany, Georgia. A number of other counties were found to be at or close to the threshold, including much of the greater New York City and Boston areas, and the Four Corners, Navajo Nation, region in the Southwest. All other regions were found to be far from the threshold for herd immunity, and therefore are susceptible to ongoing or restarted outbreaks. These determinations should be taken with caution, however. In this analysis, we estimated that the remaining fraction of susceptible individuals in the counties at or near the herd immunity threshold was in the range of 0.001% to 5% (see Supplementary Materials). This is in strong tension with initial seroprevalence studies (Rosenberg et al., 2020; Havers et al., 2020a) which placed the fraction of immune individuals in New York City at 7% in late March and 20% in late April, implying that perhaps 75% of that population remains susceptible today. We hypothesize that the pool of susceptible individuals driving the epidemic in our model is a subset of the total population — likely those with the highest mobility and geographic reach — while a different subset, with very low baseline mobility, contributes most of the mortality (see Supplementary Material). Thus, the near total depletion of the susceptible pool we see associated with herd immunity corresponds to the highly-mobile subset, while the low-mobility sub-set could remain largely susceptible. One could explicitly consider such factors of population heterogeneity in a model — e.g., implementing a saturation of infectivity as a proxy for a clustering effect (Capasso and Serio, 1978; Mollison, 1985; De Boer, 2007; Farrell et al., 2019) — but we found (in results not shown) that the introduction of additional of parameters left portions of the model unidentifiable. Despite these cautions, it is interesting to note that the epidemic curves (mortality incidence over time) for those counties that we have predicted an approach to herd immunity are qualitatively different than those we have not. Specifically, the exponential rise in these counties is followed by a peak and a sharp decline — rather than the flattening seen in most regions — which is a typical feature of epidemic resolution by susceptible depletion. At the time of this writing, in early Summer 2020, confirmed cases are again rising sharply in many locations across the United States — particularly in areas of the South and West that were spared significant mortality in the Spring wave. The horizon for an effective and fully-deployed vaccine still appears to be at least a year away. Initial studies of neutralizing antibodies in recovered COVID-19 patients, however, suggest a waning immune response after only 2–3 months, with 40% of those that were asymptomatic becoming seronegative in that time period (Long et al., 2020). Although the antiviral remdesivir (Beigel et al., 2020; Grein et al., 2020; Wang et al., 2020b) and the steroid Dexametha-sone (Horby et al., 2020) have shown some promise in treating COVID-19 patients, the action of remdesivir is quite weak, and high-dose steroids can only be utilized for the most critical cases. Therefore, the management of this pandemic will likely require non-pharmaceutical intervention — including universal social distancing and mask-wearing, along with targeted closures of businesses and community gathering places — for years in the future. The analysis and prescriptive guidance we have presented here should help to target these approaches to local communities, based on their particular demographic, geographic, and climate characteristics, and can be facilitated through our online simulator dashboard. Finally, although we have focused our analysis on the United States, due to the convenience of a diverse and voluminous data set, the method and results should be applicable to any community worldwide, and we intend to extend our analysis in forthcoming work. ## Data Availability All the data used in this work have been collected from the following publicly available repositories: https://www.wolfram.com/covid-19-resources/ https://www.google.com/covid19/mobility/ https://github.com/nytimes/covid-19-data https://github.com/CSSEGISandData/COVID-19 https://www2.census.gov/programs-surveys/popest/datasets/ https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html ftp.ncdc.noaa.gov https://ghsl.jrc.ec.europa.eu/ghs_pop2019.php https://www.census.gov/data/tables/time-series/demo/popest/ [https://nafshordi.com/covid](https://nafshordi.com/covid) [https://wolfr.am/COVID19Dash](https://wolfr.am/COVID19Dash) ## Supplementary Material ### Data & Methods #### Datasets, Resources, Definitions All data for cases and mortality, demographics, mobility, and weather were incorporated into the publicly-available Wolfram COVID-19 Dataset and the Wolfram|Alpha Knowledgebase (*COVID-19 Data & Resources* 2020). COVID-19 confirmed case and mortality data were obtained from both the NYTimes and Johns Hopkins University Github repositories (Johns Hopkins University, CSSE, 2020; The New York Times, 2020); the former was used for the analysis initial case data in metropolitan regions, while the average of the two data sets was used for all other analyses. In each case, daily confirmed counts were utilized. Demographic data by county, including people per household, estimated 2019 population, annual births, and annual deaths were obtained from the US Census 2019 *County Population Estimates* data set (United States Census, 2019a). Median ages were determined from the US Census 2018 *County Characteristics Resident Population Estimates* data set (United States Census, 2018). For the Median Age, Wolfram|Alpha has curated the raw data from *United States Census Bureau, American Community Survey 5-Year Estimates: B01002, the Median Age By Sex, American FactFinder*; for the People per Household and Annual Death, the source of curated data is *United States Census Bureau, State & County QuickFact*s. County outline polygons were obtained from the US Census 2019 *TIGER/Line shapefiles* database (United States Census, 2019c). Local weather data (Figure 10 was obtained from the NOAO *Global Surface Summary of the Day* (GSOD) database (National Oceanic and Atmospheric Observatory, 2020). The nearest WBAN station with daily dew point and pressure values (for calculation of specific humidity), and daily average temperature was chosen for each county or metropolitan region. Weather data was averaged over a two-week period for *λ*14, and over a window equal to the growth period for metropolitan regions. Google’s *COVID-19 Community Mobility Reports* dataset (Fitzpatrick and Karen, 2020), specifically “Workplace mobility,” was used to estimate the human social mobility in each county (Figure 10). Population-weighted population density (or, population weighted density, PWD) (Craig, 1985; Wilson, 2012; Dorling and Atkins, 1995), was calculated using the Global Human Settlement Population raster dataset (European Commission Joint Research Centre, 2019), which contains 250 m-resolution population values worldwide, taken from census data. The value of PWD for a county — or for a set of counties, in the metropolitan region analysis — was calculated as the population-weighted average of density over all (250 m)2-area pixels contained within the region, i.e., ![Formula][13] where *p**j* is the value (i.e., the population) of the *j*th pixel, *a**j* = 0.0625 *k*m2 is the area of each pixel (the GHS-POP image uses the equal-area Molleweide projection), and ∑*i**pi* is the total population of the region. This measure has also been called the *lived population density* because it is the population density experienced by the average person. In high density counties, the population weighted density PWD is close to the mean density of the county *D* = Pop*/*Area, suggesting a uniform distribution of population (see Figure 11). However, in lower density counties, the mean density is much lower than the population weighted density, due to heterogeneous dense pockets of population amidst vast empty spaces outlined by political boundaries. To represent the degree to which the population density changes across the region (county or metropolitan region) we define the *population sparsity index, γ*. Assuming that the population-weighted population density declines approximately as a power law with “pixel” area, PWD ![Graphic][14], we define: ![Figure 11:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/02/2020.06.30.20143636/F11.medium.gif) [Figure 11:](http://medrxiv.org/content/early/2020/07/02/2020.06.30.20143636/F11) Figure 11: Comparison of the distribution of Population Weighted Density with the Crude population density of US counties: (a)(Top): Histograms, (b)(Bottom): Relative distributions: The blue line shows the one-to-one correspondence, while the orange line is the best-fit power-law PWD250 (km−2) ≃ 430× [*D*(km−2)] 1*/*4. ![Formula][15] In other words we estimate the assumed power-law decline using two data points. The distribution of *γ* and its correlation with county population and population density are shown in Figure (12). We can see that *γ* ranges from 0.09 (i.e., very uniform) for the most populous/dense counties to 0.88 (i.e. very sparse) for least populated/dense counties. For reference, the value of *γ* for New York City is 0.14. ![Figure 12:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/02/2020.06.30.20143636/F12.medium.gif) [Figure 12:](http://medrxiv.org/content/early/2020/07/02/2020.06.30.20143636/F12) Figure 12: (a)(Top)Distribution of Population Sparsity Index γ (b)(Middle) its correlation with total population of the county (c)(Bottom) and its average population density #### Initial growth of confirmed cases for metropolitan regions For each of the top 100 metropolitan regions (United States Census, 2019b), a logarithmic-scale population heat map, windowed from the full GHS-POP raster image, was used to select a minimal connected set of US counties enclosing the region of population enhancement. In this process, overlap and merger reduced the total number of metropolitan regions under consideration to 89. As is discussed in the main text, nearly every metropolitan region saw, in mid-March 2020, an exponential increase of daily confirmed cases, followed by a flat/plateau period of nearly constant daily confirmed cases. In a few cases, the second phase — primarily caused by the country-wide lockdown — lasted only days or weeks (possibly signaling a depletion of the susceptible population, see discussion in main text), but for most metropolitan regions the plateau persisted for months (indeed, persists or is again increasing at the time of this writing). Thus, the initial value of the exponential growth rate, *λ*, of daily confirmed cases could be reliably and automatically estimated by fitting the case numbers to a logistic function ![Formula][16] where *t* represents the transition time from exponential growth to a constant, *f*max is the plateau value in case numbers, and *f*logistic(*t*) *∝* exp[*λt*] for *t ≪ t*. Fits were performed on the logarithm of the case numbers, yielding the maximum likelihood estimation of parameters under the assumption lognormally-distributed errors (an analysis of the fit residuals, not shown, confirmed this assumption: case number fluctuations exhibit a variance far in excess of Poisson noise, but are well modeled by a log-normal probability density function with constant width), and associated estimates of the variance in each parameter. To avoid polluting the exponential growth phase with singular early cases, a “detection limit” of 3 was imposed, and all daily case values less than or equal to that limit were ignored in fitting. The only manual intervention required for this fit was the specification of the upper limit of its range, i.e., the *end* of the plateau region, for each data set. To analyze the effect of demographic, population, mobility, and weather variables on this initial growth rate, we perform a weighted linear regression to the lambda values (and their standard errors) of the 89 metropolitan regions. To choose representative cities for the visual examples in Figures 1A and 1C, we performed an additional logistic fit to the mortality incidence data of each region and retained for Figure 1C only those that had (1) less than 15% error in both growth rates, and (2) |*λ*case −*λ*death |< 0.15*d*−1. This was done in an effort to specifically comment on or highlight only those cities for which the growth rate was accurately determined, and was well correlated with the more reliable measure, mortality growth, that we used for the remainder of the analysis. #### Linear Dynamical Model of Mortality Data A standard weighted least squares analysis was performed on the measured exponential growth rate, *λ*14, as a function of demographic, mobility, population and weather variables, with weights equal to inverse root of the estimated variance. #### Nonlinear Model We construct a model where, in the standard analogy to chemical reaction kinetics, the incidence of infections per unit area at time, *i*(*t*), is proportional to the product of the density of susceptible individuals, *S*(*t*), and the density of infected individuals, *I*(*t*). But, we allow for the rate constant for infection3 in the encounter, *β*, to depend on the infected individual’s “stage” of infection, *C*, with *C* = 0 immediately following infection. The incidence then has the form: ![Formula][17] where ℐ (*C, t*) is the density of infected per stage at time *t*, and the first equality expresses that we neglect changes to the susceptible population by all means other than infection. The density of infected individuals is found by integration over the stages of infection, ![Formula][18] If the rate constant were taken to be independent of stage, i.e., ![Graphic][19], we would obtain the familiar expression ![Graphic][20]. We will assume spatial homogeneity and that the total density of individuals is constant and equal to *S*(0) for a particular region, but, that the density could vary when comparing different regions. We assume that an infected individual’s evolution through the stages of infection, *C*, follows a Gaussian random walk in time, but modulated by an exponential rate, *d*, of death or recovery. Therefore we have ![Formula][21] where *a* is the “age” of an infection (time since infection), and the probability density function for the stage at a given age is given by ![Formula][22] where *τ* is the characteristic time scale of the random walk4. Integrating the expression for ℐ (*C, t*) over all stages and taking the derivative with respect to time yields the familiar expression *İ*(*t*) = *i*(*t*) - *dI*(*t*), showing that the model reduces to the SIR model if a stage-independent rate constant, ![Graphic][23], is assumed. As we show here, using the random walk to specify the dependence of infection stage on time allows for both a non-exponential distribution of delays to infectiousness (which is more realistic than that of the simplest model with incubation, the SEIR model) and a formal solution for the exponential growth rate. Inserting the expression for ℐ (*t, C*) into the incidence equation yields ![Formula][24] which is in the form of a *renewal equation* (Heesterbeek and Dietz, 1996; Champredon and Dushoff, 2015; Champredon, Dushoff, and Earn, 2018), with the bracketed expression being the expected infectivity of an individual with infection age *a*. To obtain the simplest nontrivial incubation period, we assume that ![Graphic][25]— where Θ(*x*) is the Heaviside step function — meaning that an infected individual is only infectious once they reach stage *C* = 1, and the infection rate constant is otherwise unchanging. This implies that the incidence is ![Formula][26] where ![Formula][27] is the probability that an individual infected *a* time units ago is infectious. If we now assume that the density of susceptibles is constant ![Graphic][28] over some interval of time, and that the incidence grows (or decays) exponentially in that interval, *i*(*t*) = *A*e*λt*, we find ![Formula][29] which, assuming *λ* + *d >* 0 (i.e., the exponential growth rate cannot go below −*d*), can be integrated to obtain ![Formula][30] This expression for (*λ* + *d*) has a formal solution in terms of the *Lambert W-function*, with simple asymptotic forms: ![Formula][31] For the early stages of the epidemic, when we can assume that the population of susceptibles is approximately constant and large, we see that the growth rate depends approximately linearly on the square of the logarithm of the density. In later stages, when either the base contact rate declines due to social distancing interventions, or the population of susceptibles decreases, we see that the exponential rate takes the value *λ ≈* −*d*. In practice, we utilize the exact Lambert W-function expression as our “nonlinear model” for fitting *λ*14, where we parameterize *β* and *τ* by the demographic, population, and weather variables (see main text). To estimate the susceptible density, ![Graphic][32], in this procedure we must use the reported mortality statistics. Thus far we have not specified the dynamics of death. We now make the assumption that the probability of death increases proportionally to the number of exposures an individual experiences. As we prove in a separate section, below, this implies that the susceptible density is related to the fraction of dead in the community, *f**D* = *D*tot*/N* (where *D*tot is the cumulative mortality and *N* is the total population), by *S*(*t*) = *S*(0) exp [−*C**D* *f**D*]. The *basic reproduction number, R*, and the distribution of *generation intervals, g*(*t**g*), are defined (Champredon and Dushoff, 2015; Nishiura, 2010) through the function *F*(*a*): ![Formula][33] The generation interval (or, generation time), *t**g*, is the time between infections of an infector-infectee pair, and is often estimated from clinical data by the *serial interval*, which is the time between the start of symptoms (Britton and Scalia Tomba, 2019), and the basic reproduction number is the average number of infectees produced by a single infected individual, assuming a completely susceptible population. These quantities can be calculated exactly for our model, as ![Formula][34] and ![Formula][35] where the expected value and variance of the generation interval are then: ![Formula][36] ## Extended Results and Analysis ### Relation between the remaining susceptible density, *S*(*t*) and the death fraction, *f**D*(*t*) In epidemic models the infection of susceptible individuals is typically determined by ![Formula][37] where *I**** is the density of infectious (contagious) individuals, and for our model, *βSI**** is the right-hand side of Eqn. 7. This can be solved, formally, as: ![Formula][38] Alternatively, the susceptible density can be expressed in terms of the cumulative number of infected individuals, *I*tot, i.e., ![Formula][39] where *f**I* (*t*) = *I*tot*/N*, with *N* the total population. When fitting the exponential growth rate of mortality, *λ*14(*t*) to our nonlinear model, Eqn. (16) (see main text), we must estimate the value of the susceptible density driving growth at that time. Without any reliable information about the true infected or infectious populations, we must do so using the mortality statistics. We show here how the previous two equations can be used, along with reasonable assumptions about distinct sub-populations driving infection and death, to determine a relationship between the reported cumulative mortality (per capita) and the remaining susceptible density. Our basic assumption is that there are two different categories of susceptible individuals underlying the dynamics of the epidemic: (A) highly mobile individuals with a large geographic reach that frequently interact with other individuals (in particular, infectious individuals) and thus drive the dynamics of infection (these could be termed “super-spreaders” (Liu, Eggo, and Kucharski, 2020)); and (B) essentially non-mobile individuals that have quite rare contacts with infectious individuals, but have a much higher probability of death once infected, and therefore make up the majority of the mortality burden. The dynamics of each susceptible population is governed by an equation of the form in Eqn. (21), with a common density of infectious individuals, *I****, but with different rate constants, *β**A* *≫ β**B*. From Eqn. (22), we see that the susceptible densities of the two populations are then related, at any time, by: ![Formula][40] Expressing the non-mobile population in terms of the cumulative fraction infected, we have ![Formula][41] and, assuming that the infection fatality rate (IFR) is a constant factor, *f**D*(*t*) = IFR × *f**I* (*t* −Δ*t*), where Δ*t* is the delay from infection to death, we can write: ![Formula][42] Finally, having assumed that the ratio of infection rates is large, we can approximate this as: ![Formula][43] The “A” category of individuals, as defined above, are exactly those individuals driving the infection in our model (and, presumably, in the real world), and, therefore, the susceptible density *S**A* is exactly that which must be estimated for Eqn. (16). On the other side, with people aged 65 and over accounting for ∼ 80% of COVID-19 deaths, and with approximately ∼45% of deaths linked to nursing homes, the mortality statistics are clearly tracing individuals similar to category “B.” Therefore, we use this relationship, ![Formula][44] to estimate the susceptible density in terms of the reported per capita mortality, where we assume *S*(0) is proportional to the population weighted density (PWD). We also considered the standard approach, in which the population is a single homogeneous group. In that case, the susceptible density could be estimated as ![Formula][45] In testing both models, we found that the two-component population scenario was preferred by the data at the ∼10*σ* confidence level, with the homogeneous population model failing to capture the observed dependence of the growth rate on the per capita mortality (Figure 13). ![Figure 13:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/02/2020.06.30.20143636/F13.medium.gif) [Figure 13:](http://medrxiv.org/content/early/2020/07/02/2020.06.30.20143636/F13) Figure 13: Comparison of hypotheses regarding the relationship between the susceptible fraction, S(t)/S(0), and the dead fraction, fD (t). A model in which deaths are suffered by a largely immobile population, while the infection is driven by a mobile category of individual (bottom) was preferred to one with a single homogenous population (top) at the 9σ confidence level. Similar to Figure 6, the points show the best-fit linear model prediction for NYC, fitted independently in different bins of mortality to population ration, while the lines show best-fit ± 1σ nonlinear models. Note that while we show the nonlinear model predictions for a county similar to NYC, we use all US data to find the best fits. The broader implications of our assumption of two populations is that the required proportion of individuals with immunity for “herd immunity” to take effect, is lower than population with homogeneous mobility characteristics, i.e., the epidemic will slow as a significant proportion of the “super-spreader” category have been infected (category A, above), regardless of the level of infection and immunity in the rest of the population. Indeed, the effect of population heterogeneity on lowering the “herd immunity” threshold for COVID-19 was recently noted (Britton, Ball, and Trapman, 2020), and will be important in interpreting the results of randomized serology tests across the entire population (Havers et al., 2020b). ### Incubation Time The nonlinear epidemic model described above posits that the incubation of SARS-COV2 virus within an infected individual can be modelled by a stochastic random walk starting at zero, with excursions beyond *±* 1 corresponding to episode(s) of infectiousness. This makes our model distinct from the standard SE*m*I*n*R compartmental models (see, e.g., (Champredon, Dushoff, and Earn, 2018)), where the progress of the disease is only in one direction — *E*1 → *E*2 →… →*I*1 →*I*2 →… — while in our model (Figure 14), the individual can jump back and forth between different stages (with the obvious exception of Death), with a constant exit rate of *d* for quarantine, recovery, or death. This can be described using a (leaky) diffusion equation: ![Figure 14:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/02/2020.06.30.20143636/F14.medium.gif) [Figure 14:](http://medrxiv.org/content/early/2020/07/02/2020.06.30.20143636/F14) Figure 14: The stages of COVID-19 infection, as a continuous variable C. We model the disease as a random walk, starting at C = 0, and a uniform exit rate (due to either quarantine or recovery). The curves show the resulting distribution in C, during the growing (early) and decaying (late) epidemic. Note that the Hospitalization and Death are not directly modelled in the Nonlinear model, as they should have a small effect on the spread of the epidemic. ![Formula][46] Based on this picture, and the best-fit parameters to the US county mortality data (Table 2), we can infer the probabilities associated with the different stages of the disease. For example, by looking at the steady-state solutions of Equation (29), we can compute the probability that an exposed individual (who starts at *C* = 0) will ever become infectious (i.e., make it beyond *C* = *C*inf = 1): ![Formula][47] This is plotted as a function of the median age of the community in Figure (15a). For example, for the median age of all US counties, *A* = 37.4-yr, we get: ![Figure 15:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/02/2020.06.30.20143636/F15.medium.gif) [Figure 15:](http://medrxiv.org/content/early/2020/07/02/2020.06.30.20143636/F15) Figure 15: (Top)(a) The probability that an individual exposed to the virus will ever become infectious. (Middle)(b) The probability distribution for incubation period for onset of virus shedding. (Bottom)(c) Distribution of the generation Interval, g(a), i.e., the time from an individual’s infection to them infecting another. ![Formula][48] i.e., less than 12% of exposed individuals will ever be able to infect others, although this fraction increases in older communities. Next, we can compute the distribution of times for the onset of infectiousness, i.e., the incubation period. This can be done by using a first crossing probability of a random walk, which we did by solving Equation (29) using a discrete Fourier series in the (0, +1) interval. The resulting probability density is shown in Figure (15b), again showing a shorter incubation period in older communities. Finally, we can compute the probability density function for the generation interval, *g*(*t**g*), defined in Equation (19) (Figure 15c). This shows a similar qualitative dependence on age as the incubation period, but the median incubation period is, as expected, shorter than the generation interval for each age group. Using Eqn. (20) and our parameterization of *τ*, we find a mean generation interval of ![Formula][49] This estimate is much longer than those found by tracking the serial interval (time from between the start of symptoms for an infector-infectee pair) in COVID-19 patients (Ganyani et al., 2020; Nishiura, Linton, and Akhmetzhanov, 2020), which are on the order of 5–10 d. It is possible that the long tail of these distributions, generated by the slow asymptotic exponential decay at rate *d ≈* 0.06*d*−1, raises the mean generation interval, while a clinical study, is necessarily biased toward shorter serial intervals. ### Error Diagnostics and Forecasting COVID-19 Mortality One of the most pressing questions in any exercise in physical modelling is whether we have a good understanding of the uncertainty in the predictions of the model. While we have an estimate of the measurement uncertainties for the mortality growth rates, *λ*14, which we discussed in the main text, we also should characterize whether the deviation of the best-fit model from the measurements are consistent with statistical errors. To evaluate this, we can look at the average of the ratio of the variance of the model residuals to that of the measurement errors, otherwise known as reduced *χ*2, or ![Graphic][50]. This is shown in Figure (16), demonstrating that we see no systematic error in model that is significantly bigger than statistical errors, across counties with different populations. ![Figure 16:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/02/2020.06.30.20143636/F16.medium.gif) [Figure 16:](http://medrxiv.org/content/early/2020/07/02/2020.06.30.20143636/F16) Figure 16: Root Mean Square of the Ratio of the Simplified Nonl pinear model residual to measurement errors (i.e. the ![Graphic][51]), as a function of the population of the county. We see that the residuals are consistent with measurement errors. As another consistency check, Table (3) examines whether the parameters of the model change significantly from urban counties with large, uniform populations, to rural counties with a small and more sparse population (Figures 11-12). From counties with enough COVID mortality data, roughly those with population ≳106 inhabit half of the total population, which we chose as our threshold, separating large from small counties. We notice no statistically significant difference, and Table (3) even suggests that Fisher errors quoted here might be overestimating the true errors. This comparison brings further confidence in the universality of the nonlinear model across geography and demography. View this table: [Table 3:](http://medrxiv.org/content/early/2020/07/02/2020.06.30.20143636/T3) Table 3: Comparison of nonlinear model parameters between small (population < 1 million) with large (population > 1 million) counties. We see no significant statistical difference, as demonstrated by the values in the last column that remain below 1. Note that, in contrast to Table (2), we use temperature rather than specific humidity (CT rather than Cℋ), as the latter was not available for some small counties. Nevertheless, the parameters remain also statistically consistent with Table (2). On average, we find that (either county-weighted or population-weighted) ![Graphic][52], suggesting that the model errors are only 13% bigger than statistical errors. We further compare the model-prediction vs measured mortality growth rate in Figure (17) for all our available data. We find that the 1-*σ* error in the model prediction (in excess measurement errors) is on average *σ*14 = 0.0180, i.e. 1.8% error in the daily mortality growth rate. This is shown in Figure (17) as the red region, which compares the model prediction with the observed mortality growth rates. We can also see that there appears to be no significant systematic deviation from the predictions, at least for *λ*14 < 0.23/day. ![Figure 17:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/02/2020.06.30.20143636/F17.medium.gif) [Figure 17:](http://medrxiv.org/content/early/2020/07/02/2020.06.30.20143636/F17) Figure 17: Observed versus Best-Fit model prediction for bins of λ14. The points show the mean of measured I7 within each predicted bin, as well as the error on the mean. The red region shows the mean excess model error, on top of measurement uncertainties. Given an understanding of the physical model and its uncertainties, we can provide realistic simulations to forecast the future of mortality in any community, similar to those provided in the main text (Figure 4), which can be made on-demand using our online dashboard: [https://wolfr.am/COVID19Dash](https://wolfr.am/COVID19Dash). In order to perform these simulations, we follow these simple steps. To predict the daily mortality on day *t* + 1, *D*(*t* + 1), we use the prior 13 days of *D*(*t*), as well as the total mortality up to that point: 1. Use Equation (1), plugging in prior total mortality, county information, weather, mobility and parameters in Table 2 to find *λ*14. Every simulation uses a random realization of model parameters (from their posterior fits), which remain fixed through that simulation. 2. Add the random model uncertainty to *λ*14 using: ![Formula][53] where *g**t ′* s are random independent numbers drawn from a unit-variance normal distribution. This captures the model uncertainty mentioned above, while ensuring that it remains correlated across the 14 days that are used to define *λ*14. 3. Having fixed the logarithmic slope for daily mortality *λ*14, find the best-fit intercept and its standard error for ln[*D*(*t*′) + 1*/*2] for the preceding 13 days, i.e. *t* −12 *≤ t ′ ≤t*, which can then be used to find a random realization for ln[*D*(*t* + 1) + 1*/*2] 4. Advance to next day and return to step 1. ## Acknowledgement We are indebted to helpful comments and discussions by our colleagues, in particular Bruce Bassett, Ghazal Geshnizjani, David Spergel, and Lee Smolin. NA is partially supported by Perimeter Institute for Theoretical Physics. Research at Perimeter Institute is supported in part by the Government of Canada through the Department of Innovation, Science and Economic Development Canada and by the Province of Ontario through the Ministry of Colleges and Universities. ## Footnotes * 1 The doubling time is ln 2 divided by the exponential growth rate. * 2 See Supplementary Material for more detailed discussion of Error Diagnostics. * 3 In a physical picture of collisions, the rate constant of infection is *β*(*C*) = ⟨*σv*⟩eff (*C*), i.e., the scattering cross section of an encounter between a susceptible individual and an infectious individual in stage *C, σ*, multiplied by their relative velocity, *v*. * 4 More precisely: *C* is the absolute value of the position of a 1D random walker, taking one step every Δ*t*, with step size drawn from a normal distribution with mean zero and variance Δ*t/τ*. The variance of the walker position at time *t* is then *t/τ* * Received June 30, 2020. * Revision received June 30, 2020. * Accepted July 2, 2020. * © 2020, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), CC BY-NC 4.0, as described at [http://creativecommons.org/licenses/by-nc/4.0/](http://creativecommons.org/licenses/by-nc/4.0/) ## Bibliography 1. Baker, Rachel E et al. (2020). “Susceptible supply limits the role of climate in the early SARS-CoV-2 pandemic”. In: Science. 2. Barreca, Alan I. and Jay P. Shimshack (Oct. 2012). “Absolute Humidity, Temperature, and Influenza Mortality: 30 Years of County-Level Evidence from the United States”. In: American Journal of Epidemiology 176.suppl7, S114–S122. issn: 0002-9262. doi: 10.1093/aje/kws259. eprint: [https://academic.oup.com/aje/article-pdf/176/suppl_7/S114/17340361/kws259.pdf](https://academic.oup.com/aje/article-pdf/176/suppl_7/S114/17340361/kws259.pdf). url: [https://doi.org/10.1093/aje/kws259](https://doi.org/10.1093/aje/kws259). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/aje/kws259&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23035135&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F07%2F02%2F2020.06.30.20143636.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000309686100012&link_type=ISI) 3. Beigel, John H. et al. (2020). “Remdesivir for the Treatment of Covid-19: A Preliminary Report”. In: New England Journal of Medicine 0.0, null. doi: 10.1056/NEJMoa2007764. eprint: [https://doi.org/10.1056/NEJMoa2007764](https://doi.org/10.1056/NEJMoa2007764). url: [https://doi.org/10.1056/NEJMoa2007764](https://doi.org/10.1056/NEJMoa2007764). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMoa2007764&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32445440&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F07%2F02%2F2020.06.30.20143636.atom) 4. Britton, Tom, Frank Ball, and Pieter Trapman (2020). “A mathematical model reveals the influence of population heterogeneity on herd immunity to SARS-CoV-2”. In: Science. 5. Britton, Tom and Gianpaolo Scalia Tomba (2019). “Estimation in emerging epidemics: biases and remedies”. In: Journal of The Royal Society Interface 16.150, p. 20180670. doi: 10.1098/rsif.2018.0670. eprint: [https://royalsocietypublishing.org/doi/pdf/10.1098/rsif.2018.0670](https://royalsocietypublishing.org/doi/pdf/10.1098/rsif.2018.0670). url: [https://royalsocietypublishing.org/doi/abs/10.1098/rsif.2018.0670](https://royalsocietypublishing.org/doi/abs/10.1098/rsif.2018.0670). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1098/rsif.2018.0670&link_type=DOI) 6. Capasso, Vincenzo and Gabriella Serio (1978). “A generalization of the Kermack-McKendrick deterministic epidemic model”. In: Mathematical Biosciences 42.1-2, pp. 43–61. 7. Champredon, David and Jonathan Dushoff (2015). “In-trinsic and realized generation intervals in infectious-disease transmission”. In: Proceedings of the Royal Society B: Biological Sciences 282.1821, p. 20152026. 8. Champredon, David, Jonathan Dushoff, and David JD Earn (2018). “Equivalence of the Erlang-distributed SEIR epidemic model and the renewal equation”. In: SIAM Journal on Applied Mathematics 78.6, pp. 3258– 3278. 9. Chen, Nanshan et al. (2020). “Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study”. In: The Lancet 395.10223, pp. 507–513. 10. Chu, Derek K et al. (2020a). “Physical distancing, face masks, and eye protection to prevent person-to-person transmission of SARS-CoV-2 and COVID-19: a systematic review and meta-analysis”. In: The Lancet. 11. Chu, Helen Y et al. (2020b). “Early Detection of Covid-19 through a Citywide Pandemic Surveillance Platform”. In: COVID-19 Data & Resources (2020). Tech. rep. Available at: [https://www.wolfram.com/covid-19-resources/](https://www.wolfram.com/covid-19-resources/). Wolfram Research Inc. 12. Craig, John (1985). “Better measures of population density”. In: Population Trends 39, pp. 16–21. 13. De Boer, Rob J (2007). “Understanding the failure of CD8+ T-cell vaccination against simian/human immunodeficiency virus”. In: Journal of virology 81.6, pp. 2838–2848. 14. Deng, Xianding et al. (2020). “Genomic surveillance reveals multiple introductions of SARS-CoV-2 into Northern California”. In: Science. doi: 10.1126/science.abb9263. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEyOiIzNjkvNjUwMy81ODIiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMC8wNy8wMi8yMDIwLjA2LjMwLjIwMTQzNjM2LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 15. Dorling, Daniel and DJ Atkins (1995). “Population density, change and concentration in Great Britain 1971, 1981 and 1991”. In: 16. Dushoff, Jonathan and Sang Woo Park (2020). “Speed and strength of an epidemic intervention”. In: bioRxiv. 17. Dushoff Jonathan et al. (2004). “Dynamical resonance can account for seasonality of influenza epidemics”. In: 101.48, pp. 16915–16916. doi: 10.1073/pnas.0407293101 [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxMjoiMTAxLzQ4LzE2OTE1IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMDcvMDIvMjAyMC4wNi4zMC4yMDE0MzYzNi5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 18. Earn, David JD et al. (2012). “Effects of school closure on incidence of pandemic influenza in Alberta, Canada”. In: Annals of internal medicine 156.3, pp. 173–181. 19. European Commission Joint Research Centre (2019). Global Human Settlement Population Grid (GHS-POP) 2015 Epoch. [https://ghsl.jrc.ec.europa.eu/ghs_pop2019.php](https://ghsl.jrc.ec.europa.eu/ghs_pop2019.php). 20. Farrell, Alex et al. (2019). “Coinfection of semi-infectious particles can contribute substantially to influenza infection dynamics”. In: BioRxiv, p. 547349. 21. Fauver, Joseph R et al. (2020). “Coast-to-coast spread of SARS-CoV-2 during the early epidemic in the United States”. In: Cell. 22. Fine, Paul, Ken Eames, and David L. Heymann (Apr. 2011). ““Herd Immunity”: A Rough Guide”. In: Clinical Infectious Diseases 52.7, pp. 911–916. issn: 1058-4838. doi: 10.1093/cid/cir007. eprint: [https://academic.oup.com/cid/article-pdf/52/7/911/847338/cir007.pdf](https://academic.oup.com/cid/article-pdf/52/7/911/847338/cir007.pdf). url: [https://doi.org/10.1093/cid/cir007](https://doi.org/10.1093/cid/cir007). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/cid/cir007&link_type=DOI) 23. Fitzpatrick, Jen and DeSalvo Karen (2020). COVID-19 community mobility reports. Tech. rep. Available at: [https://www.google.com/covid19/mobility/.Google](https://www.google.com/covid19/mobility/.Google). 24. Fung, Isaac Chun-Hai, Rustom Antia, and Andreas Handel (2012). “How to minimize the attack rate during multiple influenza outbreaks in a heterogeneous population”. In: PLoS One 7.6, e36573. 25. Ganyani, Tapiwa et al. (2020). “Estimating the generation interval for coronavirus disease (COVID-19) based on symptom onset data, March 2020”. In: Eurosurveil-lance 25.17, p. 2000257. 26. Gonzalez-Reiche, Ana S et al. (2020). “Introductions and early spread of SARS-CoV-2 in the New York City area”. In: medRxiv. 27. Grein, Jonathan et al. (2020). “Compassionate Use of Remdesivir for Patients with Severe Covid-19”. In: New England Journal of Medicine 382.24, pp. 2327–2336. doi: 10.1056/NEJMoa2007016. eprint: [https://doi.org/10.1056/NEJMoa2007016](https://doi.org/10.1056/NEJMoa2007016). url: [https://doi.org/10.1056/NEJMoa2007016](https://doi.org/10.1056/NEJMoa2007016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMoa2007016&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32275812&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F07%2F02%2F2020.06.30.20143636.atom) 28. Hamidi, Shima, Sadegh Sabouri, and Reid Ewing (2020). “Does Density Aggravate the COVID-19 Pandemic?” In: Journal of the American Planning Association 0.0, pp. 1–15. doi: 10.1080/01944363.2020.1777891. eprint: [https://doi.org/10.1080/01944363.2020.1777891](https://doi.org/10.1080/01944363.2020.1777891). url: [https://doi.org/10.1080/01944363.2020.1777891](https://doi.org/10.1080/01944363.2020.1777891). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1080/01944363.2020.1777891&link_type=DOI) 29. Handel, Andreas, Ira M Longini Jr, and Rustom Antia (2007). “What is the best control strategy for multiple infectious disease outbreaks?” In: Proceedings of the Royal Society B: Biological Sciences 274.1611, pp. 833– 837. 30. Havers, Fiona P. et al. (2020a). “Seroprevalence of Antibodies to SARS-CoV-2 in Six Sites in the United States, March 23-May 3, 2020”. In: medRxiv. doi:10.1101/2020.06.25.20140384. eprint: [https://www.medrxiv.org/content/early/2020/06/26/2020.06.25.20140384.full.pdf](https://www.medrxiv.org/content/early/2020/06/26/2020.06.25.20140384.full.pdf). url: [https://www.medrxiv.org/content/early/2020/06/26/2020.06.25.20140384](https://www.medrxiv.org/content/early/2020/06/26/2020.06.25.20140384). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoibWVkcnhpdiI7czo1OiJyZXNpZCI7czoyMToiMjAyMC4wNi4yNS4yMDE0MDM4NHYxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMDcvMDIvMjAyMC4wNi4zMC4yMDE0MzYzNi5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 31. – (2020b). “Seroprevalence of Antibodies to SARS-CoV-2 in Six Sites in the United States, March 23-May 3, 2020”. In: medRxiv. doi: 10.1101/2020.06.25.20140384. eprint: [https://www.medrxiv.org/content/early/2020/06/26/2020.06.25.20140384.full.pdf](https://www.medrxiv.org/content/early/2020/06/26/2020.06.25.20140384.full.pdf). url: [https://www.medrxiv.org/content/early/2020/06/26/2020.06.25.20140384](https://www.medrxiv.org/content/early/2020/06/26/2020.06.25.20140384). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoibWVkcnhpdiI7czo1OiJyZXNpZCI7czoyMToiMjAyMC4wNi4yNS4yMDE0MDM4NHYxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMDcvMDIvMjAyMC4wNi4zMC4yMDE0MzYzNi5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 32. Heesterbeek, JAP and Klaus Dietz (1996). “The concept of Ro in epidemic theory”. In: Statistica Neerlandica 50.1, pp. 89–110. 33. Holshue, Michelle L et al. (2020). “First case of 2019 novel coronavirus in the United States”. In: 34. Horby, Peter et al. (2020). “Effect of Dexamethasone in Hospitalized Patients with COVID-19: Preliminary Report”. In: medRxiv. doi: 10.1101/2020.06.22.20137273. eprint: [https://www.medrxiv.org/content/early/2020/06/22/2020.06.22.20137273.full.pdf](https://www.medrxiv.org/content/early/2020/06/22/2020.06.22.20137273.full.pdf). url: [https://www.medrxiv.org/content/early/2020/06/22/2020.06.22.20137273](https://www.medrxiv.org/content/early/2020/06/22/2020.06.22.20137273). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoibWVkcnhpdiI7czo1OiJyZXNpZCI7czoyMToiMjAyMC4wNi4yMi4yMDEzNzI3M3YxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMDcvMDIvMjAyMC4wNi4zMC4yMDE0MzYzNi5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 35. John, T. Jacob and Reuben Samuel (2000). “Herd immunity and herd effect: new insights and definitions”. In: European Journal of Epidemiology 16.7, pp. 601– 606. doi: 10.1023/A:1007626510002. url: [https://doi.org/10.1023/A:1007626510002](https://doi.org/10.1023/A:1007626510002). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1023/A:1007626510002&link_type=DOI) 36. Johns Hopkins University, CSSE (2020). COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. [https://github.com/CSSEGISandData/COVID-19](https://github.com/CSSEGISandData/COVID-19). 37. Jong, Mart C.M. de, Odo Diekmann, and Hans Heester-beek (1995). “How does transmission of infection depend on population size?” In: Epidemic Models: Their structure and relation to data. Ed. by Denis Mollison. Cambridge: Cambridge University Press, pp. 84–94. 38. Kucharski, Adam J et al. (2020). “Early dynamics of transmission and control of COVID-19: a mathematical modelling study”. In: The lancet infectious diseases. 39. Kudo, Eriko et al. (2019). “Low ambient humidity impairs barrier function and innate resistance against influenza infection”. In: Proceedings of the National Academy of Sciences 116.22, pp. 10905–10910. 40. Ladner, Jason T et al. (2020). “Defining the Pandemic at the State Level: Sequence-Based Epidemiology of the SARS-CoV-2 virus by the Arizona COVID-19 Genomics Union (ACGU)”. In: medRxiv. 41. Leon, David A et al. (2020). “COVID-19: a need for real-time monitoring of weekly excess deaths”. In: The Lancet 395.10234, e81. 42. Li, Qun et al. (2020). “Early transmission dynamics in Wuhan, China, of novel coronavirus–infected pneumonia”. In: New England Journal of Medicine. 43. Liu, Yang, Rosalind M Eggo, and Adam J Kucharski (2020). “Secondary attack rate and superspreading events for SARS-CoV-2”. In: The Lancet 395.10227, e47. 44. Long, Quan-Xin et al. (2020). “Clinical and immunological assessment of asymptomatic SARS-CoV-2 infections”. In: Nature Medicine, pp. 1–5. 45. Lowen, A.C. et al. (2007). “Influenza Virus Transmission Is Dependent on Relative Humidity and Temper-ature”. In: 3.10, e151. doi: 10.1371/journal.ppat.0030151. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.ppat.0030151&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17953482&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F07%2F02%2F2020.06.30.20143636.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000251114100012&link_type=ISI) 46. Marr, Linsey C. et al. (2019). “Mechanistic insights into the effect of humidity on airborne influenza virus survival, transmission and incidence”. In: Journal of The Royal Society Interface 16.150, p. 20180298. doi: 10.1098/rsif.2018.0298. eprint: [https://royalsocietypublishing.org/doi/pdf/10.1098/rsif.2018.0298](https://royalsocietypublishing.org/doi/pdf/10.1098/rsif.2018.0298). url: [https://royalsocietypublishing.org/doi/abs/10.1098/rsif.2018.0298](https://royalsocietypublishing.org/doi/abs/10.1098/rsif.2018.0298). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1098/rsif.2018.0298&link_type=DOI) 47. Modi, Chirag et al. (2020). “How deadly is COVID-19? A rigorous analysis of excess mortality and age-dependent fatality rates in Italy”. In: medRxiv. doi: 10.1101/2020.04.15.20067074. eprint: [https://www.medrxiv.org/content/early/2020/05/14/2020.04.15.20067074.full.pdf](https://www.medrxiv.org/content/early/2020/05/14/2020.04.15.20067074.full.pdf). url: [https://www.medrxiv.org/content/early/2020/05/14/2020.04.15.20067074](https://www.medrxiv.org/content/early/2020/05/14/2020.04.15.20067074). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoibWVkcnhpdiI7czo1OiJyZXNpZCI7czoyMToiMjAyMC4wNC4xNS4yMDA2NzA3NHYzIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMDcvMDIvMjAyMC4wNi4zMC4yMDE0MzYzNi5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 48. Mollison, Denis (1985). “Sensitivity analysis of simple endemic models”. In: Population Dynamics of Rabies in Wildlife, pp. 223–234. 49. Morawska, Lidia (2005). “Droplet fate in indoor environments, or can we prevent the spread of infection?” In: Proceedings of Indoor Air 2005: the 10th International Conference on Indoor Air Quality and Climate. Springer, pp. 9–23. 50. Moriyama, Miyu, Walter J. Hugentobler, and Akiko Iwasaki (2020). “Seasonality of Respiratory Viral Infections”. In: Annual Review of Virology 7.1. PMID: 32196426, ull. doi: 10.1146/annurev-virology-012420-022445. eprint: [https://doi.org/10.1146/annurev-virology-012420-022445](https://doi.org/10.1146/annurev-virology-012420-022445). url: [https://doi.org/10.1146/annurev-virology-012420-022445](https://doi.org/10.1146/annurev-virology-012420-022445). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1146/annurev-virology-012420-022445&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32196426&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F07%2F02%2F2020.06.30.20143636.atom) 51. National Oceanic and Atmospheric Observatory (2020). Global Surface Summary of the Day (GSOD). ftp.ncdc.noaa.gov. 52. Neher, Richard A et al. (2020). “Potential impact of seasonal forcing on a SARS-CoV-2 pandemic”. In: Swiss medical weekly 150.1112. 53. Nishiura, Hiroshi (2010). “Time variations in the generation time of an infectious disease: implications for sampling to appropriately quantify transmission potential”. In: Mathematical Biosciences & Engineering 7.4, pp. 851–869. 54. Nishiura, Hiroshi, Natalie M Linton, and Andrei R Akhmetzhanov (2020). “Serial interval of novel coronavirus (COVID-19) infections”. In: International journal of infectious diseases. 55. Notari, Alessio (Mar. 2020). “Temperature dependence of COVID-19 transmission”. In: arXiv e-prints, arXiv:2003.12417, 2003.12417. arXiv: 2003. 12417 [q-bio.PE]. 56. Noti, John D et al. (2013). “High humidity leads to loss of infectious influenza virus from simulated coughs”. In: PloS one 8.2, e57485. 57. Oliveiros, Barbara et al. (2020). “Role of temperature and humidity in the modulation of the doubling time of COVID-19 cases”. In: medRxiv. 58. Park, Sang Woo et al. (2020). “Reconciling early-outbreak estimates of the basic reproductive number and its uncertainty: framework and applications to the novel coronavirus (SARS-CoV-2) outbreak”. In: MedRxiv. 59. Pearce, N et al. (2020). “Accurate Statistics on COVID-19 Are Essential for Policy Guidance and Decisions.” In: American journal of public health, e1. 60. Roberts, MG and JAP Heesterbeek (2007). “Model-consistent estimation of the basic reproduction number from the incidence of an emerging infection”. In: Journal of mathematical biology 55.5-6, p. 803. 61. Rosenberg, Eli S et al. (2020). “Cumulative incidence and diagnosis of SARS-CoV-2 infection in New York”. In: medRxiv. doi: 10.1101/2020.05.25.20113050. eprint: [https://www.medrxiv.org/content/early/2020/05/29/2020.05.25.20113050.full.pdf](https://www.medrxiv.org/content/early/2020/05/29/2020.05.25.20113050.full.pdf). url:[https://www.medrxiv.org/content/early/2020/05/29/2020.05.25.20113050](https://www.medrxiv.org/content/early/2020/05/29/2020.05.25.20113050). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoibWVkcnhpdiI7czo1OiJyZXNpZCI7czoyMToiMjAyMC4wNS4yNS4yMDExMzA1MHYxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMDcvMDIvMjAyMC4wNi4zMC4yMDE0MzYzNi5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 62. Sanche, S et al. (2020). “High Contagiousness and Rapid Spread of Severe Acute Respiratory Syndrome Coronavirus 2.” In: Emerging infectious diseases 26.7. 63. Schell, Michael et al. (2020). “A Favorable Effect of Higher Ambient Temperature on COVID-19 Deaths in the USA”. In: Available at SSRN 3579744. doi: 10.2139/ssrn.3579744. url: [http://dx.doi.org/10.2139/ssrn.3579744](http://dx.doi.org/10.2139/ssrn.3579744). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2139/ssrn.3579744&link_type=DOI) 64. Shaman, Jeffrey, Edward Goldstein, and Marc Lipsitch (2011). “Absolute humidity and pandemic versus epidemic influenza”. In: American journal of epidemiology 173.2, pp. 127–135. 65. Shaman, Jeffrey and Melvin Kohn (2009). “Absolute humidity modulates influenza survival, transmission, and seasonality”. In: Proceedings of the National Academy of Sciences 106.9, pp. 3243–3248. 66. Shaman, Jeffrey et al. (2010). “Absolute humidity and the seasonal onset of influenza in the continental United States”. In: PLoS Biology 8.2, e1000316. 67. Storms, Aaron D et al. (2013). “Worldwide transmission and seasonal variation of pandemic influenza A (H1N1) 2009 virus activity during the 2009–2010 pandemic”. In: Influenza and other respiratory viruses 7.6, pp. 1328–1335. 68. Tamerius, James et al. (2011). “Global influenza sea-sonality: Reconciling patterns across temperate and tropical regions”. In: Environmental health perspectives 119.4, pp. 439–445. 69. Tellier, Raymond (2009). “Aerosol transmission of influenza A virus: a review of new studies”. In: Journal of the Royal Society Interface 6.suppl_6, S783–S790. 70. The New York Times (2020). Covid-19 Data. [https://github.com/nytimes/covid-19-data](https://github.com/nytimes/covid-19-data). 71. United States Census (2018). County Characteristics Resident Population Estimates. [https://www2.census.gov/programs-surveys/popest/datasets/2010-2018/counties/asrh/cc-est2018-alldata.csv](https://www2.census.gov/programs-surveys/popest/datasets/2010-2018/counties/asrh/cc-est2018-alldata.csv). 72. (2019a). County Population Estimates. [https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv](https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv). 73. (2019b). Metropolitan and Micropolitan Statistical Ar-eas Population Totals and Components of Change: 2010-2019. [https://www.census.gov/data/tables/time-series/demo/popest/2010s-total-metro-and-micro-statistical-areas.html](https://www.census.gov/data/tables/time-series/demo/popest/2010s-total-metro-and-micro-statistical-areas.html). 74. (2019c). TIGER/Line Shapefiles Database. [https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html](https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html). 75. Wallinga, Jacco and Marc Lipsitch (2007). “How generation intervals shape the relationship between growth rates and reproductive numbers”. In: Proceedings of the Royal Society B: Biological Sciences 274.1609, pp. 599– 604. 76. Wang, Jingyuan et al. (2020a). “High temperature and high humidity reduce the transmission of COVID-19”. In: Available at SSRN 3551767. 77. Wang, Yeming et al. (2020b). “Remdesivir in adults with severe COVID-19: a randomised, double-blind, placebo-controlled, multicentre trial”. In: The Lancet. 78. Wilson, Steven G (2012). Patterns of metropolitan and micropolitan population change: 2000 to 2010. US Department of Commerce, Economics and Statistics Administration. 79. Worobey, Michael et al. (2020). “The emergence of SARS-CoV-2 in Europe and the US”. In: bioRxiv. 80. Xu, Ran et al. (2020a). “The Modest Impact of Weather and Air Pollution on COVID-19 Transmission”. In: medRxiv. doi: 10.1101/2020.05.05.20092627.eprint: [https://www.medrxiv.org/content/early/2020/05/24/2020.05.05.20092627.full.pdf](https://www.medrxiv.org/content/early/2020/05/24/2020.05.05.20092627.full.pdf). url: [https://www.medrxiv.org/content/early/2020/05/24/2020.05.05.20092627](https://www.medrxiv.org/content/early/2020/05/24/2020.05.05.20092627). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoibWVkcnhpdiI7czo1OiJyZXNpZCI7czoyMToiMjAyMC4wNS4wNS4yMDA5MjYyN3YzIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMDcvMDIvMjAyMC4wNi4zMC4yMDE0MzYzNi5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 81. (2020b). “The Modest Impact of Weather and Air Pollution on COVID-19 Transmission”. In: medRxiv. doi: 10.1101/2020.05.05.20092627. eprint: [https://www.medrxiv.org/content/early/2020/05/24/2020.05.05.20092627.full.pdf](https://www.medrxiv.org/content/early/2020/05/24/2020.05.05.20092627.full.pdf). url: [https://www.medrxiv.org/content/early/2020/05/24/2020.05.05.20092627](https://www.medrxiv.org/content/early/2020/05/24/2020.05.05.20092627). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoibWVkcnhpdiI7czo1OiJyZXNpZCI7czoyMToiMjAyMC4wNS4wNS4yMDA5MjYyN3YzIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMDcvMDIvMjAyMC4wNi4zMC4yMDE0MzYzNi5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 82. Yu, Hongjie et al. (2012). “Transmission dynamics, border entry screening, and school holidays during the 2009 influenza A (H1N1) pandemic, China”. In: Emerging infectious diseases 18.5, p. 758. [1]: /embed/inline-graphic-1.gif [2]: /embed/graphic-4.gif [3]: /embed/graphic-5.gif [4]: /embed/graphic-6.gif [5]: /embed/graphic-7.gif [6]: F5/embed/inline-graphic-2.gif [7]: F5/embed/inline-graphic-3.gif [8]: F6/embed/inline-graphic-4.gif [9]: /embed/inline-graphic-5.gif [10]: /embed/inline-graphic-6.gif [11]: /embed/inline-graphic-7.gif [12]: /embed/inline-graphic-8.gif [13]: /embed/graphic-17.gif [14]: /embed/inline-graphic-9.gif [15]: /embed/graphic-19.gif [16]: /embed/graphic-21.gif [17]: /embed/graphic-22.gif [18]: /embed/graphic-23.gif [19]: /embed/inline-graphic-10.gif [20]: /embed/inline-graphic-11.gif [21]: /embed/graphic-24.gif [22]: /embed/graphic-25.gif [23]: /embed/inline-graphic-12.gif [24]: /embed/graphic-26.gif [25]: /embed/inline-graphic-13.gif [26]: /embed/graphic-27.gif [27]: /embed/graphic-28.gif [28]: /embed/inline-graphic-14.gif [29]: /embed/graphic-29.gif [30]: /embed/graphic-30.gif [31]: /embed/graphic-31.gif [32]: /embed/inline-graphic-15.gif [33]: /embed/graphic-32.gif [34]: /embed/graphic-33.gif [35]: /embed/graphic-34.gif [36]: /embed/graphic-35.gif [37]: /embed/graphic-36.gif [38]: /embed/graphic-37.gif [39]: /embed/graphic-38.gif [40]: /embed/graphic-39.gif [41]: /embed/graphic-40.gif [42]: /embed/graphic-41.gif [43]: /embed/graphic-42.gif [44]: /embed/graphic-43.gif [45]: /embed/graphic-44.gif [46]: /embed/graphic-47.gif [47]: /embed/graphic-48.gif [48]: /embed/graphic-50.gif [49]: /embed/graphic-51.gif [50]: /embed/inline-graphic-16.gif [51]: F16/embed/inline-graphic-17.gif [52]: /embed/inline-graphic-18.gif [53]: /embed/graphic-55.gif