Abstract
The spread of dengue and other arboviruses constitutes an expanding global health threat. The daunting heterogeneity in population distribution and movement in megacities of the developing world frustrates predictive modeling, even as its importance to disease spread is clearer than ever. Using surveillance data at fine resolution from Rio de Janeiro, we document a scale-invariant pattern in the size of successive epidemics following DENV4 emergence. This pattern emerges from the combined effect of herd immunity and seasonal transmission, and is strongly driven by variation in population density at sub-kilometer scales. It is apparent only when the landscape is stratified by population density and not by spatial proximity as has been common practice. Models that exploit this emergent simplicity should afford improved predictions of epidemic waves.
Introduction
When a new pathogen emerges, how large will successive epidemic waves be? When the infections confer temporary or long-term immunity, the answer to this central question will depend on spatial scale in complex ways we do not yet sufficiently understand. In particular, the size of successive outbreaks will result from the interplay of herd immunity and transmission seasonality across a landscape determined by the distribution and behavior of the human hosts, which has been called the “spatiotemporal geometry of herd immunity” (1). Outbreaks of seasonal influenza, for example, can differ from city to city along multiple axes: epidemic vs. endemic character, depth of inter-seasonal troughs, and duration and shape of epidemic waves (2–4). City size can modulate the influence of climatic drivers of transmission in seasonal influenza (1), and differential crowding within cities of different size has been shown to affect the shape of COVID-19 outbreaks, with longer tails in larger populations (5). Megacities spreading over vast tracts of ground enclose large and finely-articulated heterogeneities in population density and movement, yet the expected effects of this fine-scale structure on disease spread remain largely unexplored (6–8). What patterns have been observed have been deduced from implicit treatments of average crowding and connectivity as functions of city size (1, 5). However, transmission is an intrinsically local process and the local density and structure of the population has the potential to be a critical determinant of infection spread. Therefore it is essential to examine the role of fine-scale population structure on infectious-disease dynamics if we are to improve predictive models of urban disease transmission and spread (9, 10).
Traditional mathematical models with ‘well-mixed’ transmission between individuals within a population have formed a foundation for predictive epidemiological theory (11, 12). However, explicit consideration of the spatial dimension is proving increasingly important (8, 9, 13, 14) due to growing population connectivity at regional to planetary scales, novel sources of data on fine-scale individual movement, and the pronounced heterogeneity of the distribution of the human population across the landscape (15, 16). Whereas connectivity among cities and regions has been addressed with metapopulation formulations that couple local dynamics via movement fluxes (14, 17, 18), the treatment of space within cities remains a challenge (19, 20). This is because it remains unclear at what scales aggregation of data is appropriate and because computational expense and statistical power impose limits on the fineness of the grids that can be used to parameterize movement, local environmental conditions, and the resulting epidemiological dynamics.
The foregoing issues are prominent in the case of vector-borne arbovirus infections, including dengue, Zika, and chikungunya. Dengue virus, in particular, has become a global health threat affecting a large fraction of the world’s population as it continues to expand its geographical range (21, 22). Because of their domesticated lifestyle and close association with human hosts, the mosquito vectors responsible for dengue transmission (and also Zika, chikungunya, and yellow fever), Aedes aegypti and Aedes albopictus, are also expanding their distribution under urbanization and climate change. The population dynamics of these vector-borne diseases exhibit nonlinearity (a consequence of the immunity engendered by infection) and climate-driven transmission seasonality (caused by seasonal cycles in mosquito abundance) over the small spatial scales at which both hosts and vectors vary in density (23, 24). A recent modeling study of the emergence of DENV1 in the megacity of Rio de Janeiro, Brazil, illustrates the challenge. A well-mixed model for the city as a whole clearly failed to predict time to re-emergence (25), possibly due to its inability to accurately track the build-up of herd immunity at this coarse resolution.
Results
The high spatial resolution of dengue surveillance data from Rio de Janeiro (250m x 250m) provides an opportunity to address variation in human population density at a degree of granularity unprecedented for a whole city (Methods). During the five years we analyze, (2010--2014), Rio de Janeiro experienced three major dengue outbreaks, dominated first by the DENV1 serotype, then followed for two consecutive years by the emergent DENV4 serotype, newly arrived in Brazil (Fig. 1A). The intermittent epidemic pattern seen here, with two to three peaks dominated by an emergent single serotype, is typical of dengue dynamics in many cities of South America (26, 27). We address one prominent feature of these emergent epidemics, namely the ratio of consecutive peak sizes when a serotype first enters the city (Methods). This ratio varies widely across the city (Fig. 1C); we demonstrate that it does so as a function of highly localized population density. To understand this phenomenon, we examine the interaction of local herd immunity with seasonal transmission. Specifically, we show that sparsely populated areas experience short-lived outbreaks which reach herd immunity sooner than those in densely-populated areas. Because seasonal declines in mosquito abundance curtail the transmission season, dense areas are left with disproportionately more susceptible hosts at the end of the first wave. Accordingly, the relative size of successive waves is highly sensitive to the timing of the introduction of infection into each local area. We investigate this “spark rate” of infection importation empirically, and examine its dependence on population density. We go on to investigate alternative representations of space in predictive models (Fig. 1B). We propose that spatial geometry based on human density at fine scales is more relevant to disease dynamics than the traditional coupling of spatially proximal localities on Euclidean grids. This suggestion carries implications for predictive metapopulation models, including those for COVID-19.
We analyze the peak ratio for the two consecutive years of DENV4 during which this serotype, unlike DENV1, was new to the city (Fig. 1A). If we neglect heterotypic protection, we can therefore assume that the whole population was initially susceptible to the virus. We find that the ratio of the peak sizes of the second and first seasons of DENV4 varies across the city with values below and above one, and exhibits a clear nonlinear relationship with human density (Fig. 2). The peak ratio is larger at high and low densities than it is at intermediate densities (Fig. 2A). This pattern arises when units are aggregated by population density but disappears when aggregation is constrained by geographical contiguity as is typically done for administrative subdivisions (Fig. 2B). We can expect differences, since the two criteria of aggregation generate a very different organization of the city (Fig. 1B). Notably, the pattern of peak ratio as a function of human density is invariant under the number of groupings considered, as illustrated by the different colors in Fig. 2A. Thus, this dependence becomes scale-independent when aggregation is governed by population density itself.
To explain the peak ratio pattern, we initially investigated the role of population density with a deterministic seasonal SIR model (Methods). Two opposite variables shape the ratio of consecutive peaks by determining how much population immunity is accumulated during the first season, namely the arrival time of infection to a spatial unit and its population density. According to the model, given an arrival time t0, the peak ratio increases with population size, with the second peak becoming larger than the first one (Fig. 3A). That is, smaller units achieve the epidemic peak earlier, because their smaller susceptible pool is more rapidly depleted. When transmission rates vary seasonally, most of the susceptible population is depleted in the first year in a small unit, leaving few to be infected the next season. As the population grows, the number of susceptibles remaining at the beginning of the second season is larger and the size of the second peak increases concomitantly. In addition to this effect, the timing of the local start of transmission also strongly affects the size of the pool of susceptibles remaining after the first season. In particular, if infection arrives late, the local epidemic has less time to grow before the transmission season is curtailed. Thus, peak ratio increases with later arrival (Fig. 3A). We find that time of local infection arrival is strongly associated with human density, whereby the most dense units exhibit the first reported DENV4 cases about three months earlier (Fig. 3B). Thus, population density affects peak ratio in two opposite directions, and the seasonal SIR model recovers the documented nonlinear empirical pattern (Fig. 3C) when population densities and t0 values comparable to those observed in Rio de Janeiro are used (Fig. 3B).
To verify that our hypothesized effects are robust, we consider a more realistic model that introduces demographic stochasticity. To this end, we introduce the empirical rate of infection importation to a local unit σuemp, referred hereafter as the “spark” rate, which allows us to sidestep the explicit coupling between the units. Without loss of generality, the spark rate and its estimation do not explicitly consider the source units from which infections are imported (Methods). A stochastic SIR model under well-mixed conditions applies within each unit, which allows for local extinction of infection and for the spark rate to re-initiate transmission. The initial conditions are self-contained in the model through the arrival of the first infection to a given unit. Specifically, for each unit u the transmission rate is modelled as βSuIu/Nu + σu,where β is the local transmission rate, Iu, Su and Nu denote respectively the number of infected, susceptible, and total individuals in u, and σu is the spark rate per unit. To take into account that we are working from observed cases, we consider a reporting rate ρ ∈ [0,1] and compute the spark rate as σu = Poisson(σuemp/ρ).
Armed with an estimated spark rate, we ask whether it can explain the observed time of initiation of transmission at the unit level, as well as the pattern of peak ratio as a function of population density. We recover the observed delay in arrival time with population density, and find that this time is significantly affected by ρ (Fig. 4A, see Fig S5 for the effects of other parameters). A small ρ increases the spark rate, resulting in a tendency of earlier initiation of infection, but also decreases the detection of these early infections, which delays the observation of the first local case. Since detection of a single case is sufficient to determine arrival time, populated units are less affected by inefficient detection because they generate more local infections. The trade-off between these two effects of ρ becomes increasingly unbalanced for larger population densities. Most importantly, the stochastic model predicts the empirical relationship of peak ratio with human density, and does so more accurately when it also better captures arrival times (Fig. 4B).
The stochastic model relied on an estimated spark rate. We now examine what factors determine this rate. We find a clear dependence on both a local and a global determinant, unit population density and total city prevalence respectively. A positive relationship with the total number of cases CTot is expected, since more infection importations should be produced under higher levels of the virus circulating in the city. We find that the estimated spark rate grows as a power law with CTot (Fig. 4C). This association is itself influenced by the local population density of the units, as illustrated with the different colors in Fig. 4C. More crowded areas would experience higher human movement fluxes than less dense ones, resulting in a higher probability of their inhabitants commuting to infected areas or receiving infected visitors. The spark rate increment with population size is nonlinear, increasing faster when densities are small, and saturating for the most populated units (Fig S4). To analyze these behaviors of the spark rate, we fit a linear relationship between the logarithm of the spark rate and CTot as shown by the solid lines in Fig. 4C. The estimated parameters for each population group, the slope m and intercept b, describe the clear influence of human density (Fig. 4D). These determinants of spark rate reinforce the important role of population density in the behavior of peak ratio. They also provide a handle to potentially reduce the complexity of the infection importation process.
Discussion
Our results demonstrate that human density is a dominant driver of dengue dynamics at fine spatial scales comparable in size to city block and census tract. This effect scales up to explain the relative size of successive epidemic waves, a major epidemiological feature reflecting the depletion of susceptible individuals and the build-up of herd immunity. In other words, this fundamental aspect of the dynamics of an immunizing infection is affected by variation in population density at fine spatial scales. Importantly, this does not mean that spatial aggregation or coarse graining of the landscape is not plausible. We find that the pattern of peak ratio with density is scale-invariant, as long as coarser spatial partitions follow aggregation according to density itself, and not the traditional subdivision of administrative units based on typical contiguous space.
Thus, efforts to model dengue and possibly other infectious diseases in urban landscapes should consider the nature of aggregation space and not just its spatial resolution. On the one hand, administrative regions may better reflect similar environmental conditions such as temperature and socio-economic status influencing transmission intensity according to standard geography. On the other hand, the new partition we propose should by definition better capture human density and its effects on key aspects of dengue transmission such as infection spread and availability of susceptible individuals. Consideration of these different organizations of space can help identify and disentangle the effect of disease drivers, given that variation in incidence within the city occurs along both aggregation axes but for different sets of factors.
The effect of human density on peak ratio has practical relevance for informing public health efforts on the expected size of the next infection wave in different parts of a city. The documented peak ratio pattern demonstrates that the fine-scale spatial structure of urban populations strongly determines the temporal patterns of incidence at coarser resolutions. The importance of population structure was recently suggested by large-scale analyses of Covid-19 and influenza, in comparative studies of the temporal shape and endemicity of outbreaks at the whole-city level (1, 5). Here, we have explicitly described this structure through its effects for dengue.
The high resolution dataset also revealed a clear dependence on human density of the seasonal timing of infection arrival locally. This timing is critical to how much herd immunity will be acquired by the local population before the environmentally suitable transmission season ends, in the case of dengue in Rio de Janeiro due to variation in temperature and rainfall (28–30). Crowded spatial units experience an earlier arrival date of the dengue virus, as previously reported for influenza (31). The dependence of this timing on human density was successfully captured here by a stochastic model in which there is no explicit description of the spatial coupling between local units. Instead, the link between units is implicit in our model, via “sparks” arriving from unspecified locations from a global pool of city-wide infections.
Although the spatial spread of infection involves the complex interplay of connectivity patterns and local transmission (24, 32, 33), our modeling of the spark process reveals that the effective result can be described in terms of two accessible quantities, namely the total number of cases in the city and the local human density. This finding suggests a novel formulation of metapopulation dynamics in urban environments that will be explored in future work, where space is aggregated according to population density and the coupling occurs through a global incidence pool. Whether the complexity of human movement and resulting connectivity patterns can be captured in such a practical way in spatially explicit models of dengue and perhaps other infections remains an open question. This formulation combined with the sufficiently coarse partitions suggested by the scale-invariant pattern we uncovered, provides an alternative to the intractable high-dimensional systems needed to resolve population density heterogeneity when modeling cities and geographical regions.
Methods
Data
Spatial grid
We created a grid whose units measure 250m x 250m based on the census tract layer for the city of Rio de Janeiro from the Instituto Brasileiro de Geografia e Estatística [Brazilian Institute of Geography and Statistics] (IBGE) website <https://www.ibge.gov.br/geociencias/organizacao-do-territorio/malhas-territoriais>. Uninhabited locations were excluded.
Dengue cases on the grid
Dengue is a disease of compulsory notification in Brazil, and cases are notified at the Sistema de Informação de Agravos de Notificação [Information System on Diseases of Compulsory Declaration] (SINAN). Dengue cases notified in Rio de Janeiro between January 2010 and March 2015 were geocoded according to address of residency, and then counted for each grid unit by the Secretariat of Health of the city. We obtained the monthly dengue cases data aggregated at the grid level.
Population on the grid
The population data is obtained from the Census 2010 (IBGE) (https://www.ibge.gov.br/estatisticas/downloads-estatisticas.html) and it is available at the census tract level. The census tract areas vary in size and can be bigger than the unit of the grid, primarily in the least densely populated zones of the city. To overcome this issue, we cropped from the census tract layer the areas classified as non-urbanized (such as water bodies, swamps, agricultural areas, green areas, beaches, rocky outcrops) in 2010 by the City Hall of Rio de Janeiro (layer available at <http://www.data.rio/datasets/uso-do-solo-2010>). The population of each census tract is distributed randomly (uniformly) in the areas obtained after deleting the non-urban areas. The population within the units is computed by adding the grid layer. To create the grid and edit the census tract layer we used QGIS (version 3.6.3) (34), and to obtain the population in the grid we used the R software (35) with the packages tidyverse (36) and sf (37).
Since the units are in fact small and most of them conserve their area of 250 m by 250 m (Fig. S1A), we consider the population density as the population of each unit. Therefore, for consistency, we do not consider units with small effective areas and/or populations sizes less or equal than 10 in our analysis (we excluded 8954 units from 20212). Furthermore, very small areas and population sizes are highly sensitive to the non-urban classification and the random distribution of the census tract population. This consideration of population density facilitates the model simulations and does not affect the pattern of the peak ratio (Fig. S1B).
Peak Ratio and Spatial Aggregation
Since units are small, we binned them into G groups and aggregated their times series of reported cases. The groups were generated according to two aspects: 1) the geographical location of the units as determined by the administrative divisions of the city (10 areas and 33 regions, and 160 neighborhoods); and 2) the population of the units based on quantiles in order to obtain equal size groups. We considered specifically four different partition levels, resulting in 12, 25, 50 and 100 groups with about 900, 450, 225, and 100 units respectively (from a total number of 11247 units for the whole city).
Given a unit u, we define its time series as , where cu (ti) is the number of reported cases of dengue at time ti (i=1,2,, …f). Thus, the aggregated time series is given by with g = 1, 2, …, G.. Then, for each we computed the ratio between the sizes of the second and first DENV-4 peaks, that is
The Deterministic SIR Model
Although dengue is a vector-borne disease, for simplicity we omitted the explicit representation of the dynamics of the mosquito population, and treated vector transmission via the seasonality of the transmission rate (25). Thus, for each unit u, the deterministic SIR model is based on the following traditional differential equations: , where Su, Iu, Ru, are respectively the number of susceptibles, infected, and recovered individuals, and Nu, the number of inhabitants, of the spatial unit u. Parameter μ is the mortality rate (equal to the birth rate), and γ is the recovery rate. The seasonal transmission rate is specified as β(t) = β0 (1 + δ sin(ωt + ϕ)). The units are considered independent of each other, and the initial conditions establish that the whole population of each unit is susceptible to the virus (Su (t0) = Nu and Iu(t0) = Ru(t0) = 0 ∀u). Transmission begins with one infected individual at a time tu* > t0 where tu* is obtained from the data.
Since the goal of this model is to examine the representative dynamics of different population densities, we binned the units according to their population into twelve groups, and computed the mean valu e of their number of inhabitants Ng =< Nu∈g > and of their arrival times of the infection t* g ∼ < t* u∈g > (where g=1,…,12). We then simulated the system considering the twelve sets {Ng, t* g} as given.
The Stochastic Model
Since units will suffer local extinction of transmission, a major component of a stochastic implementation is the description of the local reintroduction of the virus, namely the arrival of a ‘spark’ or imported infection, in analogy to fire spread. Because space is described by a highly-resolved lattice, we considered that within each unit well-mixed transmission applies. Moreover, we specified no explicit spatial coupling between units. We considered instead the importation of infection through the specification of a spark rate.
For this purpose, we constructed a binary representation of the time series of cases per month by defining the spatial units either as positive or negative according to whether they reported cases or not (Fig. S3). Then, to derive a spark rate we explored the dynamics of the number of positive units as follows,
The number of positive units U +within a time t+dt is equal to the number of positive units at time t, plus the number of units that have been infected Unew+(t,t + dt) between t and t+dt, minus the number of units that were infected at t but are no longer infected at t+dt (i.e. the number of ‘extinctions’ between t and t + dt, Uextict+(t,t + dt)).
Since uninfected units (i.e. negative units) require the arrival of a spark to become positive, the following equation specifies the mean of Unew+(t,t + dt) under the assumption that a small unit is unlikely to receive more than a single spark in a period of time dt.
By combining Eqs.(2) and (3), we can now compute the spark rate per unit, we call sparks, from the high-resolution incidence data as
In order to address the effects of human density on the spark rate, we binned the spatial units according to their population into G groups. To avoid statistical effects due to group size, we considered population quantiles. Then, by applying Eq (3) to each of these groups, we obtained an empirical spark rate per unit that depends on human density, , where Ng =< Nu∈g > with g=1,2, …, G.
Simulations
The associated differential equations of the stochastic model are those shown on Eq. (1) but the transmission component has now an additional term σu to describe the importation of infections.
Since the inferred spark rate from the data (Eq. (5)) is obtained from observed infections, we computed the spark rate σu as: where ρ is the reporting rate.
The model shown on Eq.(6) was formulated as stochastic by incorporating demographic noise (with the different events represented as Poisson processes). It was implemented in R with the package POMP (38). We also considered measurement error by assuming that the observed number of cases Cuobs during a period of time T is, , where Cu (T) is the number of cases computed in the unit u. We simulated the 11247 units that compose the city of Rio de Janeiro, and aggregated the resulting time series as for the empirical data (see Peak Ratio section).
The parameters of the model are given in Table S1. We relied on parameters estimated for dengue transmission in Rio de Janeiro by (25). Those estimates were obtained for the aggregated city and for the emergence of DENV1. They result in values and a seasonality of the reproductive number of the disease, R0, consistent with those estimated independently for dengue at a different time. We use them here as sufficiently realistic parameters for the purpose of illustrating and exploring the behavior of the stochastic model with population density.
Data Availability
The data on population density and the code to analyze the data and to simulate the model will be available at Pascuals Github at a site to be specified. Requests concerning the epidemiological data should be made to the Secretariat of Health of Rio de Janeiro city.
Data Availability
The data on population density and the code to analyze the data and to simulate the model will be available at Pascual’s Github at a site to be specified. Requests concerning the epidemiological data should be made to the Secretariat of Health of Rio de Janeiro city.
Authors Contribution
VR-A and MP conceived the study. VR-A conducted the analyses. LPF and OC conducted the preparation of the spatio-temporal data and geographic analyses for this purpose. All authors contributed to the interpretation of results. VR-A, MP, an AAK drafted the manuscript. All authors contributed to its final text.
Competing interests
The authors declare no competing interests.
Material & Correspondence
Mercedes Pascual
Acknowledgments
The authors would like to thank Claudia T. Codeço for her comments on an earlier version of this work, and for her input on dengue in Rio de Janeiro. This research was completed with resources and support provided by the University of Chicago’s Research Computing Center.