Abstract
The inter-cities mobility network is of great importance in understanding outbreaks, especially in Brazil, a country of continental dimensions. Grounded on the complex networks approach, cities are here represented as nodes and the flows as weighted edges - these geographical graphs, (geo)graphs, are handled in a Geographical Information System. We adopt the IBGE database from 2016, which contains the weekly flow of people between cities in terrestrial vehicles. The present work investigates how network measures such as strength, degree, and betweenness correspond to the emergence of cities with confirmed cases of COVID-19 in Brazil, with special attention to the state of São Paulo. We show that the results improve when certain thresholds are applied to the networks’ flows to neglect the lowest-frequency trips. The correspondences are statistically significant for most measures up to a certain period. Until the end of April, the best matchings are with the strength measure (total flow related to a node/city) under a high flow threshold in the São Paulo state, when the most connected cities are reached. After this stage, the lower thresholds become more suitable, indicating a possible signature of the outbreak interiorization process. Surprisingly, some countryside cities such as Campina Grande (state of Paraíba), Feira de Santana (state of Bahia), and Caruaru (state of Pernambuco) have higher strengths than some state capitals. Furthermore, some cities from the São Paulo state such as Presidente Prudente and Ribeirão Preto are captured in the top-rank positions of all the analyzed network measures under different flow thresholds. Their importance in mobility is crucial and they are potential super spreaders like the state capitals. Our analysis offers additional tools for understanding and decision support to inter-cities mobility interventions regarding the SARS-CoV-2 and other epidemics.
INTRODUCTION
The complex network approach(1) emerges as a natural mechanism to handle mobility data computationally, taking areas as nodes (fixed) and movements between origins and destinations as connections (flows)(2,3,4). The inter-cities mobility network is vital for understanding outbreaks, especially in Brazil, a continental-dimension country(5,6,7).
As of May 1st, 2020, the pandemic of COVID-19, caused by the SARS-CoV-2, has globally spread, with about 2,066,023 confirmed cases and 239,447 deaths. In Brazil, there are more than 92,665 confirmed cases and more than 6,439 deaths(8,9,10), with the first documented case located in the city of São Paulo on February 25th, 2020.
This paper presents an investigation on how topological properties of terrestrial mobility networks relate to the emergence of COVID-19 cases in Brazil, considering cities as nodes and flows as weighted edges. We compute three pointwise measures for each node, namely the strength, degree, and betweenness centrality to find the structurally more important cities and contrast them with the documented cases of COVID-19 until May 1st, 2020.
The most common mobility data used in studies of this nature in Brazil are the commuting (pendular) trips from the 2010 national census (IBGE)(11). In this paper, we use the IBGE road-transport data from 2016(12), which contains the flows between cities considering terrestrial vehicles for which it is possible to buy a ticket (mainly buses and vans). The information collected by that survey seeks to quantify the interconnection between cities, the attraction that urban centers exert for the consumption of goods and services, and the long-distance connectivity of Brazilian cities. The North region is not included in this paper, because neither the fluvial nor the air transport modes are covered, and their roles are key to understanding the spreading process there, especially in the Amazon region.
Our contributions are the analysis of i) the Brazilian inter-cities mobility networks under different flow thresholds to neglect the lowest-frequency travels, especially in the beginning of the outbreak, when the interiorization of the disease is still not in progress; ii) the correspondence between the networks’ statistics and the emergence of COVID-19 in Brazil. The present investigation offers additional tools for understanding and decision support in the containment of the ongoing epidemiological spreading(13,14) and others in the future. From the mobility data, the authorities have a preliminary list of cities with a high likelihood of having patients to further employ preventive actions like social distancing.
This paper is organized as follows: the Method section presents the data and the techniques we employ, such as the complex networks’ measures and the geographical visualization tools. The analysis results are then presented, together with the discussion and final remarks.
METHOD
The above-cited IBGE data(12) contains the weekly travel frequency (flow) between pairs of Brazilian cities/districts. The frequencies are aggregated within the round trip, which means that the number of travels from city A to city B is the same as from B to A. We produce two types of undirected networks with a different number N of nodes to capture actions in two scales (country and state):
N = 4987 - Brazil without the North region (BRWN): nodes are cities and edges are the flow of direct travels between them.
N = 620 - São Paulo state (SP): a subset of the previous network, containing only cities within the São Paulo state.
We focus on two versions of each network, defined by a flow threshold η: η0 (η = 0), which is the original network from the IBGE data, and ηd (η = d), which neglects travels with lower-level frequencies. Here d is the highest flow threshold that produces the network with the largest diameter. The motivation behind ηd is to get a threshold high enough to discard the least frequent connections while keeping the most frequent ones(4).
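The choice of ηd can be sketched as a scan over candidate thresholds. This is a minimal illustration, not the authors' implementation: the flow values are hypothetical, an edge is assumed to be kept when its flow is strictly greater than η, the diameter is measured in hops over all reachable pairs (so a disconnected network is scored by its widest component), and the candidate thresholds are simply the observed flow values.

```python
from collections import deque

# Hypothetical weekly flows between city pairs (undirected); not IBGE data.
flows = {
    ("A", "B"): 300.0,
    ("B", "C"): 250.0,
    ("C", "D"): 220.0,
    ("A", "C"): 40.0,
    ("B", "D"): 15.0,
    ("D", "E"): 5.0,
}

def apply_threshold(flows, eta):
    """Keep only edges whose flow is strictly greater than eta (assumption)."""
    return {pair: f for pair, f in flows.items() if f > eta}

def adjacency(edges):
    """Build an undirected, weightless adjacency map from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def diameter(edges):
    """Longest shortest path, in hops, over all reachable node pairs."""
    adj = adjacency(edges)
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best

def find_eta_d(flows):
    """Highest candidate threshold whose network has the largest diameter."""
    candidates = sorted({0.0, *flows.values()})
    diameters = {eta: diameter(apply_threshold(flows, eta)) for eta in candidates}
    d_max = max(diameters.values())
    eta_d = max(eta for eta, d in diameters.items() if d == d_max)
    return eta_d, d_max

eta_d, d_max = find_eta_d(flows)
```

In this toy network the diameter is 3 both at η = 0 and at η = 40, so the scan returns the higher of the two thresholds. A finer grid of candidate values could be used instead of the observed flows.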
Complex network measures
The topological degree k of a node is the number of links it has to other nodes. As here the networks are undirected, there is no distinction between incoming and outgoing edges.
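In a connected graph, there is at least one shortest path between any pair of nodes v and w. Let σvw be the number of such shortest paths and σvw(i) the number of them that pass through node i. The betweenness(2) centrality b of a node i is the rate of those shortest paths that pass through i:

$$ b_i = \sum_{v \neq i \neq w} \frac{\sigma_{vw}(i)}{\sigma_{vw}} $$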
Although it is a pointwise measure, it takes into account non-local information related to all shortest paths on the network. It is worth highlighting that in the present context this centrality index is not a transportation (physical) measure but a mobility (process) one. Besides, neither degree nor betweenness accounts for the network flows here; both are computed on the binary (weightless) networks. The diameter of a network is the distance between its farthest nodes, given by the longest of the shortest paths.
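As an illustration of how betweenness can be obtained on a weightless, undirected network, the sketch below uses Brandes' algorithm. The source does not state which algorithm or normalization was used, so both are assumptions here (the values are left unnormalized), and the toy network is hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    b = {v: 0.0 for v in adj}
    for s in adj:
        # Forward BFS: count shortest paths (sigma) and record predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward pass: accumulate dependencies from the farthest nodes.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                b[w] += delta[w]
    # Each unordered pair was counted from both of its endpoints.
    return {v: value / 2.0 for v, value in b.items()}

# Hypothetical toy network: a path A-B-C-D plus a shortcut B-D.
adj = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C"},
}
b = betweenness(adj)
```

Here only B lies on shortest paths between other nodes (A to C and A to D), so it is the only node with nonzero betweenness.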
The strength s of a node i, on the other hand, is the accumulated flow from incident edges:

$$ s_i = \sum_{j} F_{ij}, $$

in which Fij is the flow between nodes i and j.
In our context, the degree gives the number of cities that a city is connected to, showing the number of possible destinations for the SARS-CoV-2. The strength captures the total number of people that travel to (or come from) such places in a week. From a probability perspective, the cities that receive more people are more vulnerable to SARS-CoV-2. The betweenness centrality, on the other hand, considers the entire network to depict the topological importance of a city in the routes that are more likely to be used.
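Degree and strength can both be read directly from an edge list of flows. The following sketch uses hypothetical flow values (not the IBGE data) and a hypothetical helper name, `degree_and_strength`, and then ranks cities by strength.

```python
from collections import defaultdict

# Hypothetical weekly flows F_ij between city pairs (undirected); not IBGE data.
flows = {
    ("São Paulo", "Campinas"): 900.0,
    ("São Paulo", "Santos"): 700.0,
    ("Campinas", "Santos"): 50.0,
    ("Campinas", "Ribeirão Preto"): 200.0,
}

def degree_and_strength(flows):
    """Degree counts incident edges; strength sums their flows."""
    degree = defaultdict(int)
    strength = defaultdict(float)
    for (i, j), f in flows.items():
        degree[i] += 1
        degree[j] += 1
        strength[i] += f
        strength[j] += f
    return dict(degree), dict(strength)

degree, strength = degree_and_strength(flows)

# Cities ordered from highest to lowest strength.
ranking = sorted(strength, key=strength.get, reverse=True)
```

Note that the two measures can disagree: in this toy example Campinas has the highest degree, while São Paulo has the highest strength.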
Geographical visualization
A geographical approach for complex systems analysis is especially important for mobility phenomena(14). Santos et al. (2017)(15) proposed a graph where the nodes have a known geographical location, and the edges have spatial dependence, the (geo)graph. It provides a simple tool to manage, represent, and analyze geographical complex networks in different domains(4,16) and it is used in the present work. The geographical manipulation is performed with the PostgreSQL Database Management System and its spatial extension PostGIS. Lastly, the maps are produced using the Geographical Information System ArcGIS.
RESULTS AND DISCUSSION
This section presents the results of the topological analysis for the previously mentioned networks. Table 1 showcases the size N of each network, number of edges |E|, average strength 〈s〉, average degree 〈k〉, and average betweenness 〈b〉.
The |E| decreases for increasing η, due to the removal of edges with lower flows. The resulting networks are undirected. Throughout the paper, both the degree and the betweenness measures do not account for the flows, but weightless edges instead. Two nodes are connected when between them there is a nonzero flow, which means that the number of connections decreases for increasing threshold (η). We compute the diameter of the networks for varying η.
Figure 1 shows the exact point (dashed line) where the highest threshold with maximum diameter is found for both networks: ηd = 207.55 for BRWN and ηd = 161.01 for SP.
Following the (geo)graphs approach, it is possible to visualize nodes and edges of the Brazilian mobility network in the geographical space for ηd in Figure 2. The edges for η0 are not plotted, because there are more than 59000 and the visualization was not clear. It is important to highlight some key cities like Belo Horizonte, Rio de Janeiro, São Paulo and Salvador, and the high number of connections between them. Figure 3 depicts the geographical graph regarding the state of São Paulo.
Figure 4 shows the map of the topological degree related to each node/city, considering all original flows (η0), and in Figure 5 there is the equivalent for ηd = 207.55. Key cities are labeled in the maps.
Figures 6 and 7 present the strength for the SP network, with η0 and ηd, respectively. Some cities with high strength also appear in a report(17) of most vulnerable cities to COVID-19 due to their intense traffic of people, namely São Paulo, Campinas, São José do Rio Preto, São José dos Campos, Ribeirão Preto, Santos, Sorocaba, Jaboticabal, Bragança Paulista, Presidente Prudente, Bauru, and many others. Currently, they all have a significant number of confirmed cases.
We now assess which of the computed measures (s, k, and b) better approximates the emergence of COVID-19 in Brazil. We compare the top-ranked n ∈ [1,X] cities of each measure with the first n cities that contain confirmed cases. According to the available data of the notified cases from daily state bulletins of the Brazilian Health Ministry(10), until May 1st, 2020, the number of cities with at least one confirmed patient with COVID-19 is X = 1902 in the BRWN network, which corresponds to 38% of the nodes, and X = 323 in SP (52% of the nodes). This provides a way of tracking the response of each measure in detecting vulnerable cities according to the evolution of the virus spreading process.
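The comparison can be sketched as a matching curve. This assumes one reading of the procedure: for each n, the rate p(n) is the fraction of the top-n ranked cities that are also among the first n cities with confirmed cases. The function name and the city lists are hypothetical.

```python
def matching_curve(ranking, case_order):
    """p(n) = |top-n ranked cities ∩ first n cities with cases| / n."""
    curve = []
    for n in range(1, len(case_order) + 1):
        top_n = set(ranking[:n])
        first_n = set(case_order[:n])
        curve.append(len(top_n & first_n) / n)
    return curve

# Hypothetical ranking by some network measure (highest first).
ranking = ["São Paulo", "Campinas", "Santos", "Ribeirão Preto", "Bauru"]
# Hypothetical chronological order in which cities confirmed their first case.
case_order = ["São Paulo", "Santos", "Campinas", "Bauru"]

curve = matching_curve(ranking, case_order)
```

The curve starts at 1 whenever the top-ranked city is also the first one with a confirmed case, and it can oscillate strongly for small n because a single mismatch changes the rate considerably.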
Some cities from the aforementioned data are not present in our network, due to a simplification that the IBGE does: it groups small neighboring cities with almost no flow into single nodes. For simplicity, and considering that such cities do not contain cases in the first days of the outbreak, they are not accounted for in our analysis.
In order to verify whether the rate of correspondence between the top-ranked cities from the networks’ measures and the cities with COVID-19 cases has statistical significance, we check the results of picking cities at random instead of under the measures’ guidance. We perform 10^5 simulations for each n ∈ [1, X], choosing n nodes at random and recording the rate p of positive cases. Figure 8 presents the correspondence of the first n cities with COVID-19 documented cases and both the simulated data and the top-ranked nodes under s, k, and b. The gray region represents 95% of the rates’ occurrences in the simulations for each n, and the maximum observed value is the dashed line.
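A random baseline of this kind can be sketched with a small Monte Carlo routine. The example below is an illustration under assumptions: the node ids and case list are hypothetical, far fewer runs are used than in the paper, and the 95% band is taken as the central interval of the simulated rates, which the source does not specify.

```python
import random

def random_baseline(nodes, case_order, n, runs=1000, seed=0):
    """Sorted matching rates from picking n nodes uniformly at random."""
    rng = random.Random(seed)
    first_n = set(case_order[:n])
    rates = []
    for _ in range(runs):
        sample = rng.sample(nodes, n)  # n distinct nodes, no replacement
        rates.append(sum(city in first_n for city in sample) / n)
    rates.sort()
    return rates

def central_interval(sorted_rates, coverage=0.95):
    """Bounds of the central `coverage` share of the simulated rates."""
    tail = (1.0 - coverage) / 2.0
    last = len(sorted_rates) - 1
    return sorted_rates[int(tail * last)], sorted_rates[int((1.0 - tail) * last)]

nodes = list(range(100))      # hypothetical city ids
case_order = list(range(40))  # hypothetical: the first 40 ids have cases
rates = random_baseline(nodes, case_order, n=40)
low, high = central_interval(rates)
worst_case = rates[-1]        # maximum observed rate (the dashed line)
```

A measure is then considered informative at a given n when its matching rate lies above `worst_case`, or at least above `high`.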
In our analysis, on May 1st, about 95% of the simulations have matching rates within 0.38 ± 0.01 for the BRWN network, and the same share is within 0.52 ± 0.03 for SP. The results for node selection during the first days via the network indexes all lie above the dashed line, which means that all indexes are a better heuristic than picking nodes at random in the beginning. However, immediately after April 21st, k with η0 and b for both thresholds start to cross the dashed line in SP, giving results comparable to the simulations. Those three curves also come to have the worst results for BRWN, after a transient.
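The centers of these intervals can be checked by hand. Assuming that n nodes are drawn uniformly at random without replacement and compared with the first n cities with cases, each drawn node is a match with probability n/N, so the expected rate at n = X is simply X/N:

```latex
\mathbb{E}[p(n)] = \frac{n}{N},
\qquad
\mathbb{E}[p(X)]_{\mathrm{BRWN}} = \frac{1902}{4987} \approx 0.38,
\qquad
\mathbb{E}[p(X)]_{\mathrm{SP}} = \frac{323}{620} \approx 0.52 .
```

Both values agree with the centers of the simulated intervals reported for May 1st.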
Oscillations are perceived in Figure 8a) for small n, but they stabilize afterward and follow a tendency. The matching p is at maximum in the beginning, because the first documented case was in the city of São Paulo, which is the first ranked city in all measures. The curve then decreases until reaching a region where the oscillations take place.
The network quantifiers show good correspondences already in the beginning of the spreading process, as the dashed line is not touched until n approaches X. The high-frequency oscillations of Figure 8 a) are pronounced up to March 24th (n ≈ 150). That is probably the transient needed for the spreading process to reach a steadier behavior.
There is no mark on March 24th in Figure 8 b), because the number of new cities with confirmed cases is negligible in the period. Interestingly, on March 31st, a week later, the high-frequency oscillations start to diminish in SP. A few days further, after April 7th, the betweenness centrality with ηd starts to be a bad predictor for BRWN and then for SP.
Next, we quantitatively evaluate the curves from Figure 8 and others with different thresholds, to check exactly which η better captures the spreading process of COVID-19 in the mobility network. Figure 9 displays the integral R of each of those curves with η ∈ [0, μ + 2σ], in which μ is the average flow of the network and σ is the standard deviation. The ηd is marked with the vertical line and proves to be a good threshold in SP, but a bad one in BRWN. While for SP the strength is always the best measure, there is a certain oscillation in BRWN, where both s and b are the best predictors for small thresholds, switching to b at η ≈ 45 and then to k at η ≈ 110. The best prediction is given by betweenness with η ≈ 60, and similar results are captured by both s and b at η0. When it comes to the SP network, the ηd captures the exact point where s has the best outcome.
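The score R can be illustrated as the area under a matching curve p(n). The sketch below uses the trapezoidal rule with unit spacing in n. The integration rule, the absence of normalization, and the curve values are all assumptions, since the source only states that R is the integral of each curve.

```python
def curve_integral(curve):
    """Trapezoidal integral of p(n) over n = 1..len(curve), unit spacing."""
    return sum((a + b) / 2.0 for a, b in zip(curve, curve[1:]))

# Hypothetical matching curves of one measure under two thresholds.
curves_by_eta = {
    0.0: [1.0, 0.5, 1.0, 0.75],
    40.0: [1.0, 1.0, 1.0, 0.75],
}

# R for each threshold, and the threshold with the largest area.
R = {eta: curve_integral(curve) for eta, curve in curves_by_eta.items()}
best_eta = max(R, key=R.get)
```

Comparing R across thresholds and measures gives a single number per curve, which is what makes a plot of R against η possible.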
Table 2 enumerates the first twenty ordered cities according to the best-evaluated measures and compares them side-by-side with the first twenty cities with COVID-19 cases in the BRWN network. The best measures for SP are compared with each other in Table 3 as well. In both networks, the metrics present high-frequency oscillations in the beginning as shown in Figure 8, but still have some correspondences with the first confirmed cases.
Regarding Table 2, some cities are captured by the three measures but do not appear in the first column, namely Fortaleza (CE), Salvador (BA), Campinas (SP), Ribeirão Preto (SP) and Belo Horizonte (MG), but they soon had patients with COVID-19. Interestingly, the city of Feira de Santana (BA) appears in all columns - it is the second-largest city of the state and connects the capital to the countryside of Bahia.¹ Conversely, the city of João Pessoa, capital of Paraíba state (PB), does not appear in the top 20 of the second column (best measure - see Figures 7 and 8), but two other cities from the state do, namely Campina Grande (PB) and Patos (PB). Campina Grande and Patos are among the five richest cities of Paraíba.² Note that within the context of an epidemic, such cities are potential super spreaders along with the state capitals. Five cities of Pernambuco state (PE) appear in the second column (best measure - see Figures 7 and 8), namely Caruaru, Carpina, Limoeiro, Paudalho, and Recife. Pernambuco is currently ranked second among the states of the Northeast region in the number of confirmed cases(10).
Table 3, like Table 2, also displays cities that are captured by the three rightmost columns but do not appear in the first, showing their high level of vulnerability: Ribeirão Preto, Jundiaí, Sorocaba, Piracicaba, and Presidente Prudente. They all have documented cases before May 1st, though. Our study also captured the most influential cities that had cases already in the beginning, like São Paulo, Campinas, São José do Rio Preto, São José dos Campos and Taubaté. Other cities appear in the second column (best metric) but not in the first: Praia Grande, São Vicente, São Carlos, Registro, and Sertãozinho.
Due to their importance in mobility, many cities of Table 3, especially in the second column, appear in the report(17) of April 5th on the vulnerability of microregions of São Paulo state to the SARS-CoV-2 pandemic, either as potential spreaders or as places with a high probability of receiving new cases. They all have notified cases by May 1st and some have the highest numbers of São Paulo state.³
Both s and b with η0 give good results at the beginning of the pandemic for the BRWN network, but s alone becomes the best predictor from the end of April. The most important cities, due to their high flow of travelers and their role in the most used routes, are reached first, followed by those with smaller flows, probably because of the interiorization of the virus - the outbreak reaching the countryside cities. This behavior is even more pronounced in SP, in which s under ηd is the best option at first, neglecting lower-flow connections, but η0 becomes the best option from the end of April.
In the ongoing pandemic, as of May 1st, the s index with η0 is the best predictor and may help to figure out which countryside cities are about to receive new cases. Moreover, it may help in the following waves of the disease. In the case of another pandemic, one could first compute the strength of the networks according to the latest IBGE data and identify the top-ranked cities. In Brazil, it is enough to check the strength on the original data, as we presented, since it produces results similar to the betweenness centrality and is computationally cheaper to obtain. Regarding the state of São Paulo, it is better to check the strength index with threshold ηd in the first weeks and only then switch to η0. As our results show, the correspondence has statistical significance and, along with other information about the regions, such as where the first notified cases are, the pandemic could be closely traced.
FINAL REMARKS
We present a complex network-based analysis in the Brazilian inter-cities mobility networks towards the identification of cities that are vulnerable to the SARS-CoV-2 spreading. The networks are built with the IBGE terrestrial mobility data from 2016 that have the weekly flow of people between cities. The cities are modeled as nodes and the flows as weighted edges and the geographical graphs, (geo)graphs, are visualized within Geographical Information Systems.
Two scales are investigated: the Brazilian cities without the North region, and the state of São Paulo. The former does not account for the North due to the high number of fluvial routes and some intrinsic local characteristics that are not represented in the terrestrial data. The state of São Paulo is important in the ongoing pandemic since the first documented case was in the state capital and it is currently one of the main foci of the virus spreading.
Three network measures are studied, namely the strength, degree, and betweenness centrality, under several flow thresholds to account for different mobility intensities, ranging from the original flow data to networks with only the edges with higher weights. We verified that the strength has the best matching to the cities with COVID-19 confirmed cases. Moreover, the strength measure with the original flows proved to be the best option for Brazil. Conversely, a more restrictive threshold yields better correspondences at the beginning of the pandemic in SP. Probably due to the interiorization of the spreading process, a transition is observed after a certain point, when the original flows give better results, since the connections to smaller cities are only present when the low flows are accounted for.
Surprisingly, some countryside cities such as Campina Grande (state of Paraíba), Feira de Santana (state of Bahia), and Caruaru (state of Pernambuco) have higher strengths than some state capitals. Furthermore, some cities from the São Paulo state such as Presidente Prudente and Ribeirão Preto are captured in the top-rank positions of all the analyzed network measures under different flow thresholds. Their importance in mobility is crucial and they are potential super spreaders like the state capitals.
As future work, we intend to analyze aerial and fluvial mobility data as well, as they include valuable information about the transport of people and goods. The fluvial data are fundamental to the discussion of the dynamics of the Brazilian North region, especially the Amazon, while the aerial data capture long-range connections. Lastly, one could check for correspondences between the networks’ measures and data from other epidemic outbreaks.
Data Availability
The data we used are publicly available and can be found in two publications that are properly cited within the manuscript: 1) IBGE. (2017). Ligacoes rodoviarias e hidroviarias: 2016. Coordenacao de Geografia, Rio de Janeiro, ISBN 9788524044175; 2) W. Cota. Monitoring the number of COVID-19 cases and deaths in Brazil at municipal and federative units level. (2020). Scientific Electronic Library Online (SciELO). DOI https://doi.org/10.1590/SciELOPreprints.362.
Financial support
São Paulo Research Foundation (FAPESP) Grant Number 2015/50122-0 and DFG-IRTG Grant Number 1740/2; FAPESP Grant Number 2018/06205-7; CNPq Grant Numbers 420338/2018-7 and 101720/2020-3.
AUTHORS’ CONTRIBUTION
V. L. S. Freitas and L. B. L. Santos conceived the original idea, collected the data, interpreted the results, wrote the paper, and developed the methodology. J. Feitosa and C. S. N. Sepetauskas developed the methodology. All authors reviewed and approved the final version. The authors declare no conflict of interest.
ACKNOWLEDGEMENTS
We would like to thank Jussara Angelo (Fiocruz, Rio de Janeiro) for the valuable discussions.
Footnotes
1 Source (IBGE): https://agenciadenoticias.ibge.gov.br/agencia-sala-de-imprensa/2013-agencia-de-noticias/releases/25278-ibge-divulga-as-estimativas-da-populacao-dos-municipios-para-2019
3 http://www.saude.sp.gov.br/resources/cve-centro-de-vigilancia-epidemiologica/areas-de-vigilancia/doencas-de-transmissao-respiratoria/coronavirus/coronavirus010520_65situacao_epidemiologica.pdf