Abstract
We present an analysis of the relationship between SARS-CoV-2 infection rates and a social distancing metric, using data for all states and the most populous cities of the United States and Brazil, and for 22 European countries (members of the European Economic Community and the United Kingdom). We discuss why the infection rate, rather than the effective reproduction number or the growth rate of cases, is the proper choice for this analysis when considering a wide span of time. We obtain a strong Spearman’s rank-order correlation between the social distancing metric and the infection rate in each locality. We show that mask mandates increase the values of Spearman’s correlation in the United States, where a mandate was adopted. We also obtain an explicit numerical relation between the infection rate and the social distancing metric defined in the present work.
1. Introduction
The current COVID-19 pandemic is the main health crisis in the world in a century, with over 220 million cases and 4.5 million deaths [1]. It began in China at the end of 2019 and has since spread to every country in the world, with waves occurring at different times in each location. A number of interventions were implemented in most countries, such as travel bans, social distancing and mandatory mask use [2,3]. Their effects have been discussed in different works, which generally concluded that they were effective in reducing the growth of cases and deaths [4–9]. The most effective measures were possibly lockdowns, workplace and business closures and school closures, i.e. the social distancing policies [10], while travel restrictions are expected to have only modest effects in reducing transmission when there is a high circulation of the virus [11].
In order to quantify and qualify the degree of social distancing and its effects, different approaches have been proposed: survey questionnaires in the population to assess adherence to social distancing and compare it to the growth of cases or deaths [12], or the use of mobility data from different sources [13–19]. In the latter case, a mobility or social distancing metric is compared to the growth rate of cases (or deaths) of COVID-19, or to the effective reproduction number Rt. As we discuss below, this introduces a limitation in the analysis, because the interpretation of both the growth rate and Rt at the beginning of the pandemic, when most of the population is still susceptible to the virus, differs from that at later stages, when a non-negligible proportion of the population has already been infected or vaccinated. A more informative parameter, one that better represents the circulation of the SARS-CoV-2 virus, is the average infection rate β, which is proportional to Rt divided by the proportion of the susceptible population (see Eq. 3 below). This explains in particular the result by Gatalo et al. [20], who obtained a strong Pearson correlation between phone mobility data and COVID-19 growth rates at earlier stages, but a weaker correlation at later stages, for 25 counties in the United States.
We present here an analysis of the effect of social distancing for 22 European countries, for the 50 states of the United States and the 27 states of Brazil, and for the most populous counties and municipalities of the latter two countries, respectively. These localities have different situations and histories of the pandemic. For instance, as mask use became mandatory at different moments in the American states, we were able to obtain quantitative evidence of its effect in enhancing social distancing policies.
Our main goal is to demonstrate a monotonic relationship between social distancing data and the value of the infection rate, and to quantify it explicitly.
2. Material and methods
2.1. Effective reproduction number
The effective reproduction number Rt(i) at day i, estimated from the generation time distribution wj, with j the number of days between infections, is given by [21]:
$$R_t(i) = \frac{I(i)}{\sum_{j \geq 1} w_j \, I(i-j)}, \tag{1}$$
with I(i) the number (or proportion) of infected individuals at day i. The effective reproduction number can also be estimated from the series of deaths by first determining the number of infected individuals as:
$$I(t) = \frac{1}{\overline{\mathrm{IFR}}} \sum_{\tau \geq 0} u(\tau) \, N_{\mathrm{deaths}}(t+\tau), \tag{2}$$
where u(τ) is the distribution of the number τ of days (taken as discrete) between first symptom and death [22], Ndeaths(t) the number of deaths at day t and $\overline{\mathrm{IFR}}$ the average infection fatality ratio [23], computed from the demographic structure in each locality. We then use Eq. (1) to determine Rt at a given day.
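The two estimation steps described above can be sketched as follows. This is a minimal illustration, assuming the renewal-type ratio R_t(i) = I(i) / Σ_j w_j I(i−j) and a simple back-projection of deaths scaled by the infection fatality ratio; the function names, the truncation of the sums at the edges of the series, and the example inputs are assumptions made for illustration, not the authors' implementation.

```python
def effective_reproduction_number(incidence, w):
    """R_t(i) = I(i) / sum_{j>=1} w_j * I(i-j), with w[j-1] holding w_j.

    Returns None for days where the denominator vanishes
    (for instance the first day, which has no past incidence).
    """
    rt = []
    for i in range(len(incidence)):
        denom = sum(w[j - 1] * incidence[i - j]
                    for j in range(1, min(i, len(w)) + 1))
        rt.append(incidence[i] / denom if denom > 0 else None)
    return rt


def infections_from_deaths(deaths, u, ifr):
    """I(t) = (1/ifr) * sum_tau u[tau] * deaths[t + tau].

    u[tau] is the probability of a delay of tau days until death.
    Terms that fall beyond the end of the series are dropped, so the
    last len(u) days are underestimated (right-censoring).
    """
    n = len(deaths)
    infected = []
    for t in range(n):
        total = sum(u[tau] * deaths[t + tau]
                    for tau in range(len(u)) if t + tau < n)
        infected.append(total / ifr)
    return infected
```

In practice the most recent days of the reconstructed series would need to be discarded or corrected, since deaths caused by recent infections have not yet been recorded.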
2.2. Infection rate
The infection rate β can be estimated as [24]:
$$\beta(t) = \frac{\gamma \, R_t}{S(t)}, \tag{3}$$
with S(t) the proportion of susceptible individuals in the population at day t, Rt the time-dependent effective reproduction number, and γ the recovery rate from infection, with the value reported in the literature [25]. We can also write
$$\beta = C \, P_c, \tag{4}$$
where C is the average number of contacts of one individual per day, and Pc the probability of contagion of a susceptible individual from a single contact with an infected individual. Social distancing acts by reducing the number of contacts C, while other non-pharmaceutical interventions reduce the value of Pc.
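As a small numerical illustration of the relation between the infection rate, Rt and the susceptible proportion (β = γ Rt / S), the sketch below converts a series of Rt values into infection rates. The value of γ used in the example is a placeholder chosen for illustration, not the literature value adopted in the paper.

```python
def infection_rate(rt, susceptible, gamma):
    """beta(t) = gamma * R_t / S(t), evaluated day by day."""
    return [gamma * r / s for r, s in zip(rt, susceptible)]


# Same R_t = 1 on both days, but only half of the population is still
# susceptible on the second day: the infection rate doubles.
GAMMA_EXAMPLE = 0.25  # hypothetical recovery rate (1/day), illustration only
example = infection_rate([1.0, 1.0], [1.0, 0.5], GAMMA_EXAMPLE)
```

The example makes the point of this section concrete: an unchanged Rt can hide a doubling of the infection rate once a substantial part of the population is no longer susceptible.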
2.3. Social distancing metric
As a proxy for the “amount” of social distancing, we define a metric quantifying the deviation from a baseline representing the pre-pandemic normality. Many possibilities exist, and different mobility data are available from different sources [26–29]. We require that the data be freely available, with coverage down to the city level. Of these sources, only Google mobility trends satisfy these two criteria, providing data on the following six categories of locations: retail and recreation (D1); grocery and pharmacy (D2); parks (D3); transit stations (D4); workplaces (D5) and residential (D6), as percentages of variation of time spent in each type of place, with respect to a baseline defined for the period of January 3 to February 6, 2020. The symbols in parentheses represent the numeric value of the time series for each type of data. An increase in the time spent at residence is expected to decrease the value of the infection rate β, and is considered as a negative contribution to the metric, while an increase in any of the remaining five categories is expected to increase β and thus contributes with a positive sign. The social distancing metric is then defined as a weighted average of the data for each category, with the specified sign, and with weights wk given by an (arbitrarily) estimated average proportion of the duration of a day spent in each type of location:
$$M(t) = 100 + \sum_{k=1}^{5} w_k \, D_k(t) - w_6 \, D_6(t), \tag{5}$$
where the value of 100 is added such that the baseline is close to this value, and has no effect on the value of the Spearman’s correlation. The resulting metric M for each Brazilian and American state is shown in Figs. 1A and 1B, respectively, with a similar behavior for the other localities considered here (not shown). This definition is such that a smaller value of M represents a more beneficial situation.
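The construction of the metric can be sketched as below: a signed weighted sum of the six mobility categories, shifted by 100. The weights in `HYPOTHETICAL_WEIGHTS` are placeholders invented for this illustration; they are not the values used in the paper and only mimic the idea of a fraction of the day spent in each type of place.

```python
# Categories entering with a positive sign (more time there, more contacts).
POSITIVE = ("D1", "D2", "D3", "D4", "D5")

# Hypothetical weights, for illustration only (not the paper's values).
HYPOTHETICAL_WEIGHTS = {
    "D1": 0.05,  # retail and recreation
    "D2": 0.03,  # grocery and pharmacy
    "D3": 0.02,  # parks
    "D4": 0.05,  # transit stations
    "D5": 0.30,  # workplaces
    "D6": 0.55,  # residential (enters with a negative sign)
}


def social_distancing_metric(d, weights=HYPOTHETICAL_WEIGHTS):
    """M = 100 + sum_{k=1..5} w_k * D_k - w_6 * D_6 for one day.

    d maps "D1".."D6" to the percentage change with respect to baseline.
    """
    positive = sum(weights[k] * d[k] for k in POSITIVE)
    return 100.0 + positive - weights["D6"] * d["D6"]
```

With all categories at baseline the function returns 100, and a day with reduced mobility outside the home and more time at residence gives a value below 100, in line with the convention that a smaller M is more beneficial.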
2.4. Spearman’s rank-order correlation
Spearman’s rank-order correlation rs(A, B) between two time series, {Ai} and {Bi}, of length Ndata, with Ai the value of the series at the i-th data value, is defined as [30]:
$$r_s(A, B) = 1 - \frac{6 \sum_{i=1}^{N_{\mathrm{data}}} d_i^2}{N_{\mathrm{data}} \left( N_{\mathrm{data}}^2 - 1 \right)}, \tag{6}$$
such that −1 ≤ rs ≤ 1, where di is the difference in paired ranks of the two series A and B, i.e. the difference in position of the i-th data point for the two data sets when ordered in ascending order. The coefficient rs measures how strongly two variables are monotonically related, by an increasing or decreasing relation if rs > 0 or rs < 0, respectively.
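A direct transcription of this definition, r_s = 1 − 6 Σ d_i² / (N (N² − 1)), is sketched below in plain Python. Tied values are given their average rank, which is a common convention and an assumption here; with ties the formula is only approximate, and a library routine that computes the Pearson correlation of the ranks would be preferable.

```python
def average_ranks(values):
    """Ranks starting at 1, in ascending order; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_rs(a, b):
    """r_s = 1 - 6 * sum(d_i^2) / (N * (N^2 - 1)), d_i the rank difference."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("series must have the same length, at least 2")
    n = len(a)
    d2 = sum((ra - rb) ** 2
             for ra, rb in zip(average_ranks(a), average_ranks(b)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))
```

Because only ranks enter the formula, adding a constant to a series or multiplying it by a positive factor leaves r_s unchanged, which is why the offset of 100 in the metric M plays no role.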
In order to show the importance of accounting for the decreasing number of susceptible individuals with time, we show in Fig. 2A the time evolution of Rt and β for Los Angeles County in the United States. As the proportion of susceptible individuals decreases over time, Rt and β slowly diverge. By computing the Spearman’s correlation between M and Rt and between M and β, for a period of Ndata = 150 days for the same data, we see from Fig. 2B that a small difference between Rt and β has a significant effect on the value of rs. The Spearman’s correlation between M and Rt is close to zero at later times, while it is clearly positive for M and β. This is explained by the fact that, from Eq. (3), the same value of Rt can correspond to different values of the infection rate β, which is directly related to the circulation of the virus, as it measures the rate at which susceptible individuals are infected, and is thus more closely related to the different mitigation policies implemented. We conclude that using Rt to represent the stage of the pandemic can lead to misleading results at later stages in assessing the effectiveness of social distancing, as the number of susceptible individuals decreases and that of vaccinated individuals increases.
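The argument can be illustrated with a purely synthetic example (not the Los Angeles data): if the infection rate follows the metric exactly while the susceptible proportion declines, the rank correlation of M with β stays at 1, whereas the correlation of M with Rt = β S / γ is degraded by the trend in S. All numerical values below are invented for the illustration.

```python
def spearman(a, b):
    """Spearman's r_s = 1 - 6*sum(d^2)/(N(N^2-1)); assumes no tied values."""
    def ranks(x):
        order = sorted(range(len(x)), key=lambda i: x[i])
        r = [0] * len(x)
        for position, index in enumerate(order, start=1):
            r[index] = position
        return r
    n = len(a)
    d2 = sum((p - q) ** 2 for p, q in zip(ranks(a), ranks(b)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


ALPHA, GAMMA = 0.004, 0.25  # hypothetical constants, illustration only
N_DAYS = 20
metric = [100 + (7 * i) % 20 for i in range(N_DAYS)]     # synthetic M, all distinct
susceptible = [1.0 - 0.035 * i for i in range(N_DAYS)]   # S declines from 1.0 to 0.335
beta = [ALPHA * m for m in metric]                       # assumed: beta follows M exactly
rt = [b * s / GAMMA for b, s in zip(beta, susceptible)]  # R_t = beta * S / gamma

rs_beta = spearman(metric, beta)  # perfect monotonic relation by construction
rs_rt = spearman(metric, rt)      # lowered by the decline of S
```

In this toy setting the loss of correlation with Rt is entirely an artifact of the shrinking susceptible pool, since social distancing and transmission are perfectly related by construction.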
2.5. Data sources
The following data sources were employed in the present work:
Population by age for Europe: World Population Prospects - United Nations – https://population.un.org/wpp.
Time series of deaths and cases by country: World Health Organization – https://www.who.int/emergencies/diseases/novel-coronavirus-2019.
Time series of cases and deaths by US counties and states: New York Times COVID-19 Tracker data set – https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv.
Population by age group in US counties and states: United States Census Bureau – https://www.census.gov/data/tables/time-series/demo/popest/2010s-counties-detail.html.
Data on days of mask mandate in the US: Centers for Disease Control and Prevention – https://data.cdc.gov/Policy-Surveillance/U-S-State-and-Territorial-Public-Mask-Mandates-Fro/62d6-pm5i.
Population by age group for Brazilian municipalities and states: Brazilian Institute for Geography and Statistics – https://brasilemsintese.ibge.gov.br/populacao.
Time series for cases and deaths by COVID-19 by municipality and state in Brazil: Brazilian Ministry of Health – https://covid.saude.gov.br.
Detailed data on vaccination in Brazil: Brazilian Ministry of Health – https://opendatasus.saude.gov.br/dataset/covid-19-vacinacao.
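As an illustration of how one of these sources can be turned into a daily series, the sketch below extracts daily deaths for one county from text in the format of the New York Times county file, which reports cumulative counts with the columns date, county, state, fips, cases and deaths. The function name and the choice to clip negative increments (caused by data revisions) to zero are assumptions of this sketch, not a description of the processing used in the paper.

```python
import csv
import io


def daily_deaths(csv_text, fips):
    """Return [(date, new_deaths)] for one county from cumulative counts.

    csv_text: contents of a file with header
              date,county,state,fips,cases,deaths
    fips:     county FIPS code as a string, e.g. "06037"
    """
    rows = [r for r in csv.DictReader(io.StringIO(csv_text)) if r["fips"] == fips]
    rows.sort(key=lambda r: r["date"])  # ISO dates sort chronologically
    series = []
    previous = 0
    for r in rows:
        cumulative = int(r["deaths"] or 0)  # treat a blank field as zero
        # Downward revisions would give negative increments; clip them.
        series.append((r["date"], max(cumulative - previous, 0)))
        previous = cumulative
    return series
```

The resulting daily series is the kind of input needed to reconstruct the number of infections from deaths.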
3. Results and discussions
The localities that are analyzed here are:
All 50 United States states, from the first reported case up to December 20, 2021.
The 24 United States counties with a population of at least one million and at least 1000 deaths in 2020 (Nassau was not considered due to inconsistent data for the number of deaths), from the first reported case in each county up to December 20, 2021.
All 27 Brazilian states from February 26, 2020 to June 14, 2021.
The 22 Brazilian cities (municipalities) with a population of at least 750 thousand from February 26, 2020 to June 14, 2021.
All countries in the European Economic Community and the United Kingdom with Google Mobility data and at least one thousand deaths by COVID-19 in 2020, from March 1, 2020 to December 31, 2020, with a total of 22 countries.
The span of time of the data was chosen to avoid the effect of vaccination in the United States and Europe, while for Brazil detailed and publicly available anonymized data on each vaccine shot delivered allow modeling the time evolution of the pandemic for a longer period. To estimate the susceptible population in Eq. (3), we use the epidemiological model of Ref. [31], summarized in Appendix A, to determine the attack rate in each locality. Serological surveys also provide such estimates, but are not available for every locality and for the required time window and, where available, do not have the required time resolution.
The results of the Spearman’s rank-order correlation between the social distancing metric M and the infection rate β for each locality are shown in Fig. 3. In order to assess the effect of mandatory mask use in each United States county and state, we compute rs for two periods: the whole period, indicating in the corresponding graphic the percentage of time with a mask mandate, and the period with a mask mandate only, for those counties with a mandate for at least 50% of the days since the beginning of the pandemic. For the remaining counties we consider the whole period and display the corresponding histogram in black. We also computed the Spearman’s correlation separately for each of the six mobility data reported by Google, with results shown in Figs. 4 and 5. The average of β, over the time period considered for each locality, versus the total number of deaths at the end of each period is shown in Fig. 6, where an approximately linear relation is clearly visible, with the exception of a few cases in Brazil.
In order to establish a numeric relationship between β and M, let us assume the linear relation
$$\beta(t) = \alpha \, M(t), \tag{7}$$
with α a constant, and consider only the time window that allows an accurate estimation of Rt. The distributions of values of the ratio β/(γM) for the Brazilian states, Brazilian municipalities, European countries, United States states and counties are shown in Figs. 7A–E, with values for α/γ (95% CI) given by 0.015 (0.0096–0.023), 0.019 (0.0081–0.042), 0.014 (0.0089–0.021), 0.015 (0.0091–0.027) and 0.014 (0.0084–0.024), respectively. We also show the best fit with a log-normal distribution for values of α/γ in Figs. 7F–J.
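Under the assumed linear relation β = α M, and with β = γ Rt / S, each day provides one sample of α/γ = Rt / (S M). The sketch below computes these samples and summarizes them with a log-normal fit obtained from the mean and standard deviation of the logarithms. Taking the 95% interval as exp(μ ± 1.96σ) is an assumption of this sketch; the source does not state how its intervals were computed.

```python
import math
import statistics


def alpha_over_gamma_samples(rt, susceptible, metric):
    """Daily samples of alpha/gamma = R_t / (S * M)."""
    return [r / (s * m) for r, s, m in zip(rt, susceptible, metric)]


def lognormal_summary(samples):
    """Fit a log-normal by moments of log(x).

    Returns (median, lower, upper), with the interval taken as
    exp(mu -/+ 1.96 * sigma); needs at least two positive samples.
    """
    logs = [math.log(x) for x in samples]
    mu = statistics.fmean(logs)
    sigma = statistics.stdev(logs)
    return (math.exp(mu),
            math.exp(mu - 1.96 * sigma),
            math.exp(mu + 1.96 * sigma))
```

For instance, Rt = 1.2 with S = 0.8 and M = 100 gives a sample of 0.015, the order of magnitude reported above.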
While vaccination reduces the proportion of susceptible individuals in the population, it does not alter the relationship of the infection rate with social distancing policies, with M as a proxy, and it was explicitly taken into account in our analysis by using an epidemiological model with vaccination compartments. The approach presented here allowed us to demonstrate a monotonic relationship between the infection rate in each locality and the social distancing metric M. It also allowed us to obtain an explicit numeric relationship between β and a metric for social distancing. Behavioral changes can also have a significant impact on the evolution of any epidemic, and are difficult to include in the current analysis. Nevertheless, the significant values obtained for the Spearman’s correlation indicate the important role that social distancing has played up to now. This is particularly clear in Belgium (rs = 0.75), Spain (rs = 0.8) and the United Kingdom (rs = 0.88), three countries with a high attack rate. The correlation is somewhat smaller for other localities, but still takes significant positive values, clearly indicating an approximately monotonic relationship between the two variables.
For Brazil and the European countries the results for Spearman’s correlation are quite similar: the variation in time spent at residence is negatively correlated with the infection rate, i.e. the more time spent at home the smaller the value of β, while the other categories are positively correlated. For the United States, due to a much greater variety of mitigation policies implemented [13], we see a slightly different picture. In general, time at residence is negatively correlated with β, while time at the workplace is positively correlated with the transmission rate, as expected. For the remaining categories (grocery and pharmacy, parks, retail and recreation, and transit stations) we observe both negative and positive correlations depending on the locality, indicating that the most relevant categories are those related to the increase of time spent at home and the decrease of time spent at workplaces. For the United States there is a significant increase in the value of rs when considering only the time period with a mask mandate, which shows the effectiveness of the mandate.
The values of the proportionality constant α/γ between β/γ and M(t) are surprisingly close to one another, despite the great differences in the history of the pandemic and in the policies implemented to mitigate it. We obtain a log-normal distribution for the value of α/γ (and consequently for α) for all types of localities considered here, with remarkably close average values, despite all the differences between countries, implemented mitigation policies, and timings. This points to a universal efficacy of social distancing, enhanced by mandatory mask use. The explicit linear relation in Eq. (7), with the value obtained for the proportionality constant α, can be used, for instance, in modeling studies with different scenarios for social isolation.
Social distancing is of course not the only factor affecting the evolution of the infection rate, which explains the variation observed in the Spearman’s correlation across localities. We note that even a small increase in M, and thus in β, sustained for a long period of time results in a significant increase in mortality, as can be seen from Fig. 6. Our analysis does not capture the impact of large gatherings of individuals and the possible effect of the so-called superspreading events [20], or the implications of contact tracing.
4. Conclusion
A proper choice of a variable to represent the current circulation of the virus is central to assessing the effects of mitigation policies. The infection rate as expressed in Eq. (4) is affected by the reduction of social contacts, through the average number of contacts C, and by other implemented protocols, such as mask wearing, that reduce the probability of contagion per contact Pc. On the other hand, the effective reproduction number Rt, or any other measure of the growth rate of the pandemic, also depends on the current attack rate, which confounds the analysis. This is an important point to consider, as a more detailed analysis requires a large data set, hence a longer time series, and hence a significant variation in the proportion of susceptible individuals. Computing Spearman’s correlation from the whole time series for each locality, rather than the Pearson correlation, for instance, allows us to show more clearly a monotonic relationship between the social distancing metric as defined here and the infection rate. Future research considering socioeconomic and demographic data would certainly provide valuable information on mitigation strategies targeted at specific groups, such as the elderly and individuals with comorbidities, as well as on the impact of school closure, each considered separately from other factors [32].
We hope that the present work will contribute to a better assessment of the effects of social distancing, and at least partially of mask mandates, on the still ongoing mitigation interventions against the COVID-19 pandemic.
Data Availability
All data used in the manuscript are publicly available and properly referenced in the text.
5. Acknowledgments
This work received financial support from the National Council of Technological and Scientific Development - CNPq (grant number 305291/2018-1 MAM) and i3N (grant numbers UIDB/50025/2020 & UIDP/50025/2020 JFFM) - Fundação para a Ciência e Tecnologia/MEC (Portugal).
Appendix A Epidemiological model
In order to determine the proportion of susceptible individuals in a given locality we use the approach described in [31], based on the SEIAHRV epidemiological model with variables described in Table A1.
The proportion Si of susceptible individuals in age group i is obtained from the epidemiological model described in [31], whose equations, Eq. (A1), form a non-linear delayed set of ODEs due to the time delays between infection, hospitalization and death. The different parameter values used in the model are given in Table 1 of Ref. [31]. The force of infection in Eq. (A1) is determined by βi,j, the infection rate from an infected individual of age group j to an individual of age group i. The epidemiological model is calibrated using the time series of deaths, as described in [31], in order to avoid the significant under-notification of cases [33]. The value of S in Eq. (3) is an age-independent estimate given by the total proportion of susceptible individuals,
$$S = \frac{1}{P_{\mathrm{tot}}} \sum_i P_i \, S_i,$$
where Pi is the population in age group i and Ptot the total population for the given locality.
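The aggregation of the age-structured susceptible proportions into a single value can be sketched as a population-weighted average, S = Σ_i P_i S_i / P_tot. The sketch assumes that each S_i is expressed as a proportion of its own age group; the function name and the example numbers are illustrative.

```python
def total_susceptible_proportion(population_by_age, susceptible_by_age):
    """S = sum_i P_i * S_i / P_tot, with S_i the susceptible proportion
    within age group i and P_i the population of that group."""
    p_tot = sum(population_by_age)
    weighted = sum(p * s for p, s in zip(population_by_age, susceptible_by_age))
    return weighted / p_tot


# Two age groups: 100 people with half still susceptible,
# and 300 people with 90% still susceptible.
s_total = total_susceptible_proportion([100, 300], [0.5, 0.9])
```

Here 50 + 270 = 320 of the 400 individuals remain susceptible, so the age-independent proportion entering the infection rate estimate is 0.8.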
Footnotes
Citation: Rocha Filho, T. M.; Moret, M. A.; Mendes, J. F. F. Impact and effectiveness of social distancing for COVID-19 mitigation. Journal Not Specified 2021, 1, 0. https://doi.org/
The contents are the same. The paper was just reformatted for a journal and after a review of the grammar.