The Effect of Large-Scale Anti-Contagion Policies on the Coronavirus (COVID-19) Pandemic

Solomon Hsiang; Daniel Allen; Sébastien Annan-Phan; Kendon Bell; Ian Bolliger; Trinetta Chong; Hannah Druckenmiller; Andrew Hultgren; Luna Yue Huang; Emma Krasovich; Peiley Lau; Jaecheol Lee; Esther Rolf; Jeanette Tseng; Tiffany Wu

doi:10.1101/2020.03.22.20040642

Abstract

Governments around the world are responding to the novel coronavirus (COVID-19) pandemic¹ with unprecedented policies designed to slow the growth rate of infections. Many actions, such as closing schools and restricting populations to their homes, impose large and visible costs on society. In contrast, the benefits of these policies, in the form of infections that did not occur, cannot be directly observed and are currently understood through process-based simulations.^{2–4} Here, we compile new data on 936 local, regional, and national anti-contagion policies recently deployed in the ongoing pandemic across localities in China, South Korea, Iran, Italy, France, and the United States (US). We then apply reduced-form econometric methods, commonly used to measure the effect of policies on economic growth, to empirically evaluate the effect that these anti-contagion policies have had on the growth rate of infections. In the absence of any policy actions, we estimate that early infections of COVID-19 exhibit exponential growth rates of roughly 45% per day. We find that anti-contagion policies collectively have had significant effects slowing this growth, although policy actions in the US appear to be too recent to have a substantial impact since the magnitude of these effects grows over time. Our results suggest that similar policies may have different impacts on different populations, but we obtain consistent evidence that the policy packages now deployed are achieving large and beneficial health outcomes. We estimate that, to date, current policies have already prevented or delayed on the order of eighty million infections. These findings may help inform whether or when ongoing policies should be lifted or intensified, and they can support decision-making in the more than 150 countries where COVID-19 has been detected but has not yet reached high infection rates.⁵

Introduction

The 2019 novel coronavirus¹ (COVID-19) pandemic is forcing societies around the world to make consequential policy decisions with limited information. After containment of the initial outbreak failed, attention turned to implementing large-scale social policies designed to slow contagion of the virus,⁶ with the ultimate goal of slowing the rate at which life-threatening cases emerge so as to not exceed the capacity of existing medical systems. In general, these policies aim to decrease opportunities for virus transmission by reducing contact among individuals within or between populations, such as by closing schools, limiting gatherings, and restricting mobility. Such actions are not expected to halt contagion completely, but instead are meant to slow the spread of COVID-19 to a manageable rate. These large-scale policies are developed using epidemiological simulations^{2, 4, 7–17} and a small number of natural experiments in past epidemics.¹⁸ However, the actual impacts of these policies on infection rates in the ongoing pandemic are unknown. Because the modern world has never experienced a pandemic from this pathogen, nor deployed anti-contagion policies of such scale and scope, it is crucial that direct measurements of policy impacts be used alongside numerical simulations in current decision-making.

Populations in almost every country are now weighing whether, or when, the health benefits of anti-contagion policies are worth the costs they impose on society. For example, restrictions imposed on businesses are increasing unemployment,¹⁹ travel bans are bankrupting airlines,²⁰ and school closures may have enduring impacts on affected students.²¹ It is therefore not surprising that some populations hesitate before implementing such dramatic policies, particularly when these costs are visible while their health benefits – infections and deaths that would have occurred but instead were avoided or delayed – are unseen. Our objective is to measure this direct benefit; specifically, how much these policies slowed the growth rate of infections. We treat recently implemented policies as hundreds of different natural experiments proceeding in parallel. Our hope is to learn from the recent experience of six countries where the virus has advanced enough to trigger large-scale policy actions, in part so that societies and decision-makers in the remaining 180+ countries can access this information immediately.

Here we directly estimate the effects of local, regional, and national policies on the growth rate of infections across localities within China, South Korea, Iran, Italy, France, and the US (see Figure 1 and Appendix Table A1). We compile publicly available sub-national data on daily infection rates and the timing of policy deployments, including (1) travel restrictions, (2) social distancing through cancellation of events and suspensions of educational/commercial/religious activities, (3) quarantines and lockdowns, and (4) additional policies such as emergency declarations or expansions of paid sick leave, from the earliest available dates to the present (March 18, 2020; see complete descriptions in the Appendix). Because the pandemic is still in its early stages, populations in these countries remain almost entirely susceptible to COVID-19, causing the natural spread of infections to exhibit almost perfect exponential growth.^{7, 14, 22} The rate of this exponential growth may change daily and is determined by epidemiological factors, such as disease infectivity and contact networks, as well as policies that induce behavior changes.^{7, 8, 22} We cannot experimentally manipulate policies ourselves, but because they are being deployed while the epidemic unfolds, we can measure their impact empirically. We examine how the growth rate of infections each day in a given locality changes in response to the collection of ongoing policies applied to that locality on that day.

Figure 1: Data on COVID-19 cases and large-scale anti-contagion policies in six countries.

The left-hand-side plots show cumulative confirmed cases of COVID-19 (solid black line with squares) and deaths attributed to the disease (dashed black line) over time (left axes). Deployments of anti-contagion policies are indicated by vertical lines, with height corresponding to the number of administrative units that instituted the policy on a given day (right axes). For display purposes, not all policies are shown for each country; we highlight up to five policy types in each country, giving priority to those that were enacted in the most administrative units (see SI for all policies). The right-hand-side maps show the number of confirmed cases by administrative unit, as of March 18, 2020. The area of each circle scales with the number of cases. Country-specific longitudinal data are at the region level in France, the state level in the US, the province level in South Korea, Italy, and Iran, and the city level in China. For China, there are some discrepancies between the administrative units used in the epidemiological data and the official administrative units that the map is based on. We use the former in our analysis, but display the administrative units common to both data sets here.

We employ well-established “reduced-form” econometric techniques^{23, 24} commonly used to measure the effect of policies^{25, 26} or other events (e.g., wars²⁷ or environmental changes²⁸) on economic growth rates. Similarly to early COVID-19 infections, economic output generally increases exponentially with a variable rate that can be affected by policy or other conditions. Unlike process-based epidemiological models,^{7–9, 12, 22, 29, 30} the reduced-form statistical approach to inference that we apply does not require explicit prior information about fundamental epidemiological parameters or mechanisms, many of which remain unknown in the current pandemic. Rather, the collective influence of these factors is empirically recovered from the data without modeling their individual effects explicitly (see Methods). Prior work on influenza,³¹ for example, has shown that such statistical approaches can provide important complementary information to process-based models.

To construct the dependent variable, we transform location-specific, sub-national time-series of infections into first-differences of their natural logarithm, which is the per day growth rate of infections (see Methods). We use data from first- or second-level administrative units and data on active or cumulative cases, depending on availability (see Appendix Section 2). We then employ widely-used panel regression models^{23, 24} to estimate how the daily growth rate of infections changes over time within a location when different combinations of large-scale social policies are enacted (see Methods). Our econometric approach accounts for differences in the baseline growth rate of infections across locations due to differences in demographics, socio-economic status, culture, or health systems across localities within a country; it accounts for systemic patterns in growth rates within countries unrelated to policy, such as the effect of the work-week; it is robust to systematic under-surveillance; and it accounts for changes in procedures to diagnose positive cases (see Methods and Appendix Section 2). The reduced-form statistical techniques we use are designed to measure the total magnitude of the effect of changes in policy, without attempting to explain the origin of baseline growth rates or the specific epidemiological mechanisms linking policy changes to infection growth rates (see Methods). Thus, this approach does not provide the important mechanistic insights generated by process-based models; however, it does effectively quantify the key policy-relevant relationships of interest using recent real-world data when fundamental epidemiological parameters are still uncertain.
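As a minimal illustration of this transformation (a sketch, not the code used in this analysis), the daily growth rate can be computed as the first difference of the natural logarithm of a case series. The function name and the example series below are hypothetical.

```python
import math

def daily_growth_rates(cases):
    """Return first differences of log(cases) for consecutive days.

    `cases` is a list of positive counts (active or cumulative), one per day.
    """
    logs = [math.log(c) for c in cases]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]

# Hypothetical series that grows by 45% each day (not data from this study).
example_cases = [100 * 1.45 ** day for day in range(5)]
example_growth = daily_growth_rates(example_cases)
```

Each element of `example_growth` equals log(1.45), because a constant day-over-day percentage increase corresponds to a constant difference in logs.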

Results

We estimate that in the absence of policy, early infection rates of COVID-19 grow 45% per day on average, implying a doubling time of approximately two days. Country-specific estimates range from 25.23% per day (p < 0.05) in China to 65.04% per day (p < 0.001) in Iran, although an estimate only using data from Wuhan, the only Chinese city where a meaningful quantity of pre-policy data is available, is 55% per day (p < 0.001). Growth rates in South Korea, Italy, France, and the US are very near the 45% average value (Figure 2A). These estimated values differ from the observed growth rates because the latter are confounded by the effects of policy. In the early stages of most epidemics, a large proportion of the population remains susceptible to the virus, and if the spread of the virus is left uninhibited by policy or behavioral change, exponential growth will continue until the fraction of the susceptible population declines meaningfully.^{7, 29} This decline results from members of the population leaving the transmission cycle, due to either recovery or death.²⁹ At the time of writing, the minimum susceptible population fraction in any of the administrative units analyzed is 99.4% of the total population (Lodi, Italy: 1,445 infections in a population of 230,000). This suggests that all administrative units in all six countries would likely be in a regime of uninhibited exponential growth if policies were removed today.
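The two figures quoted above can be checked with simple arithmetic. The sketch below assumes that "45% per day" denotes day-over-day percentage growth; the function names are illustrative.

```python
import math

def doubling_time_days(pct_growth_per_day):
    """Days for infections to double at a constant day-over-day growth rate.

    `pct_growth_per_day` is a fraction, e.g. 0.45 for 45% per day.
    """
    return math.log(2) / math.log(1 + pct_growth_per_day)

def susceptible_fraction(infections, population):
    """Share of the population that has not been infected."""
    return 1 - infections / population

doubling = doubling_time_days(0.45)               # about 1.87 days
lodi_share = susceptible_fraction(1445, 230000)   # about 0.994
```

Under this reading, growth of 45% per day doubles infections in a little under two days, and the Lodi figures give a susceptible share of roughly 99.4%.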

Figure 2: Estimated “no policy” infection growth rates and the estimated impact of anti-contagion policies.

Markers are country-specific estimates, whiskers are 95% CI that account for spatial autocorrelation and other country-by-day events. Columns report effect sizes as a change in the continuous-time growth rate (Δ log(infections) per day) and the day-over-day percentage growth rate (% change per day). (A) Estimates of daily COVID-19 infection rate growth that would occur in the absence of any anti-contagion policy. The vertical red dashed line is the average estimated daily infection rate across the six countries. We display a separate estimate for Wuhan, China because it is the only city in China with substantial pre-policy infection data; but only the national China estimate is included in the average. (B) Estimated total impact of all anti-contagion policies in each country on infection growth rates. These estimates describe the combined impact of all policies deployed in a country, for a locality where policies are fully utilized and affect the entire population. Weekly impacts in China are the week-by-week impact for the same set of policies which remained deployed for multiple weeks. Results are marginally statistically significant for France (p < 0.09). (C) Estimated effects of individual policies (or collections of policies) on the daily growth rate of infections. Policies are sometimes grouped by similarity in goal (e.g. closing libraries and closing museums are grouped) or timing (e.g. policies that are deployed simultaneously in a given country) to reduce the number of estimated parameters. Effects are all estimated simultaneously within a country. For China, we simultaneously estimate separate effects for each week after the policy was implemented (e.g. “China, week 2” is the change in daily growth rates caused by policies implemented 8-14 days prior).

Consistent with predictions from epidemiological models,^{2, 18, 32} we find that the combined effect of all policies within each country reduces the growth rate of infections by a substantial and, except in the US, statistically significant amount (Figure 2B). For example, a locality in Italy with a baseline growth rate of 0.38 (national avg.) that deployed all policy actions used in Italy would be expected to lower its daily growth rate by 0.18 to 0.20. In general, the estimated total effects of policy packages are large enough that they can in principle offset a large fraction of, or even eliminate, the baseline growth rate of infections, although in several countries many localities are not currently deploying the full set of policies. Our estimate for the total growth effect of all US policies is quantitatively substantial (−0.25) but not statistically significant. US estimates are highly uncertain due to the short period of time for which data are available and because the time elapsed since these actions may be too short to observe a significant impact. In China, where policies have been enacted for over seven weeks, we observe that policy impacts have grown over time during the first three weeks of deployment (−0.11 to −0.33). In all countries other than China, we only estimate an average effect for the entire interval of observation, due to the short temporal length of the sample.
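Growth rates in this section are reported both as continuous-time rates (changes in log infections per day) and as day-over-day percentage growth. As a sketch of how the two scales relate (the function names are illustrative), the conversion is an exponential or a logarithm:

```python
import math

def pct_growth_from_log_rate(log_rate):
    """Convert a continuous-time growth rate to day-over-day percentage growth."""
    return (math.exp(log_rate) - 1) * 100

def log_rate_from_pct_growth(pct):
    """Convert day-over-day percentage growth to a continuous-time growth rate."""
    return math.log(1 + pct / 100)

# Italy's national average baseline rate of 0.38 on the percentage scale.
italy_baseline_pct = pct_growth_from_log_rate(0.38)  # roughly 46% per day
```

On this conversion, a continuous-time rate of 0.38 corresponds to roughly 46% growth per day, close to the 45% cross-country average.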

The estimates above describe the superposition of all policies deployed in each country, i.e. they represent, for each country, the average effect of policies on infection growth rates that we would expect to observe, if all policies enacted anywhere in the country were implemented simultaneously in a region of the country. We also estimate the effects of individual types of policies or clusters of policies that are grouped based on their similarity in goal (e.g., closing libraries and closing museums are grouped) or timing (e.g., policies that are generally deployed simultaneously in a certain country). In many cases, our estimates for these effects are statistically noisier than the estimates for all policies combined (presented above) because we are estimating multiple effects simultaneously. Thus, we are less confident in individual estimates and in their relative rankings. Estimated effects differ between countries, and policies are neither identical nor perfectly comparable in their implementation across countries or, in many cases, across different localities within the same country. Nonetheless, overall we estimate that almost all policies likely contribute to slowing the growth rate of infections (Figure 2C), except two policies (social distancing in France and Italy) where point estimates are slightly positive, small in magnitude, and not statistically different from zero.

We combine the estimates above with our data on the timing of hundreds of policy deployments to estimate the total effect to date of all policies in our sample. To do this, we use our estimates above to predict the growth rate of infections in each locality on each day given the policies in effect at that location on that date (Figure 3, blue markers). We then use the same model to predict what counterfactual growth rates would be on that date if all policies were removed (Figure 3, red), which we refer to as a “no policy” scenario. The difference between these two predictions is our estimated effect that all anti-contagion policies actually deployed had on the growth rate of infections on that date. We estimate that since the beginning of our sample, on average, all anti-contagion policies combined have changed the average daily growth rate of infections by −0.166 per day (±0.015, p < 0.001) in China, −0.276 (±0.066, p < 0.001) in South Korea, −0.158 (±0.071, p < 0.05) in Italy, −0.292 (±0.037, p < 0.001) in Iran, −0.132 (±0.053, p < 0.05) in France and −0.044 (±0.059, p = 0.45) in the US. Taken together, these results suggest that anti-contagion policies currently deployed in the first five countries are achieving their intended objective of slowing the pandemic, broadly confirming epidemiological simulations. We estimate that anti-contagion policies have not yet had a substantial or statistically significant impact on overall infection growth rates in the US.
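The comparison between the two predictions can be sketched as follows, assuming (for illustration only) a linear model in which each policy shifts the growth rate additively in proportion to its intensity. The coefficients, policy names, and baseline below are hypothetical and are not estimates from this study.

```python
def predicted_growth(baseline, effects, policy_intensity):
    """Baseline growth plus the additive effect of each active policy.

    `effects` maps policy name to its estimated change in growth rate;
    `policy_intensity` maps policy name to a value in [0, 1].
    """
    return baseline + sum(
        effects[name] * policy_intensity.get(name, 0.0) for name in effects
    )

# Hypothetical coefficients and policy status (assumptions for illustration).
effects = {"travel_ban": -0.10, "school_closure": -0.05}
active = {"travel_ban": 1.0, "school_closure": 0.5}

with_policy = predicted_growth(0.38, effects, active)
no_policy = predicted_growth(0.38, effects, {})   # all policies removed
policy_effect = with_policy - no_policy
```

The "no policy" prediction uses the same fitted model with every policy variable set to zero, so the difference isolates the combined contribution of the policies in force.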

Figure 3: Estimated infection growth rates based on actual anti-contagion policies and in a “no policy” counterfactual scenario.

Predicted daily growth rates of active (China and South Korea) or cumulative (all others) COVID-19 infections based on the observed timing of all policy deployments within each country (blue) and in a scenario where no policies were deployed (red). The difference between these two predictions is our estimated effect of actual anti-contagion policies on the growth rate of infections. Small markers are daily estimates for sub-national administrative units (vertical lines are 95% CI). Large markers are national average values for all sub-national units in our sample on that day. Black circles are observed changes in log(infections), averaged across the same administrative units. Predictions are only for observations in our sample, and we omit observations before sub-national units report ten cumulative cases. To focus our analysis on the impact of new policies, we omit data from China after March 5, 2020 because policies began to be rolled back during this period.

At a particular moment in time, the total number of COVID-19 infections depends on the growth rate of infections on all prior days. Thus, persistent decreases in growth rates have a compounding effect on total infections, at least until a shrinking susceptible population slows growth through a different mechanism. To provide a sense of scale and context for our main results in Figures 2 and 3, we integrate the growth rate of infections in each locality from Figure 3 to estimate total infections to date, both with actual anti-contagion policies and in the “no policy” counterfactual scenario. To account for the declining size of the susceptible population in each administrative unit, we couple our econometric estimates for the effects of policies to a simple Susceptible-Infected-Removed (SIR) model of infectious disease dynamics^{7, 22} (see Methods). This allows us to extend our projections beyond the initial exponential growth phase of infections, a threshold which our results suggest would currently be exceeded in several countries in the “no policy” scenario.
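To illustrate why coupling to an SIR model matters, the sketch below integrates a simple SIR system with a time-varying growth rate. It is not the procedure described in Methods: the removal rate `gamma`, the mapping `beta = g + gamma`, the Euler step size, and all input values are assumptions chosen for illustration.

```python
def simulate_sir(growth_rates, population, initial_infected,
                 gamma=0.05, steps_per_day=10):
    """Integrate a simple SIR model with a time-varying early growth rate.

    Each day's transmission rate is set to beta = g + gamma, so that active
    infections grow at rate g while nearly everyone is susceptible.
    Returns cumulative infections (I + R) at the end of each day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    dt = 1.0 / steps_per_day
    cumulative = []
    for g in growth_rates:
        beta = g + gamma
        for _ in range(steps_per_day):
            # Forward-Euler step; new infections shrink as s is depleted.
            new_infections = beta * s * i / population * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        cumulative.append(i + r)
    return cumulative

# Hypothetical locality: 1,000,000 people, 10 initial infections, 30 days.
no_policy = simulate_sir([0.38] * 30, 1_000_000, 10)
with_policy = simulate_sir([0.38] * 10 + [0.20] * 20, 1_000_000, 10)
```

In the "no policy" path the susceptible pool is noticeably depleted within the simulated window, so cumulative infections flatten below the population size rather than growing exponentially without bound.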

Our results suggest that ongoing anti-contagion policies have already substantially reduced the number of COVID-19 infections observed in the world today (Figure 4). Our central estimates suggest there would be roughly 74 million more cumulative cases in China, 5 million more in South Korea, 1.2 million more in Italy, 2.6 million more in Iran, 650,000 more in France, and 20,000 more in the US had these countries never enacted any anti-contagion policies since the start of the pandemic. The relative magnitudes of these impacts partially reflect the intensity and extent of policy deployment (e.g. how many localities deployed policies) and the duration for which they have been applied. Several of these estimates are subject to large uncertainties (see intervals in Figure 4).

Figure 4: Estimated cumulative COVID-19 infections with and without anti-contagion policies.

The predicted cumulative number of COVID-19 infections based on each country’s actual policy deployments (blue) and in the “no policy” counterfactual scenario (red). Sub-national infection growth rates from Figure 3 are integrated adjusting for SIR system dynamics in each sub-national unit (see Methods). Shaded areas show uncertainty based on 1,000 simulations where estimated parameters are resampled from their joint distribution (dark = inner 70% of predictions; light = inner 95%). Black circles show the cumulative number of reported infections observed in the data. In both scenarios, the sample is restricted to units we analyze in Figures 2 and 3. Note that infections are not projected for administrative units that never report infections in the data, but which plausibly would have experienced infections in a “no policy” scenario. The jump in infections in France on March 2, 2020 occurs due to administrative units entering the sample.

Discussion

Overall, our results indicate that large-scale anti-contagion policies are achieving their intended objective of slowing the growth rate of COVID-19 infections. Because infection rates in the countries we study would have initially followed rapid exponential growth had no policies been applied, our results suggest that these ongoing policies are currently providing large health benefits. For example, we estimate that there would be roughly 621× the current number of infections in South Korea, 36× in Italy, and 153× in Iran if large-scale policies had not been deployed during the early weeks of the pandemic. Consistent with process-based simulations of COVID-19 infections,^{2, 4, 10–12, 14, 17, 29} our empirical analysis of existing policies indicates that seemingly small delays in policy deployment likely produce dramatically different health outcomes.

While the quantity of currently available data poses challenges to our analysis, our aim is to use what limited data exist to estimate the first-order impacts of unprecedented policy actions in an ongoing global crisis. As more data become available, empirical research findings will become more precise and may capture more complex interactions. For example, this analysis does not account for potentially important interactions between populations in nearby localities,^{7, 33} nor the structure of mobility networks.^{3, 4, 10, 12, 17, 34} Nonetheless, we hope the results we are able to obtain at this early stage of the pandemic can support critical decision-making, both in the countries we study and in the other 150+ countries where COVID-19 infections have been reported.

Based on our results from China, where the most post-policy time has elapsed and where a relatively uniform set of policies was imposed during a narrow window of time, it appears that roughly three weeks are required for policies to achieve their full effect. In other countries, these temporal dynamics are more difficult to disentangle with currently available data, in part because fewer post-policy data are available and in part because countries continue to deploy new policies, which makes it more challenging to measure precisely the lagged effects of earlier policies. Future work should investigate these timing changes after more time has passed and new data become available.

A key advantage of our reduced-form “top down” statistical approach is that it captures the real-world behavior of affected populations without requiring that we explicitly model all underlying mechanisms and processes. This property is useful in early stages of the current pandemic when many process-related parameters remain unknown. However, our results cannot and should not be interpreted as a substitute for process-based epidemiological models specifically designed to provide guidance in public health crises. Rather, our results complement existing models, for example, by helping to calibrate key model parameters. We believe both forward-looking simulations and backward-looking empirical evaluations should be used to inform decision-making.

Here we have focused our analysis on large-scale social policies, specifically, to understand their impact on infection rate growth within a locality. However, contact tracing, international travel restrictions, and medical resource management, along with many other policy decisions, will play key roles in the global response to COVID-19. Our results do not speak to the efficacy of these other policies.

Our analysis accounts for some known changes in the availability of testing for COVID-19 and changes in testing procedures; however, it is likely that other unobserved changes in patterns of testing could affect our results. For example, if growing awareness of COVID-19 caused an increasing fraction of infected individuals to be tested over time, then unadjusted infection growth rates later in our sample would be biased upwards. Because an increasing number of policies are active later in these samples as well, this bias would cause our current findings to understate the overall effectiveness of anti-contagion policies.

It is also possible that changing public information during the period of our study has some unknown effect on our results. If individuals alter their behavior in response to new information unrelated to anti-contagion policies, such as news reports about COVID-19, this could alter the growth rate of infections and thus affect our estimates. Because the quantity of new information is increasing over time, if this information reduces infection growth rates, it would cause us to overstate the effectiveness of anti-contagion policies. We note, however, that if public information is increasing in response to policy actions, then it should be considered a pathway through which policies alter infection growth, not a form of bias. Investigating these potential effects is beyond the scope of this analysis, but it is an important topic for future investigations.

Lastly, we note that the results presented here are not sufficient, on their own, to determine which anti-contagion policies are ideal for individual populations, nor whether the social costs of individual policies are larger or smaller than the social value of their health benefits. Computing a full value of health benefits also requires understanding how different growth rates of infections and total active infections affect mortality rates, as well as determining a social value for all of these impacts. Furthermore, this analysis does not quantify the sizable social costs of anti-contagion policies, a critical topic for future investigations.

Methods

Data Collection and Processing

We have provided a brief summary of our data collection processes here (see Appendix Section 2 for more details, including access dates). Epidemiological and policy data for each of the six countries in our sample were collected from a variety of in-country data sources, including government public health websites, regional newspaper articles, and Wikipedia crowd-sourced information. The available epidemiological and policy data varied across the six countries, and preference was given to collecting data at the most granular administrative unit level. The country-specific panel datasets are at the region level in France, the state level in the US, the province level in South Korea, Italy and Iran, and the city level in China. Below, we describe our data sources.

China

We acquired epidemiological data from an open source GitHub project¹ that scrapes time series data from Ding Xiang Yuan. We extended this dataset back in time to January 10 by manually collecting official daily statistics from the central and provincial (Hubei, Guangdong, and Zhejiang) Chinese government websites. We compiled policies by collecting data on the start dates of travel bans and lockdowns at the city level from the “2020 Hubei lockdowns” Wikipedia page², the Wuhan Coronavirus Timeline project on GitHub³, and various other news reports. As we suspect that most Chinese cities have been treated by at least one anti-contagion policy, due to their reported trends in infections, we have dropped cities where we cannot find a policy deployment date to avoid miscategorizing the policy status of cities.

South Korea

We manually collected and compiled the epidemiological dataset in South Korea, based on provincial government reports, policy briefings, and news articles. We compiled policy actions from press releases from the Korea Centers for Disease Control and Prevention (KCDC), the Ministry of Foreign Affairs, local governments’ websites, and news articles.

Iran

We used epidemiological data from the table “New COVID-19 cases in Iran by province”⁴ in the “2020 coronavirus pandemic in Iran” Wikipedia article, which have been compiled from the data provided on the Iranian Ministry of Health website (in Persian). We relied on news media reporting and two timelines of pandemic events in Iran^{5, 6} to collate policy data.

Italy

We utilized epidemiological data from the GitHub repository⁷ maintained by the Italian Department of Civil Protection (Dipartimento della Protezione Civile). For policies, we primarily relied on the English version of the COVID-19 dossier “Chronology of main steps and legal acts taken by the Italian Government for the containment of the COVID-19 epidemiological emergency” written by the Department of Civil Protection (Dipartimento della Protezione Civile)⁸.

France

We used the region-level epidemiological dataset provided by France’s government website⁹ and supplemented it with scraped number of confirmed cases by region on France’s public health website, which is updated daily.¹⁰ We obtained data on France’s policy response to the COVID-19 pandemic from the French government website,¹¹ press releases from each regional public health site,¹² and Wikipedia¹³.

United States

We used state-level epidemiological data from the GitHub repository¹⁴ associated with the interactive dashboard from Johns Hopkins University (JHU). For policy responses, we relied on a number of sources, including the U.S. Centers for Disease Control and Prevention (CDC), individual state health departments, and various press releases from county- and city-level governments or media outlets.

Policy Data

Policies in administrative units were coded as binary variables, where the policy is coded as either 1 (after the date that the policy was implemented, and before it is removed) or 0 otherwise, for the affected administrative units. There were instances when a policy implementation only affected a portion of the administrative units (e.g. half of the counties within the state). In an attempt to accurately represent the locality and impact of policy implementation, policy variables were weighted by the percentage of population within the administrative unit that was treated by the policy. The most recent estimates available of population data for countries’ administrative units were used (see the Population Data section in the Appendix). Additionally, in order to standardize policy types across countries, we mapped country-specific policies to one of our broader policy categories used as variables in our analysis. In this exercise, we collected 130 policies for China, 37 for South Korea, 195 for Italy, 26 for Iran, 59 for France, and 498 for the United States (see Appendix Table A1).
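The coding rule described above can be sketched as follows. This is an illustration rather than the code used in the analysis: the function name and arguments are hypothetical, and treating the implementation date itself as covered (and the removal date as not covered) is an assumption.

```python
from datetime import date

def policy_intensity(day, start, end, treated_population, total_population):
    """Population-weighted policy variable for one administrative unit and day.

    Returns 0 outside the period when the policy is in force; otherwise the
    share of the unit's population covered by the policy. `end` is None for
    a policy that has not been removed.
    """
    in_force = start <= day and (end is None or day < end)
    if not in_force:
        return 0.0
    return treated_population / total_population

# Hypothetical state in which half of the population is covered by a policy.
start_day = date(2020, 3, 10)
before = policy_intensity(date(2020, 3, 9), start_day, None, 500_000, 1_000_000)
during = policy_intensity(date(2020, 3, 12), start_day, None, 500_000, 1_000_000)
```

A policy covering the whole unit takes the value 1 while in force, which recovers the plain binary coding as a special case.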

Epidemiological Data

We collected information on cumulative confirmed cases, cumulative recoveries, cumulative deaths, active cases, and any changes to domestic COVID-19 testing regimes. For our regression analysis (Figure 2), we use active cases when they are available (for China and South Korea) and cumulative confirmed cases otherwise. We document quality control steps in detail in Appendix Section 2. Notably, for China and South Korea we acquire more granular data than the data hosted on the Johns Hopkins University (JHU) interactive dashboard¹⁵; we confirm that the numbers of confirmed cases closely match between the two data sources (see Appendix Figure A2). To conduct the econometric analysis, we merge the epidemiological and policy data to form a single data set for each country.

Econometric analysis

Reduced-Form Approach

The reduced-form econometric approach that we apply here is a “top down” approach that describes the behavior of aggregate outcomes y in data (here, infection rates). This approach can identify plausibly causal effects^{23, 24} induced by exogenous changes in independent policy variables z (e.g. school closure) without explicitly describing all underlying mechanisms that link z to y, without observing intermediary variables x (e.g. behavior) that might link z to y, and without observing other determinants of y unrelated to z (e.g. demographics), denoted w. Let f (·) describe a complex and unobserved process that generates infection rates y:

$$y = f(x_1(z_1, \ldots, z_K), \ldots, x_N(z_1, \ldots, z_K), w_1, \ldots, w_M) \qquad (1)$$

Process-based epidemiological models aim to capture elements of f (·) explicitly, and then simulate how changes in z, x, or w affect y. This approach is particularly important and useful in forward-looking simulations where future conditions are likely to be different than historical conditions. However, a challenge faced by this approach is that we may not know the full structure of f (·), for example if a pathogen is new and many key biological and societal parameters remain uncertain. Crucially, we may not know the effect that large-scale policy (z) will have on behavior (x(z)) or how this behavior change will affect infection rates (f (·)).

Alternatively, one can differentiate Equation 1 with respect to the k^th policy z_k:

$$\frac{\partial y}{\partial z_k} = \sum_{j=1}^{N} \frac{\partial y}{\partial x_j} \frac{\partial x_j}{\partial z_k} \qquad (2)$$

which describes how changes in the policy affect infections through all N potential pathways mediated by x₁, …, x_N. Usefully, Equation 2 does not depend on w. If we can observe y and z directly and estimate ∂y/∂z_k with data, then intermediate variables x also need not be observed nor modeled. The reduced-form econometric approach^{23, 24} thus attempts to measure ∂y/∂z_k directly, exploiting exogenous variation in policies z.

Model

Active infections grow exponentially during the initial phase of an epidemic, when the proportion of immune individuals in a population is near zero. Assuming a simple Susceptible-Infected-Recovered (SIR) disease model (e.g. ref. ²²), the growth in infections during the early period is

$$\frac{dI_t}{dt} = (S_t \beta - \gamma) I_t \underset{S \to 1}{=} (\beta - \gamma) I_t \qquad (3)$$

where I_t is the number of infected individuals at time t, β is the transmission rate (new infections per day per infected individual), γ is the removal rate (proportion of infected individuals recovering or dying each day) and S is the fraction of the population susceptible to the disease. The second equality holds in the limit S → 1, which describes the current conditions during the beginning of the COVID-19 pandemic. The solution to this ordinary differential equation is the exponential function

$$\frac{I_{t_2}}{I_{t_1}} = e^{g (t_2 - t_1)} \qquad (4)$$

where the growth rate is g = β − γ and I_{t₁} is the initial condition. Taking the natural logarithm and rearranging, we have

$$\log(I_{t_2}) - \log(I_{t_1}) = g (t_2 - t_1) \qquad (5)$$

Anti-contagion policies are designed to alter g, through changes to β, by reducing contact between susceptible and infected individuals. Holding the time-step between observations fixed at one day (t₂ − t₁ = 1), we thus model g as a time-varying outcome that is a linear function of a time-varying policy

$$g_t = \log(I_t) - \log(I_{t-1}) = \theta_0 + \theta \cdot \text{policy}_t + \epsilon_t \qquad (6)$$

where θ₀ is the average growth rate absent policy, policy_t is a binary variable describing whether a policy is deployed at time t, and θ is the average effect of the policy on growth rate g. ϵ_t is a mean-zero disturbance term that captures inter-period changes not described by policy_t. Using this approach, infections each day are treated as the initial conditions for integrating Equation 4 through to the following day.

We compute the first differences log(I_t) − log(I_{t−1}) using active infections where they are available; otherwise we use cumulative infections, noting that they are almost identical during this early period (except in China, where we use active infections). We then match these data to policy variables that we construct using the novel data sets we assemble and apply a reduced-form approach to estimate a version of Equation 6, although the actual expression has additional terms detailed below.
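As a simplified illustration of this step, the sketch below computes first differences of log infections and fits the single-policy model g_t = θ₀ + θ · policy_t + ϵ_t by ordinary least squares. It assumes one locality and one binary policy, uses made-up case counts, and omits the fixed effects and testing dummies of the full specification; the function names are hypothetical.

```python
import math


def log_growth_rates(infections):
    """First differences of log infections: g_t = log(I_t) - log(I_{t-1})."""
    return [math.log(curr) - math.log(prev)
            for prev, curr in zip(infections, infections[1:])]


def fit_single_policy(growth, policy):
    """Closed-form OLS for g_t = theta0 + theta * policy_t + eps_t."""
    n = len(growth)
    mean_g = sum(growth) / n
    mean_p = sum(policy) / n
    sxx = sum((p - mean_p) ** 2 for p in policy)
    if sxx == 0:
        raise ValueError("policy variable has no variation")
    sxy = sum((p - mean_p) * (g - mean_g) for p, g in zip(policy, growth))
    theta = sxy / sxx
    theta0 = mean_g - theta * mean_p
    return theta0, theta


# Made-up series: growth of 0.4 per day for 5 days, then 0.1 per day for 5 days.
counts = [10.0]
for day in range(10):
    rate = 0.4 if day < 5 else 0.1
    counts.append(counts[-1] * math.exp(rate))

policy = [0] * 5 + [1] * 5  # policy_t aligned with each growth observation

g = log_growth_rates(counts)
theta0, theta = fit_single_policy(g, policy)
print(round(theta0, 3), round(theta, 3))
```

With a single binary regressor, the OLS intercept is the mean growth rate on no-policy days and the slope is the difference in mean growth rates between policy and no-policy days, so the synthetic series recovers θ₀ ≈ 0.4 and θ ≈ −0.3.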

Estimation

To estimate a multi-variable version of Equation 6, we estimate a separate regression for each country c. Observations are for sub-national units indexed by i observed for each day t. Because not all localities began testing for COVID-19 on the same date, these samples are unbalanced panels. To ensure data quality, we restrict our analysis to localities after they have reported at least ten cumulative infections.

We estimate a multiple regression version of Equation 6 using ordinary least squares. We include a vector of sub-national unit-fixed effects θ₀ (i.e. varying intercepts captured as coefficients to dummy variables) to account for all time-invariant factors that affect the local growth rate of infections, such as differences in demographics, socio-economic status, culture, or health systems.²⁴ We include a vector of day-of-week-fixed effects δ to account for weekly patterns in the growth rate of infections that are common across locations within a country. We include a separate single-day dummy variable each time there is an abrupt change in the availability of COVID-19 testing or a change in the procedure to diagnose positive cases. Such changes generally manifest as a discontinuous jump in infections and a re-scaling of subsequent infection rates (e.g. see China in Figure 1), effects that are flexibly absorbed by a single-day dummy variable because the dependent variable is the first-difference of the logarithm of infections. Denote the vector of these testing dummies μ.

Lastly, we include a vector of P_c country-specific policy variables for each location and day. These policy variables take on values between zero and one (inclusive) where zero indicates no policy action and one indicates a policy is fully enacted. In cases where a policy variable captures the effects of collections of policies (e.g. museum closures and library closures), a binary policy variable is computed for each, then they are averaged, so the coefficients on these variables are interpreted as the effect if all policies in the collection are fully enacted. In some cases (for Italy and the US), policy data is available at a more spatially granular level than infection data (e.g. city policies and state-level infections in the US). In these cases, we code binary policy variables at the more granular level and use population-weights to aggregate them to the level of the infection data. Thus, policy variables may take on continuous values between zero and one, with a value of one indicating that the policy is fully enacted for the entire population.

For each country, our general multiple regression model is thus

$$g_{cit} = \log(I_{cit}) - \log(I_{ci,t-1}) = \theta_{0,ci} + \delta_{ct} + \mu_{cit} + \sum_{p=1}^{P_c} \left( \theta_{pc} \cdot \text{policy}_{pcit} \right) + \epsilon_{cit} \qquad (7)$$

where observations are indexed by country c, sub-national unit i, and day t. The parameters of interest are the country-by-policy specific coefficients θ_pc. We verify that our residuals ϵ_cit are approximately normally distributed (Appendix Figure A1) and we estimate uncertainty over all parameters by clustering our standard errors at the day level.²³ This approach non-parametrically accounts for arbitrary forms of spatial auto-correlation or systematic misreporting in regions of a country on any given day (it generates larger estimates for uncertainty than clustering by i). When we report the effect of all policies combined (e.g. Figure 2B) we are reporting the sum of coefficient estimates for all policies, ∑_p θ_pc, accounting for the covariance of errors in these estimates when computing the uncertainty of this sum.
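The uncertainty of a sum of coefficients follows from the standard identity Var(∑_p θ_p) = ∑_p ∑_q Cov(θ_p, θ_q). The sketch below applies that identity to a hypothetical set of three coefficient estimates and a hypothetical variance-covariance matrix; none of these numbers come from the analysis.

```python
import math


def combined_effect(coefs, cov):
    """Sum of coefficients and the standard error of that sum.

    coefs -- list of coefficient estimates
    cov   -- variance-covariance matrix of those estimates (list of lists)
    """
    k = len(coefs)
    total = sum(coefs)
    # The variance of the sum is the sum of every entry of the covariance
    # matrix, so off-diagonal covariances are included, not only variances.
    variance = sum(cov[i][j] for i in range(k) for j in range(k))
    return total, math.sqrt(variance)


# Hypothetical estimates for three policies and their covariance matrix.
theta_hat = [-0.10, -0.05, -0.08]
vcov = [
    [0.0004, -0.0001, 0.0000],
    [-0.0001, 0.0009, 0.0002],
    [0.0000, 0.0002, 0.0016],
]

total, se = combined_effect(theta_hat, vcov)
print(round(total, 3), round(se, 4))
```

Ignoring the off-diagonal terms would give a variance of 0.0029 here instead of 0.0031, which is why the covariance of the estimates matters for the combined effect.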

Note that our estimates of θ and θ₀ in Equation 7 are robust to systematic under-reporting of infections, a major concern in the ongoing pandemic, due to the construction of our dependent variable. If only a constant fraction Ψ of infections are being reported, such that we observe Ĩ_t = Ψ I_t rather than actual infections I_t, then the left-hand-side of Equation 7 will be

$$\log(\tilde{I}_t) - \log(\tilde{I}_{t-1}) = \log(\Psi I_t) - \log(\Psi I_{t-1}) = \log(\Psi) - \log(\Psi) + \log(I_t) - \log(I_{t-1}) = g_t$$

and is therefore unaffected by the under-reporting. Thus systematic under-reporting does not affect our estimates for the effects of policy θ.
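A short numerical check of this argument, using made-up infection counts and hypothetical reporting fractions: a constant reporting fraction leaves every log growth rate unchanged, while a one-time change in the reporting fraction distorts only the single day on which it occurs, which is the kind of jump a single-day testing dummy can absorb.

```python
import math


def log_growth_rates(infections):
    """First differences of log infections: g_t = log(I_t) - log(I_{t-1})."""
    return [math.log(curr) - math.log(prev)
            for prev, curr in zip(infections, infections[1:])]


true_infections = [100, 150, 230, 340, 500]  # made-up counts
g_true = log_growth_rates(true_infections)

# Case 1: a constant share of infections is reported every day.
constant_share = 0.2
reported_constant = [constant_share * x for x in true_infections]
gap_constant = [abs(a - b) for a, b in
                zip(g_true, log_growth_rates(reported_constant))]

# Case 2: the reported share doubles on the fourth day (index 3),
# e.g. after a hypothetical expansion of testing.
share_by_day = [0.2, 0.2, 0.2, 0.4, 0.4]
reported_changing = [s * x for s, x in zip(share_by_day, true_infections)]
gap_changing = [b - a for a, b in
                zip(g_true, log_growth_rates(reported_changing))]

print(max(gap_constant))  # zero up to floating-point rounding
print(gap_changing)       # about log(2) at index 2, zero elsewhere
```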

There are some country-specific adjustments to Equation 7 due to idiosyncratic differences between samples. In China, we code policy parameters using weekly lags based on the date that the policy is first implemented in locality i. As discussed in the main text, this is done to understand the temporal dynamics of the response to policy in the one country where policy has been enacted the longest and in the most consistent way. Weekly lags are used because the incubation period of COVID-19 is thought to be 5-6 days.⁴ Econometrically, this means the effect of a policy implemented one week ago is allowed to differ arbitrarily from the effect of a policy implemented two weeks ago, etc. These effects are all estimated simultaneously. Also in China, we omit day-of-week effects because there is no evidence to suggest they are present in the data; this could be due to the fact that the outbreak of COVID-19 began during a national holiday and workers never returned to work. In Iran, we estimate a separate effect of policies implemented in Tehran that is allowed to differ from the effect in other locations by creating a Tehran-specific dummy variable that is interacted with both policy variables. This is implemented because of the stark and significantly different effect of policies in Tehran relative to effects in other parts of the country.

Projections

Daily growth rates of infections

To estimate the instantaneous daily growth rate of infections if policies were removed, we obtain fitted values from Equation 7 and compute a predicted value for the dependent variable when all P_c policy variables are set to zero. Thus, these estimated growth rates capture the effect of all locality-specific factors on the growth rate of infections (e.g. demographics), day-of-week effects, and adjustments based on the way in which infection cases are reported. This counterfactual does not account for changes in information that are triggered by policy deployment, since those should be considered a pathway through which policies affect outcomes, as discussed in the main text. When we report an average “no policy” growth rate of infections (Figure 2A), it is the average value of these predictions for all observations in the original sample. Location-and-day specific counterfactual predictions, accounting for the covariance of errors in estimated parameters, are shown as red markers in Figure 3.

Cumulative infections

To provide a sense of scale for the estimated cumulative benefits of effects shown in Figure 3, we link our reduced-form empirical estimates to the key structures in a simple SIR system and simulate this dynamical system from the start of the pandemic to the present in each country. The system is defined as the following:

$$\frac{dS_t}{dt} = -\beta_t S_t I_t \qquad (8)$$

$$\frac{dI_t}{dt} = (\beta_t S_t - \gamma) I_t \qquad (9)$$

$$\frac{dR_t}{dt} = \gamma I_t \qquad (10)$$

where S is the susceptible population and R is the removed population. Here β is a time-evolving parameter, determined via our empirical estimates as described below. Accounting for changes in S becomes increasingly important as the size of cumulative infections (I_t + R_t) becomes a substantial fraction of the local subnational population, which occurs in some “no policy” scenarios. Our reduced-form analysis provides estimates for the growth rate of active infections (ĝ) for each locality and day, in a regime where S ≈ 1. Thus we know

$$\hat{g} = \beta - \gamma \qquad (11)$$

but we do not know the values of either of the two right-hand-side terms, which are required to simulate Equations 8-10. To estimate γ, we note that the left-hand-side term of Equation 10 is

$$\frac{dR_t}{dt} = \frac{d}{dt}\left(\text{cum\_recoveries} + \text{cum\_deaths}\right)$$

which we can observe in our data for China and South Korea. Computing first differences in these two variables (to differentiate with respect to time), summing them, and then dividing by active cases gives us estimates of γ from Equation 10 (medians: China=0.076, Korea=0.029). These values differ slightly from the classical SIR interpretation of γ because, in the public data we are able to obtain, individuals are coded as “recovered” when they no longer test positive for COVID-19, whereas in the classical SIR model this occurs when they are no longer infectious. We adopt the average of these two medians, setting γ = 0.052. We use medians rather than simple averages because low values for I induce a long right-tail in daily estimates of γ and medians are less vulnerable to this distortion.

We then use our empirically based reduced-form estimates of ĝ (both with and without policy) combined with Equations 8-11 to project total cumulative cases in all countries, shown in Figure 4. We simulate infections and cases for each administrative unit in our sample beginning on the first day for which we observe 10 or more cases (for that unit) using a time-step of 4 hours. We estimate uncertainty by resampling from the estimated variance-covariance matrix of all parameters.
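The two steps above, estimating γ from observed removals and simulating the SIR system with β_t = ĝ_t + γ, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses a forward-Euler integrator (the text specifies only the 4-hour time-step), expresses S, I and R as population fractions, divides daily removals by the previous day's active cases, and uses made-up growth rates, population, and function names.

```python
import statistics


def estimate_gamma(cum_recoveries, cum_deaths, active_cases):
    """Median of daily (new recoveries + new deaths) / active cases."""
    daily = []
    for t in range(1, len(active_cases)):
        removed = ((cum_recoveries[t] - cum_recoveries[t - 1])
                   + (cum_deaths[t] - cum_deaths[t - 1]))
        # Assumption: divide by active cases on the previous day.
        if active_cases[t - 1] > 0:
            daily.append(removed / active_cases[t - 1])
    return statistics.median(daily)


def simulate_sir(growth_by_day, gamma, initial_infected, population,
                 steps_per_day=6):
    """Forward-Euler simulation of the SIR system with beta_t = g_t + gamma.

    Returns cumulative infections (I + R), as a head count, at the end of
    each day. Six steps per day corresponds to a 4-hour time-step.
    """
    dt = 1.0 / steps_per_day
    i = initial_infected / population  # infected fraction
    r = 0.0                            # removed fraction
    s = 1.0 - i                        # susceptible fraction
    cumulative = []
    for g in growth_by_day:
        beta = g + gamma
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        cumulative.append((i + r) * population)
    return cumulative


gamma = 0.052  # value adopted in the text

# Hypothetical growth paths over 30 days for a unit of 10 million people.
no_policy_growth = [0.40] * 30
with_policy_growth = [0.40] * 10 + [0.10] * 20

no_policy = simulate_sir(no_policy_growth, gamma, 10, 10_000_000)
with_policy = simulate_sir(with_policy_growth, gamma, 10, 10_000_000)
print(round(no_policy[-1]), round(with_policy[-1]))
```

The gap between the two final values is the kind of quantity summarized in Figure 4, although the published projections also propagate parameter uncertainty by resampling, which this sketch omits.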

Data Availability

All data and code used in this analysis are available at https://github.com/bolliger32/gpl-covid. Updates are posted at http://www.globalpolicy.science/covid19.

Appendix for

1. Appendix Tables and Figures

Table A1: Number of policies tabulated by administrative divisions of each country.

Policy data have been collected at various levels of administrative divisions in each country. “Adm0” represents the country level, and higher “Adm*” numbers indicate smaller administrative subdivisions. Each policy is counted at the highest level of specificity of the regions where the policy is applied. For example, if a country has ten regions at the “Adm1” level, and a policy is applied across five of those regions, the policy is counted as five separate “Adm1” policies rather than a single “Adm0” policy.

Figure A1: Error distributions for estimated growth rates of COVID-19 cases by country.

These plots show the error structure for each country-specific econometric model used to predict the daily growth of active or cumulative COVID-19 cases under the country’s actual policy regime, as compared to the counterfactual world where no policies were enacted. See the full model under the Methods - Econometric analysis section as well as the results in Figure 3 of the main paper.

Figure A2: Validating our disaggregated epidemiological data against data from the Johns Hopkins Center for Systems Science and Engineering.

As an additional check, we compared the cumulative number of confirmed cases from a handful of regions in our collated epidemiological dataset to the same statistics from the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by the Johns Hopkins Center for Systems Science and Engineering (JHU CSSE).¹ We conducted this comparison for the two countries that we had the most data for and at two different administrative levels. In China, we aggregate city level data up to the province level, and in Korea we aggregate provincial level data up to the country level. The numbers tracked each other for the entire time series we have collected thus far.

2. Data Acquisition and Processing

This section describes the data acquisition and processing procedure for both epidemiological and policy data used in this paper. The sources for both types of data come from a variety of in-country data sources, which include government public health websites, regional newspaper articles, and Wikipedia crowd-sourced information. We have supplemented this data with international data compilations. A list of the epidemiological and policy data compiled for this analysis can be found here.

Epidemiological Data

The epidemiological datasets and sources used in this paper are described below. The main health variables of interest:

“cum_confirmed_cases”: The total number of confirmed positive cases in the administrative area since the first confirmed case.
“cum_deaths”: The total number of individuals that have died from COVID-19.
“cum_recoveries”: The total number of individuals that have recovered from COVID-19.
“cum_hospitalized”: The total number of hospitalized individuals.
“cum_hospitalized_symptom”: The total number of symptomatic hospitalized individuals.
“cum_intensive_care”: The total number of individuals that have received intensive care.
“cum_home_confinement”: The total number of individuals that have been self-quarantined in their homes as a result of a positive test.
“active_cases”: The number of individuals who currently still test positive on the date of the observation.
“active_cases_new”: The number of new cases since the previous date.
“cum_tests”: The total number of tests (includes both positive and negative results) conducted in an administrative unit.

Additional metadata accompanying the health outcome variables:

“date”: The date of observation.
“adm0_name”: The ISO3 code of the country to which this observation belongs.
“adm1_name”: The name of the “Adm1” region to which this observation belongs.
“adm2_name”: If the dataset contains observations at the “Adm2” level, then this is the name of the “Adm2” region to which this observation belongs.
“adm[1,2]_id”: Any alphanumeric ID scheme to identify different administrative units (e.g. FIPS code).
“lat”: The latitude of the centroid of the administrative unit.
“lon”: The longitude of the centroid of the administrative unit.
“policies_enacted”: The number of active policies that are in place for the administrative unit as of that date. This variable is not population weighted.
“testing_regime”: A categorical variable used to identify when an administrative region (or country) changed their COVID-19 testing regime. This is zero-indexed, with the ordering only indicating chronological progression (there is no external meaning to Regime 2 vs. Regime 1 vs. Regime 0, and there is no consistency enforced for coding across countries). For example, if China changes their testing regime twice, all observations prior to the first regime change would be coded “testing_regime=0,” all observations in between the two changes would be coded “testing_regime=1,” and all observations after the second change would be coded “testing_regime=2.”

Data Imputation

In instances where health outcome observations are missing or suffer from data quality issues, we impute the missing values. Imputed health outcome variables are denoted by “[health_outcome]_imputed.” For the majority of our analyses we do not use imputed data; France is the exception, where we impute two days of missing data. We do this to ensure we have variation in policy variables for use in the analysis.

We impute by:

Taking the natural log of the non-missing observations pertaining to that health outcome variable.
Linearly interpolating over the missing dates for that health outcome variable.
Exponentiating the interpolated values back into levels and rounding to the nearest integer.
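The three imputation steps above can be sketched as follows, assuming missing observations are marked as None, dates are evenly spaced, and all observed values are strictly positive. The function name and the example series are hypothetical.

```python
import math


def impute_log_linear(values):
    """Fill interior gaps (None) by linear interpolation in logs.

    Observed values must be strictly positive. Gaps before the first or
    after the last observed value are left as None.
    """
    known = [(t, v) for t, v in enumerate(values) if v is not None]
    filled = list(values)
    for (t0, v0), (t1, v1) in zip(known, known[1:]):
        for t in range(t0 + 1, t1):
            weight = (t - t0) / (t1 - t0)
            # Step 1 and 2: interpolate linearly between the logged endpoints.
            log_value = (1 - weight) * math.log(v0) + weight * math.log(v1)
            # Step 3: return to levels and round to the nearest integer.
            filled[t] = round(math.exp(log_value))
    return filled


# Made-up cumulative case series with two missing days.
cases = [100, None, None, 800, 1000]
print(impute_log_linear(cases))  # [100, 200, 400, 800, 1000]
```

Interpolating in logs rather than levels assumes a constant growth rate across the gap, which matches the exponential growth assumed elsewhere in the analysis.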