Abstract
The corona virus, COVID-19, has been spreading rapidly across the USA since early March, but at a decreasing rate, where the rate r is defined as the exponential increase. I modeled the way the rate of increase y = log(exp(r)-1) has declined through time in each of the 51 states with the goal of determining whether state-at-home orders correlate with reductions in the rate of spread of the virus. A piecewise linear regression was used, with a single break point. This model can identify whether there was a change in the rate of decline, when the change happened, and which states have shown the greatest improvement in reducing the spread of COVID-19. The piecewise model identified a significant breakpoint on 24 Mar for all states combined, and all states had nearly the same breakpoint. Prior to 24 Mar, the average change in y was -0.013 per day, meaning a reduction in the rate of spread from 23.5 pct. per day to 19.5 pct. per day; after 24 Mar, the average change in y was -0.070 per day, a reduction from 19.5 pct. per day to 7.5 pct. per day. Prior to 24 Mar there was no significant variation among states in the decline in y, but after 24 Mar there was substantial variation, and the date on which states issued stay at home orders correlated with that variation. Montana, Idaho, and Vermont showed the greatest improvement, while Nebraska, South Dakota, and Iowa the least. The improvement as measured by the reduction after 24 Mar did not correlate with case density in a state, nor state population.
1 Introduction
The number of COVID-19 infections has increased steadily since early March in every state in the USA. The density of infections (cases per capita) varies substantially among states, but the more pertinent interest is the rate at which the number of infections increases. Various public health measures have been taken to slow that increase, and my goal here is to assess one such measure – the stay-at-home order – has impacted the rate of spread. Such assessments have been done to test for the effect of measures taken to combat the 1918 influenza epidemic (Bootsma and Ferguson 2007). The assessment is based on a rigorous comparison of the rate of COVID-19 spread in all states in the USA. Though many factors might affect that rate, a longitudinal comparison within each state should be a way to judge how effective the measures were. States with sooner measures should show steeper declines in the rate of spread through time.
2 Materials and Methods
2.1 Data assembly
Daily counts of the cumulative total of COVID-19 cases per state was collected from weather.com (weather.com 2020). There is a stable url for each state that I could curl (the unix function to capture web text) with an automated script. Each day’s web presentation was complete, including daily counts back to mid-February. A url for all 51 states (with DC) had to be copied and saved, but once stored, capturing all states’ information was a fully automated process. The text came as a long html script with javascript data arrays giving numbers buried within. I wrote C++ program to extract those arrays and move them into tables in the R programming language. Many individual records were checked to confirm the data were captured correctly. Counts were cross-checked against data from a New York Times Github site (NY Times 2020) and were essentially (but not exactly) identical. Analyses were done on case records through 21 Apr 2020, downloaded on 23 Apr 2020. More recent data are shown in graphs, but were not used in the model.
2.2 New cases and deaths
The data come as the cumulative number of cases of COVID-19 and cumulative number of deaths attributed to the virus. I calculated the number of new cases per day as the difference between total cases on successive days. In a few cases where no new cases were reported on one day, I used the new cases on the next day divided by the number of intervening days and omitted the day with no reports from analyses. This avoided having any zeroes in the new case counts. As an estimate of current cases, I used the cumulative count each day minus the number of deaths. Ideally, I would have used the number of active cases, substracting also all recoveries, but that information was not available.
2.3 Rate of increase
Let the cumulative number of cases on day t be Nt, so the rate constant of population growth r is defined from
If r is constant through time, growth is exponential but I do not make this assumption. The number of new cases on day t is Ct = Nt − Nt−1 so and
Define y = ln(er − 1) as the response variable in a model of rate of spread of COVID-19. It has a roughly Gaussian distribution, and it has been changing linearly through time over the past 6 weeks in the US. Note that if r ~ 0.2 or less, er ~ 1 + r so y ~ r, and r is the fractional daily increase. When r >> 0.2, y increases monotonically with r but is a better choice due to its symmetrical distribution.
2.4 Modeling the changing rate of increase
Since the core question is whether there was a shift in the rate of increase as a result of public health measures, I chose a linear piecewise regression (McGee and Carleton 1970) model of y, the rate of increase of COVID-19, versus time. Since the rate y describes the change in the total number of cases through time, the model describes an acceleration (or deceleration): y is the first derivative, so the change in y through time is the second derivative of the number of cases. The following analysis is all about that second derivative, or how the rate of increase changes. If growth were exponential, the rate would not change, ie the second derivative of Nt would be zero. As everyone watching knows, the rate of spread of COVID-19 has been declining, and the model I create here fits that decline as a linear response to time.
The piecewise component of the regression adds the feature that the decline in the rate of spread changes. Consider a piecewise regression with two phases: there is a single breakpoint where the slope of y versus t shifts, and thus two different slopes, one on either side of the break point. This approach specifically answers the question whether there was a shift in the way the rate changed and whether individual states differed in the shift. I fit the model allowing each state to have a distinct response, so the results can identify states where there was a substantial improvement, that is, a shift toward much slower spread, and other states where the rate steadily declined without any improvement. Alternatively, the model can report no difference states. There is no a priori assumption about when the shift happened – the model will choose the breakpoint based on the data, and the model will report a rigorous test about whether or not there is a break, ie whether the slope changes.
I used a multi-level hierarchical model, also known as a mixed-effects model, in which the 51 states were random effects. This produces an estimate for how the rate of COVID-19 has changed through time in every state, but has the benefit of simultaneously using all the states together. This adds power to the model (Gelman and Hill 2007), important given that there are only 50 days in the analysis and those days must be divided into two phases. Parameters were fitted using a Bayesian approach based on a Gibbs sampler, as detailed in Condit et al. (2007, 2013, 2014). The Bayesian method produces 95 pct. credible intervals based on every parameter estimate examined by the sampler, and I inferred statistical difference between estimates when 95 pct. credible intervals did not overlap. I also tried a three phase piecewise linear model and simple linear regression (ie no break point). The three models were compared using the deviance information criterion (DIC); a lower DIC means a better model fit (Plummer 2008).
2.5 Testing the impact of health measures
A Wikipedia article includes measures and dates of implementation in every state (Wikipedia 2020). The measure most clearly offering a quantitative measure across states is the date on which stay-at-home orders were issued. Four states have not issued such orders (as of 2 May), so they were assigned a date of 10 Apr, three days after the latest order (South Carolina, 7 Apr). Three states had various orders locally, and they were excluded. The rate of improvement in each state after 24 Mar, as estimated from piecewise regression, was correlated against the date of the stay-at-home order.
3 Results
Increase in total cases
The number of COVID-19 cases increased steadily but at a consistently declining rate (Fig.1). That is, growth was less than exponential.
Decrease in daily rate of change
The rate of increase has declined in a roughly linear fashion since early March (Fig. 2). Piecewise regression identified a break on 24 March with strong statistical support. The mean slope of all 51 states (fixed effect of the model) prior to 24 March was −.013, steepening to −.070 after 24 March. Those slopes are equivalent to reducing the rate of spread from 23.3 pct. per day to 19.5 pct. per day for the two weeks prior to 24 Mar, then from 19.5 pct. to 7.5 pct. per day in the two weeks after.
There was no variation across states in the day on which the slope changed: in all 51 states it was either 24 Mar and 25 Mar, and credible for all 51 states overlapped. Likewise, the slope prior to the break did not vary significantly among states; all 51 credible intervals overlapped, and the slope was always between −0.16 and −.010.
There was, however, statistically significant variaton among states in the slope after 24 Mar. Individual states had slopes varying from −0.114 to −0.029. Idaho illustrates strong improvement, at a rate of −0.109 (Montana was slightly better at −0.114; in Idaho, was from 20.6 pct. per day on 24 Mar to 4.8 pct. per day two weeks later. Iowa illustrates poor improvement, with a rate of −.0356, from 16.6 pct. per day on 24 Mar to 10.3 pct. per day two weeks later. Thus, the virus was spreading faster in Idaho than in Iowa on 24 Mar, but the trend had reversed by 8 Apr. Those two states are highlighted in Fig. 1 and Fig. 2 to illustrate extremes. The slopes in two phases in all 51 states are given in Table 1 and are available for download (see Supplementary Data).
There was no correlation between the slope prior to 24 Mar and the slope after that day (Supplemental Fig. S1). This is expected given the lack of variation prior to 24 Mar.
Improvement in phase 2 and stay-at-home orders
There was a significant correlation between the date each state declared a stay-at-home order and the rate of improvement since 24 Mar (Fig. 3). Included in the regression are four states for which there was no stay-at-home order (as of 2 May). The average state which had issued the order on 19 Mar (the earliest) improved, according to the regression, from an increase of 19.5 pct. per day on 24 Mar to 6.5 pct. per day two weeks later. In comparison, in the average state that delayed until 7 Apr (the latest), the estimated improvement was 19.5 pct. per day to 8.6 pct. per day over those two weeks. The impact of the earlier stay-at-home was an improvement of ~ 2 pct. per day over two weeks, relative to waiting 19 days to issue the order.
Improvement in phase 2 and case density
There was no correlation between the improvement in the rate of spread, as measured by the slope after 24 Mar in each state, and the case density (cases per million) on 24 Mar (Fig. 3). The slope was more negative (better improvement) in states with a higher density of cases, but the regression was not significant (Fig. 3).
Improvement in phase 2 and population density
There was no correlation between the improvement in the rate of spread, as measured by the slope after 24 Mar in each state, and the population size of a state. The slope was slightly positive but non-significant (p = 0.63, r2 ~ .01).
Alternative models
Three-phase piecewise regression identified one break matching the sharp shift of the two-phase model, plus a later break that was accompanied by no change in slope. Based on the deviance information criterion (DIC), the two phase model (DIC=2392.9) was superior to the three-phase (DIC=2636.0) or a simple regression, with constant slope throughout (DIC=2766.0).
4 Discussion
The main conclusion is that the rate at which the corona virus has been spreading across the US changed in different ways in different states. The observation that the rate declined through time – meaning growth is less than exponential – could be attributed to many different factors. But the fact that states differed in the degree of improvement, that is how much the rate declined, must be attributed to differences among states. How much improvement states showed could not be attributed to the density of infections around 24 March, nor to the population size of the state.
The rate of improvement was predicted, however, by the date on which dates issued stay-at-home orders. Three of the states with the poorest improvement, Nebraska, Iowa, and North Dakota, do not have stay-at-home orders (as of 2 May), and several states with the best improvement had early orders (Louisiana, Hawaii, Idaho, Vermont). Issuing the order three weeks earlier led to a reduction in the rate of spread of COVID-19 by about 2 pct.: the average state issuing an early order reduced the rate of spread to 6 pct. per day, while in those with a late order, it improved to 8 pct. per day. But the correlation was not strong, and there are many exceptions, particularly cases with early orders but poor improvement, especially Delaware and Ohio. An analysis of of health measures taken during the 1918 flu pandemic also found moderate and variable impacts (Bootsma and Ferguson 2007).
I would suggest that rates of spread of the corona virus should be modeled using the number of active cases, since those are where new infections arise. But the information available now includes only the cumulative number of cases. Successive cumulative counts yield an accurate estimate of new cases per day, but the denominator of the rate is cumulative cases. I subtracted deaths from that to get closer to active cases, and I am working on estimating the number of recoveries using data on time to recovery (Verity et al. 2020).
Other policy measures have been taken to inhibit COVID-19 spread, and some of these undoubtedly correlate with the stay-at-home order. Which measures are most important remains untested. Many other factors must be impacting how quickly the rate of spread has slowed, and further work should examine other predictors of state-level variation. Testing other predictors with county-level variation might be more effective, and I plan to extend the piecewise regression model presented here to counties.
7 Supplementary Data
Data and software can be downloaded from the author’s web page, http://conditdatacenter.org/covid19.
6 Acknowledgments
Alexander Schapiro helped collect web data.