County-Level Estimates of Excess Mortality associated with COVID-19 in the United States
========================================================================================

* Calvin A. Ackley
* Dielle J. Lundberg
* Irma T. Elo
* Samuel H. Preston
* Andrew C. Stokes

## Abstract

The coronavirus disease 2019 (COVID-19) pandemic in the US has been largely monitored on the basis of death certificates containing reference to COVID-19. However, prior analyses reveal that a significant fraction of excess deaths associated with the pandemic were not directly assigned to COVID-19 on the death certificate. The percent of excess deaths not assigned to COVID-19 is also known to vary across US states. However, few studies to date provide information on patterns of excess mortality and excess deaths not assigned to COVID-19 for US counties, despite the importance of this information for health policy and planning. In the present study, we develop and validate a generalized linear model of expected mortality in 2020 based on historical trends in deaths by county of residence between 2011 and 2019. We use the results of the model to generate county estimates of excess mortality and excess deaths not assigned to COVID-19 for each county in the US along with bootstrapped prediction intervals. Overall, the proportion of excess deaths assigned to COVID-19 was 81%, meaning that 19% of excess deaths were not assigned to COVID-19. The proportion assigned to COVID-19 was lower in the South (76%) and West (75%) as compared to counties in the Midwest (81%) and Northeast (94%). Across US Census Divisions, the proportion was especially low in the East South Central Division (67%). Rural counties across all divisions (67%) reported lower proportions of excess deaths assigned to COVID-19 than urban areas (83%). For instance, in the Middle Atlantic and Pacific Divisions respectively, only 47% and 39% of excess deaths were assigned to COVID-19 in nonmetro areas. In contrast, the New England Census Division stood out as the only division where directly assigned COVID-19 deaths actually exceeded excess deaths, meaning there were 1.23 directly assigned COVID-19 deaths for every 1 excess death. 
However, this finding did not extend to nonmetro areas within New England, where only 64% of excess deaths were assigned to COVID-19. The finding that metro areas in New England reported higher direct COVID-19 mortality than excess mortality suggests that reductions in mortality from other causes of death may have occurred in these areas, at least among some populations. Across individual counties, the percentage of excess deaths not assigned to COVID-19 varied substantially, with some counties’ direct COVID-19 tallies capturing only a small fraction of total excess deaths, whereas in other counties the direct COVID-19 death rate far exceeded the estimated excess death rate. Taken together, our results suggest that regional inequalities in the mortality burden associated with COVID-19 are not fully revealed by data at the state level and that consideration of excess deaths across US counties is critical for a full accounting of the disparate regional effects of the pandemic on US mortality.

## 1 Introduction

Vital statistics data are critical for tracking the direct and indirect effects of the COVID-19 pandemic on population health. Provisional estimates from the National Center for Health Statistics (NCHS) indicate that 503,976 more deaths occurred among US residents in 2020 than in 2019, a 17.7% increase in mortality.[1] Deaths assigned to COVID-19 explain a large portion of the increase in mortality in 2020, but other causes of death also increased, including deaths from heart disease, unintentional injury, Alzheimer disease, and diabetes.[2] These additional non-COVID-19 deaths may reflect a variety of factors, including COVID-19 deaths that were ascribed to other causes of death due to limited testing,[3] indirect deaths caused by interruptions in the provision of health care services,[4, 5] or indirect deaths caused by the broader social and economic consequences of the pandemic.[6]

Given potential biases in cause-of-death assignment, studies of excess mortality, which use contemporary and historical measures of all-cause mortality, may provide more accurate estimates of the impact of the COVID-19 pandemic on US mortality levels.[7] Prior studies of excess mortality indicate that official COVID-19 death tallies have significantly underestimated excess mortality related to the pandemic.[8] For the period from March 1 through May 30, 2020, one study identified 122,300 excess deaths, of which only 95,235 (78%) were directly assigned to COVID-19.[9] Another study calculated 522,368 excess deaths from March 1, 2020 through January 2, 2021,[10] of which 72% were attributed to COVID-19. The percentage of excess deaths not assigned to COVID-19 also varied significantly at the state level, suggesting that attribution of deaths to COVID-19 may not be uniform across the country. One prior study found that the proportion of excess deaths not assigned to COVID-19 differed significantly by county-level sociodemographic and health factors.[11] While this study produced estimates of the proportion of excess deaths not assigned to COVID-19 for groups of counties, it did not report estimates for individual counties.

To address this gap in the literature, the present study aims to generate valid estimates of excess mortality at the county level and to examine geographic variation in the proportion of excess deaths not assigned to COVID-19. Examining excess deaths at the county level has the potential to identify (1) counties with high excess mortality but low directly assigned COVID-19 mortality, indicating that COVID-19 was underreported on death certificates or that substantial numbers of indirect deaths occurred; (2) counties with negative excess mortality, which could indicate that COVID-19 deaths were offset by reductions in other causes of death; and (3) counties with both high excess mortality and high directly assigned COVID-19 mortality, which have been the most heavily affected by the COVID-19 pandemic.

To estimate excess mortality, we use historical mortality data to estimate a generalized linear model with high-dimensional fixed effects, which we then use to predict mortality in each county in 2020. We show that this model has better predictive performance than the simple lagged averages used in prior studies. Our estimates indicate that excess mortality and the proportion of excess deaths assigned to COVID-19 vary markedly across the US. We emphasize the value of performing this analysis at the county level, which allows us to highlight variation within individual states as well as across the country.

## 2 Data

We used provisional data from the National Center for Health Statistics (NCHS) on COVID-19 mortality and all-cause mortality by county of residence from January 1 to December 31, 2020, as reported by April 21, 2021. We used data with a fifteen-week lag (December 31 to April 21) to improve completeness, since prior analysis of provisional NCHS vital statistics reveals low completeness within the month following a death but more than 75 percent completeness after eight weeks.[12]

COVID-19 deaths were identified using the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) code U07.1 and include deaths with COVID-19 assigned as the underlying cause as well as deaths in which COVID-19 was reported on the death certificate as a contributing cause. Prior reports indicate that COVID-19 was the underlying cause in 92% of deaths with COVID-19 on the death certificate.[13] For our historical comparison period, we used CDC WONDER data on all-cause mortality by county of residence from 2011 to 2019. To compute death rates, we used population estimates from the U.S. Census Bureau for 2011 through 2020.

To assess geographic patterns in mortality, we classify counties on the basis of urbanicity (urban/rural), metropolitan-nonmetropolitan categories (large central metro, large metro suburb, small/medium metro, and nonmetro)[14] and U.S. Census region (West, Northeast, South, and Midwest). The classifications for urbanicity were developed by the United States Department of Agriculture (USDA) Economic Research Service (ERS) and modified by the National Center for Health Statistics. Large central metros include counties in metropolitan statistical areas with a population of more than 1 million. Large metro suburbs are counties that surround the large central metros. Small or medium metros include counties in metropolitan statistical areas with a population between 50,000 and 999,999. Nonmetropolitan areas include all other counties. We also examine patterns by 10 modified U.S. Census divisions. These include the 9 Census divisions (New England (CT, ME, MA, NH, RI, VT), Middle Atlantic (NJ, NY, PA), East North Central (IL, IN, MI, OH, WI), West North Central (IA, KS, MN, MO, NE, ND, SD), South Atlantic (DE, DC, FL, GA, MD, NC, SC, VA), East South Central (AL, KY, MS, TN), West South Central (AR, LA, OK, TX), Mountain (AZ, CO, ID, MT, NV, NM, UT, WY), and Pacific (AK, CA, HI, OR, WA)) and Appalachia, as defined by the Appalachian Regional Commission to include all of West Virginia and counties from 12 other states. We then stratified these divisions by metropolitan-nonmetropolitan categories to yield 40 distinct geographic units. Further details about these geographic units are provided by Elo et al.[14]
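The classification rules above can be sketched as a lookup from a county's state and metropolitan status to one of the 40 geographic units. This is a minimal illustration under stated assumptions, not the procedure of Elo et al.:[14] the inputs `msa_population`, `is_central_county`, and `in_appalachia` are hypothetical fields, because the NCHS criteria for separating central from suburban counties and the Appalachian Regional Commission county list are not reproduced here. West Virginia is routed to Appalachia because it appears in none of the nine division lists above.

```python
# Division membership as listed in the text (West Virginia is handled via Appalachia).
DIVISIONS = {
    "New England": ["CT", "ME", "MA", "NH", "RI", "VT"],
    "Middle Atlantic": ["NJ", "NY", "PA"],
    "East North Central": ["IL", "IN", "MI", "OH", "WI"],
    "West North Central": ["IA", "KS", "MN", "MO", "NE", "ND", "SD"],
    "South Atlantic": ["DE", "DC", "FL", "GA", "MD", "NC", "SC", "VA"],
    "East South Central": ["AL", "KY", "MS", "TN"],
    "West South Central": ["AR", "LA", "OK", "TX"],
    "Mountain": ["AZ", "CO", "ID", "MT", "NV", "NM", "UT", "WY"],
    "Pacific": ["AK", "CA", "HI", "OR", "WA"],
}
STATE_TO_DIVISION = {st: div for div, states in DIVISIONS.items() for st in states}


def metro_category(msa_population, is_central_county):
    """Metro category from the population thresholds in the text.

    msa_population: population of the county's MSA, or None if not in an MSA (hypothetical field).
    is_central_county: hypothetical flag separating central from suburban counties.
    """
    if msa_population is None or msa_population < 50_000:
        return "Nonmetro"
    if msa_population >= 1_000_000:
        return "Large central metro" if is_central_county else "Large metro suburb"
    return "Small/medium metro"  # 50,000 to 999,999


def geographic_unit(state, msa_population, is_central_county, in_appalachia):
    """(division, metro category) pair; 10 divisions x 4 categories = 40 units."""
    if in_appalachia or state == "WV":
        division = "Appalachia"
    else:
        division = STATE_TO_DIVISION[state]
    return division, metro_category(msa_population, is_central_county)
```

For example, a suburban county in a large Massachusetts MSA would map to `("New England", "Large metro suburb")`.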

Our provisional data include 3,115 counties with reported all-cause mortality for 2020. We excluded counties without complete mortality information for the entire period from 2011 to 2019, resulting in a final sample of 3,109 counties.

## 3 Methodology

### 3.1 General Model for County-Level Mortality

To generate a prediction of expected mortality in 2020, we estimate a statistical model of mortality using historical mortality data from 2011-2019. In particular, we model mortality at the county-year level using a generalized linear model (GLM).¹ Let $D_{it}$ denote the raw number of deaths in county $i$ during year $t$, let $P_{it}$ denote the county’s population in that year, and let $Y_{it} = D_{it}/P_{it}$. We assume that the conditional mean of $Y_{it}$ depends flexibly on a linear index through a link function $g$:

$$
g\left(\mathbb{E}\left[Y_{it} \mid Y_{i,t-1}, t\right]\right) = \alpha_i + \rho\, Y_{i,t-1} + \beta\, t + \gamma_{s(i)}\, t,
$$

where $s(i)$ denotes the state containing county $i$ and $\rho$, $\beta$, and $\gamma_{s(i)}$ are coefficients to be estimated. In the linear index, we include a county-specific intercept term, $\alpha_i$, which captures latent characteristics of each county that may be correlated with mortality. Importantly, this term picks up relevant information such as the distribution of age and health in each county.² We include one lag of the dependent variable, $Y_{i,t-1}$, to capture potential serial correlation in mortality.³ We include a time trend, $t$, as well as an interaction of state and time, $\gamma_{s(i)}\, t$, to capture state-specific trends in mortality. This accounts for demographic and other trends across states that may affect mortality.⁴ This model is both flexible and tractable for a variety of link functions and distributional families of the dependent variable.
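As a minimal sketch of the model above, the code below fits a Poisson GLM with a log link by iteratively reweighted least squares (IRLS) in NumPy. It assumes that modeling $Y_{it} = D_{it}/P_{it}$ with a Poisson family is implemented as a Poisson model for deaths with a $\log P_{it}$ offset, which yields the same estimating equations as a population-weighted Poisson quasi-likelihood for the rate. The helper names and the choice of state 0 as the reference trend are illustrative assumptions, not the authors' code.

```python
import numpy as np


def build_design(county, state, year, lag_rate, n_counties, n_states, base_year=2011):
    """Columns: county intercepts (alpha_i), lagged rate, common trend t,
    and state-by-trend interactions for states 1..n_states-1.

    State 0 is the reference state, so its trend is the common trend; dropping
    its interaction avoids perfect collinearity with t.
    """
    n = len(county)
    t = (np.asarray(year) - base_year).astype(float)
    county_fe = np.zeros((n, n_counties))
    county_fe[np.arange(n), county] = 1.0
    state = np.asarray(state)
    state_trends = [(state == s) * t for s in range(1, n_states)]
    return np.column_stack([county_fe, lag_rate, t, *state_trends])


def fit_poisson_glm(X, deaths, population, max_iter=100, tol=1e-10):
    """Poisson GLM, log link, offset log(population), fitted by IRLS (Newton steps)."""
    deaths = np.asarray(deaths, float)
    offset = np.log(np.asarray(population, float))
    mu = deaths + 0.5                      # common starting values for Poisson IRLS
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = (eta - offset) + (deaths - mu) / mu   # working response with offset removed
        XtW = X.T * mu                            # working weights equal mu under the log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new + offset
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta
```

Because the model has a separate intercept for each county and uses the canonical log link, the fitted deaths for each county sum to that county's observed deaths over the estimation window, which gives a quick convergence check.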

### 3.2 Model Testing and Selection

To select our primary specification for a distribution and link function, we computed the out-of-sample predictive accuracy of a number of candidate specifications. Specifically, we estimated four canonical GLM specifications for continuous and count data (Gaussian-identity, Poisson, negative binomial, and gamma-log) using data from 2011-2017. We then used the estimated model parameters to make out-of-sample predictions of 2018-2019 mortality and computed the mean squared prediction error and mean absolute prediction error for each. Some prior studies have used a simple average of mortality in prior years to compute excess mortality in 2020. This approach is much simpler than fitting a high-dimensional GLM, but it fails to capture trends and may overweight more recent years. To gauge the improvement (if any) from using a regression model rather than a prior mean to predict deaths, we also computed the predictive accuracy of both a one-year lagged value and a six-year prior average. Table A1 reports the mean squared and mean absolute prediction error associated with each model used to generate a fitted value for mortality in 2018-2019. All of the GLMs perform substantially better than both the prior-year value and the prior 6-year mean.
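The comparison of candidate predictors can be sketched with the two error metrics and the two naive benchmarks described above. How the benchmarks are lined up against the 2018-2019 holdout years (here, the last training year and the mean of the last six training years) is an assumption of this sketch; GLM predictions would be scored with the same `prediction_errors` helper, which is a hypothetical name.

```python
import numpy as np


def prediction_errors(actual, predicted):
    """Mean squared and mean absolute prediction error."""
    err = np.asarray(actual, float) - np.asarray(predicted, float)
    return {"mse": float(np.mean(err ** 2)), "mae": float(np.mean(np.abs(err)))}


def naive_baselines(history):
    """Simple benchmarks from an (n_counties, n_years) array of past death rates,
    with columns ordered oldest to newest (an assumed layout)."""
    history = np.asarray(history, float)
    return {
        "prior_year": history[:, -1],                    # one-year lagged value
        "prior_6yr_mean": history[:, -6:].mean(axis=1),  # six-year prior average
    }
```

A county whose rate rises steadily shows why both benchmarks can lose to a trend-aware model: the prior-year value lags the trend by one year and the six-year mean lags it further.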

Among the GLMs, the Poisson model performs best, so we take this model as our primary specification for computing excess mortality in 2020.⁵ To obtain predicted mortality in each county in 2020, we apply our estimated parametric conditional expectation function to 2020 data.

We use the model estimates described above to compute fitted values of total deaths and the death rate per 1000 person-years for each county and year from 2011-2020.⁶
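Under the log link, the fitted conditional expectation becomes predicted deaths by exponentiating the linear index and multiplying by population. The sketch below assumes that each 2020 covariate row holds the county's intercept indicator, the observed 2019 death rate as the lagged term, the trend value for 2020, and the state-trend terms; `predict_deaths` is a hypothetical helper, not the authors' code.

```python
import numpy as np


def predict_deaths(x_new, beta, population):
    """Expected deaths and expected death rate per 1000 person-years under a log link.

    x_new: (n_counties, k) covariate rows for the prediction year.
    beta:  (k,) estimated coefficients.
    """
    rate = np.exp(np.asarray(x_new, float) @ np.asarray(beta, float))  # E[D/P]
    deaths = np.asarray(population, float) * rate
    return deaths, 1000.0 * rate
```

For a county of 100,000 whose linear index implies a rate of 0.009, this gives 900 expected deaths, or 9 per 1000 person-years.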

Figure A1 displays the actual deaths and predicted deaths for six randomly selected counties. Figure A2 provides the same visualization for six counties with large populations. Figure A3 shows the distribution of deaths in 2019 vs. 2020.

### 3.3 Uncertainty Intervals

We also construct a 95% prediction interval around the predicted death rates for 2020 to help identify counties in which 2020 mortality falls outside the normal range of year-to-year fluctuations. Importantly, while the point estimates of predicted mortality do not depend on the particular variance function assumed, the prediction intervals depend on both the variance of the estimated parameters and the variance of the outcome variable.⁷ To compute the prediction intervals, we first compute cluster-robust standard errors (Wooldridge, 1999)[17] for the individual parameter estimates and then perform a parametric bootstrap procedure over the parameter and outcome distributions.⁸
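A minimal sketch of both steps for a Poisson log-link model: a cluster-robust (sandwich) covariance for the coefficients, followed by a parametric bootstrap that draws coefficients from a normal distribution with that covariance and then draws deaths from a Poisson distribution. The sandwich here omits the finite-sample correction that statistical packages typically apply, and reporting the standard deviation of the simulated outcomes as the standard error of the prediction is an assumption of this sketch.

```python
import numpy as np


def cluster_robust_vcov(X, deaths, mu, clusters):
    """Sandwich covariance for a Poisson log-link GLM, clustered by `clusters`
    (e.g. county). No finite-sample correction is applied."""
    bread = np.linalg.inv((X.T * mu) @ X)               # inverse of X' diag(mu) X
    scores = X * (np.asarray(deaths, float) - mu)[:, None]
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        s = scores[clusters == g].sum(axis=0)            # summed score within one cluster
        meat += np.outer(s, s)
    return bread @ meat @ bread


def bootstrap_prediction_interval(x_new, beta_hat, vcov, population,
                                  n_draws=10_000, level=0.95, seed=0):
    """Parametric bootstrap interval for one county's deaths in the prediction year."""
    rng = np.random.default_rng(seed)
    betas = rng.multivariate_normal(np.asarray(beta_hat, float),
                                    np.asarray(vcov, float), size=n_draws)  # parameter draws
    mu = float(population) * np.exp(betas @ np.asarray(x_new, float))      # expected deaths
    draws = rng.poisson(mu)                                                 # outcome draws
    alpha = 1.0 - level
    lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return lower, upper, draws.std(ddof=1)
```

With about 900 expected deaths and little parameter uncertainty, Poisson noise alone gives a standard deviation near 30 deaths, so the 95% interval spans roughly 840 to 960.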

### 3.4 Excess Deaths

We define excess deaths as the difference between the number of observed all-cause deaths in 2020 and the number of predicted all-cause deaths in 2020. For each county in our sample, we produced an excess death rate for 2020. We also compute a z-score-like statistic, which we call adjusted excess deaths, by dividing the excess death estimate by the standard error of the prediction. This adjusts for county-level variation in the precision of expected death estimates. Figure A4 shows the distribution of excess deaths in 2019 vs. 2020. For 2019, the distribution is narrow and centered around zero, illustrating our model fit. In 2020, there is a large rightward shift in the distribution, reflecting the impact of the pandemic.
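Both quantities reduce to simple arithmetic once predicted deaths and a prediction standard error are available. In this sketch the standard error is an input (for example, the standard deviation of bootstrap draws), rates are expressed per 1000 person-years as in the tables, and `excess_measures` is a hypothetical helper name.

```python
import numpy as np


def excess_measures(observed, predicted, se_prediction, population):
    """Excess deaths, excess rate per 1000 person-years, and adjusted (z-score-like) excess."""
    observed = np.asarray(observed, float)
    predicted = np.asarray(predicted, float)
    excess = observed - predicted                      # positive = more deaths than expected
    return {
        "excess": excess,
        "excess_rate_per_1000": 1000.0 * excess / np.asarray(population, float),
        "adjusted_excess": excess / np.asarray(se_prediction, float),
    }
```

For example, a county of 100,000 with 1,100 observed deaths, 900 predicted deaths, and a prediction standard error of 40 has 200 excess deaths, an excess rate of 2.0 per 1000 person-years, and an adjusted excess of 5.0.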

### 3.5 Excess Deaths Not Assigned to COVID-19

We define excess deaths not assigned to COVID-19 as the difference between the number of excess deaths in 2020 and the number of observed directly assigned COVID-19 deaths in 2020. For each county in our sample, we decomposed the excess death rate into (1) the observed directly assigned COVID-19 death rate and (2) the excess death rate not assigned to COVID-19. Finally, we define the proportion of excess deaths assigned to COVID-19 as the ratio of the directly assigned COVID-19 death rate to the excess death rate.
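The decomposition follows directly from these definitions. This sketch leaves the proportion undefined (NaN) when the excess death rate is zero or negative, a guard added here as an assumption because the ratio is not meaningful in that case; `decompose_excess` is a hypothetical name.

```python
import numpy as np


def decompose_excess(excess_rate, covid_rate):
    """Residual (not assigned) excess rate and proportion of excess assigned to COVID-19."""
    excess_rate = np.asarray(excess_rate, float)
    covid_rate = np.asarray(covid_rate, float)
    residual = excess_rate - covid_rate                  # negative if COVID-19 deaths exceed excess
    proportion = np.divide(covid_rate, excess_rate,
                           out=np.full_like(excess_rate, np.nan),
                           where=excess_rate > 0)        # undefined for zero or negative excess
    return residual, proportion
```

For an excess death rate of 2.0 and a COVID-19 death rate of 1.5 per 1000 person-years, the residual is 0.5 and the proportion assigned is 0.75; a proportion above 1 always corresponds to a negative residual.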

### 3.6 Summary Statistics

When calculating summary statistics, we restricted the sample to counties with statistically significant increases in excess deaths, defined as counties in which actual deaths exceeded the upper bound of the 95% prediction interval of expected deaths for 2020. Counties without statistically significant increases in excess mortality could be (1) counties with incomplete data; (2) counties with small populations, where our estimates lack precision and the 95% prediction intervals are therefore wide; (3) counties that avoided the COVID-19 pandemic due to effective policy measures or geographic isolation; or (4) counties that had low or negative excess mortality due to net reductions in mortality during the pandemic, for example because of fewer deaths from influenza, motor vehicle accidents, and other causes. Since it is not possible to distinguish counties with incomplete data from these other cases, we limited our summary statistics to counties that reported statistically significant increases in excess mortality.
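The significance screen amounts to comparing each county's observed deaths with the upper bound of its prediction interval. The arrays in this sketch are made-up illustrative values, and `significant_excess_mask` is a hypothetical helper; the third county mimics a small population whose wide interval keeps it out of the summary sample.

```python
import numpy as np


def significant_excess_mask(observed, pi_upper):
    """True where observed 2020 deaths exceed the upper bound of the 95% prediction interval."""
    return np.asarray(observed, float) > np.asarray(pi_upper, float)


# Illustrative values only: a large county with clear excess, a county within its
# normal range, and a small county whose wide interval absorbs an increase.
observed = np.array([1100, 950, 40])
pi_upper = np.array([960, 960, 55])
keep = significant_excess_mask(observed, pi_upper)
summary_sample = observed[keep]   # only the first county enters the summary statistics
```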

## 4 Results

Figure 1 visualizes the observed directly assigned COVID-19 death rates in 2020 for each of the 3,109 U.S. counties included in our sample, along with the excess death rate predictions generated by our model. Figure A5 provides a visualization of the adjusted excess death rate in each of these counties, and Figure 2 shows the proportion of excess deaths not assigned to COVID-19 for the counties. Next, Table B1 presents specific estimates of the excess death rate, the observed directly assigned COVID-19 death rate, and the excess death rate not assigned to COVID-19 for each county during 2020.

![Figure 1:](https://www.medrxiv.org/content/medrxiv/early/2021/04/25/2021.04.23.21255564/F1.medium.gif)


Figure 1: COVID-19 Deaths by County (Per 1000 Person-Years)
*Notes*: Heat maps of direct COVID-19 deaths per 1000 person-years by county (top) and excess death rate per 1000 person-years by county (bottom). Numbers are based on provisional data from the National Center for Health Statistics (NCHS) on COVID-19 mortality by county of residence from January 1 to December 31, 2020 reported by April 21, 2021. Note that estimates for counties in North Carolina may be unreliable due to reporting lags.

![Figure 2:](https://www.medrxiv.org/content/medrxiv/early/2021/04/25/2021.04.23.21255564/F2.medium.gif)


Figure 2: Classification of Counties based on Excess and COVID-19 Deaths
*Notes*: U.S. counties colored according to four major categories: (1) high excess death rate and less than 68% of excess deaths assigned to COVID-19, (2) high excess death rate and between 68% and 100% of excess deaths assigned to COVID-19, (3) high excess death rate and directly assigned death rate exceeding the excess death rate, and (4) low or negative excess death rate. High excess indicates that the county's total deaths in 2020 exceeded the upper bound of the 95% prediction interval. Ratio denotes the ratio of direct COVID-19 deaths to excess deaths. Note that estimates for counties in North Carolina may be unreliable due to reporting lags.
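The four map categories can be expressed as a short rule. This sketch takes the caption's cut points at face value (a ratio below 0.68 for category 1, 0.68 to 1.0 inclusive for category 2, above 1.0 for category 3) and assigns any county that does not exceed its upper prediction bound to category 4; the handling of exact boundary values and the name `classify_county` are assumptions.

```python
def classify_county(observed, pi_upper, excess, covid, threshold=0.68):
    """Map category (1-4) following the figure caption; inputs are county totals."""
    if observed <= pi_upper or excess <= 0:
        return 4                      # low or negative excess death rate
    ratio = covid / excess            # direct COVID-19 deaths per excess death
    if ratio < threshold:
        return 1                      # high excess, under 68% assigned to COVID-19
    if ratio <= 1.0:
        return 2                      # high excess, 68% to 100% assigned
    return 3                          # high excess, COVID-19 deaths exceed excess deaths
```

For instance, a county with 200 excess deaths and 246 direct COVID-19 deaths (a ratio of 1.23) falls in category 3.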

Table A2 highlights the counties with the highest directly assigned COVID-19 death rates and those with the highest excess death rates, demonstrating that different counties appear most heavily affected by the pandemic depending on which measure is used. While some counties have both high direct COVID-19 death rates and high excess death rates (e.g., McKinley County, NM and Lamb County, TX), other counties with high excess death rates reported low directly assigned COVID-19 mortality (e.g., Aransas County, TX, Dallas County, MO, and Clarke County, AL), suggesting that direct COVID-19 death rates were not an accurate measure of the burden of excess mortality in all areas.

Table 1 reports summary measures, computed over counties with statistically significant increases in excess deaths and stratified by urbanicity, region, and division. Rural counties reported higher directly assigned COVID-19 death rates and excess death rates than urban counties. The proportion of excess deaths directly assigned to COVID-19 was also lower in rural areas (67%) than in urban areas (83%). By region, the South and Midwest had higher rates of COVID-19 deaths and excess deaths than the Northeast and West. The proportion of excess deaths assigned to COVID-19 was lowest in the West (75%) and South (76%) and highest in the Northeast (94%) and Midwest (81%). Among US Census Divisions, the proportion of excess deaths assigned to COVID-19 was lowest in the East South Central Division (67%). In the New England Division, the ratio of COVID-19 to excess deaths exceeded 1.0 (1.23), indicating a potential offset of COVID-19 deaths by reductions in other causes of death.


Table 1: Aggregated Measures by Region and Urbanicity
*Notes*: Aggregated results by various geographic regions. Aggregate rates are computed by summing actual and predicted counts over counties in a particular region and dividing by the summed population. Note that this is equivalent to the population-weighted means of the county-level rates. When calculating summary statistics, we limited the sample to counties with statistically significant increases in excess deaths. Excess is defined as the difference between all-cause and predicted deaths. Residual denotes the difference between excess deaths and COVID-19 deaths. The ratio denotes the ratio of COVID-19 to excess deaths. Urbanicity is based on the 2010 Census urban and rural classification and urban area criteria. US Census Divisions and metro classification are based on Elo et al.[14] The Census Divisions include New England (CT, ME, MA, NH, RI, VT), Middle Atlantic (NJ, NY, PA), East North Central (IL, IN, MI, OH, WI), West North Central (IA, KS, MN, MO, NE, ND, SD), South Atlantic (DE, DC, FL, GA, MD, NC, SC, VA), East South Central (AL, KY, MS, TN), West South Central (AR, LA, OK, TX), Mountain (AZ, CO, ID, MT, NV, NM, UT, WY), and Pacific (AK, CA, HI, OR, WA), plus Appalachia, as defined by the Appalachian Regional Commission to include all of West Virginia and counties from 12 other states. The metro classifications include large central metros (counties in metropolitan statistical areas with a population of more than 1 million), large metro suburbs (counties that surround the large central metros), small or medium metros (counties in metropolitan statistical areas with a population between 50,000 and 999,999), and nonmetropolitan areas (all other counties). All metrics other than the proportion assigned are in units per 1000 person-years.
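The equivalence stated in the notes follows from $\sum_i D_i / \sum_i P_i = \sum_i w_i (D_i / P_i)$ with weights $w_i = P_i / \sum_j P_j$. The sketch below checks it numerically. Computing the group-level ratio as total COVID-19 deaths over total excess deaths (a ratio of sums) is an assumption of this sketch, and the function names are hypothetical.

```python
import numpy as np


def aggregate_rate(counts, population):
    """Group rate per 1000 person-years: summed counts over summed population."""
    return 1000.0 * np.sum(counts) / np.sum(population)


def population_weighted_mean_rate(counts, population):
    """Population-weighted mean of county-level rates per 1000 person-years."""
    population = np.asarray(population, float)
    rates = 1000.0 * np.asarray(counts, float) / population
    return np.average(rates, weights=population)


def aggregate_ratio(covid_deaths, excess_deaths):
    """Group-level ratio of direct COVID-19 deaths to excess deaths (ratio of sums)."""
    return np.sum(covid_deaths) / np.sum(excess_deaths)
```

For two counties with 20 and 45 deaths and populations of 1,000 and 9,000, both functions give 6.5 per 1000 person-years, even though the county rates (20 and 5) differ sharply.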

Table 2 presents additional summary measures, stratified by 4 metropolitan-nonmetropolitan categories and 10 US Census Divisions into 40 geographic areas. Across divisions, there was a substantial gradient in the percentage of excess deaths assigned to COVID-19 between nonmetro and metro areas. For example, in the Middle Atlantic Division, only 47% of excess deaths were assigned to COVID-19 in nonmetro areas, whereas in large central metros, large metro suburbs, and small or medium metros, more than 85% of excess deaths were assigned to COVID-19. Similarly, in the Pacific Division, only 39% of excess deaths were assigned to COVID-19 in nonmetro areas, compared with 78% in large central metros. In New England, metro areas appeared to experience an offset of COVID-19 deaths by reductions in other causes of death: in large central metros in New England, 132% of excess deaths were assigned to COVID-19, whereas in nonmetro areas only 64% were. Figure A6 visualizes actual and predicted deaths for six large New England counties, showing that excess deaths occurred in these counties but that directly assigned COVID-19 deaths exceeded them. This suggests that deaths from some other causes likely declined in these areas. In contrast, Figure A7 visualizes actual and predicted deaths for six large East South Central counties where excess deaths exceeded directly assigned COVID-19 deaths.


Table 2: Aggregated Measures by Geographic Group
*Notes*: Aggregated results by stratified census division and metro-nonmetro status. Aggregate rates are computed by summing actual and predicted counts over counties in a particular region and dividing by the summed population. Note that this is equivalent to the population-weighted means of the county-level rates. When calculating summary statistics, we limited the sample to counties with statistically significant increases in excess deaths. Excess is defined as the difference between all-cause and predicted deaths. Residual denotes the difference between excess deaths and COVID-19 deaths. The ratio denotes the ratio of COVID-19 to excess deaths. Census division and metro classification are based on Elo et al.[14] All metrics other than the proportion assigned are in units per 1000 person-years.

Figure 3 decomposes the excess death rate into the observed directly assigned COVID-19 death rate and the excess death rate not assigned to COVID-19 for the 50 counties with the highest adjusted excess death rates. Figure 4 provides the same decomposition for the 50 counties with the largest populations. Most of these counties report positive excess death rates not assigned to COVID-19. The New York City counties of Bronx, Queens, and Kings reported the highest directly assigned COVID-19 death rates and also had substantial excess death rates not assigned to COVID-19. A few counties stand out for having negative excess death rates not assigned to COVID-19 (e.g., Middlesex County, MA and King County, WA), indicating that the directly assigned COVID-19 death rate was larger than the excess death rate in these counties.

![Figure 3:](https://www.medrxiv.org/content/medrxiv/early/2021/04/25/2021.04.23.21255564/F3.medium.gif)


Figure 3: Excess Death Decomposition for Counties with the Highest Adjusted Excess Death Rate
*Notes*: Decomposition of excess deaths per 1000 person-years for the 50 counties with the highest adjusted excess death rate, which is defined as the excess death rate divided by the standard error of the prediction. For each county, the blue region reflects direct COVID-19 deaths and the pink region reflects residual deaths, which is defined as total excess deaths less direct COVID-19 deaths. Note that residual deaths are negative if the number of COVID-19 deaths exceeds our estimate of excess deaths.

![Figure 4:](https://www.medrxiv.org/content/medrxiv/early/2021/04/25/2021.04.23.21255564/F4.medium.gif)


Figure 4: Excess Death Decomposition For the 50 Largest Counties by Population
*Notes*: Decomposition of excess deaths per 1000 person-years for the 50 largest counties by population. For each county, the blue region reflects direct COVID-19 deaths and the pink region reflects residual deaths, which is defined as total excess deaths less direct COVID-19 deaths. Note that residual deaths are negative if the number of COVID-19 deaths exceeds our estimate of excess deaths.

Lastly, Figure 5 presents the counties in the sample with the highest proportion of excess deaths not assigned to COVID-19, which suggests that they may have underreported COVID-19 the most or had the most indirect deaths associated with the pandemic. This figure visualizes the percentage of excess deaths assigned and not assigned to COVID-19 in each county and is limited to counties with excess death rates above the median.

![Figure 5:](https://www.medrxiv.org/content/medrxiv/early/2021/04/25/2021.04.23.21255564/F5.medium.gif)


Figure 5: Excess Death Decomposition for Counties with the Highest Proportion of Deaths not Assigned to COVID-19 and Excess Deaths Above Median (Population > 75,000)
*Notes*: Decomposition of excess deaths per 1000 person-years for the 50 counties with the highest proportion of deaths not assigned to COVID-19. Sample is restricted to counties with an excess death rate above the median and whose populations exceed 75,000. For each county, the blue region reflects the proportion of excess deaths which are direct COVID-19 deaths and the pink region reflects residual proportion.
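The sample restriction for this figure can be sketched as a filter followed by a sort. Whether the median is taken over all counties or only over counties with significant excess is not stated, so this sketch assumes the median over whatever list is passed in; the dictionary keys and the name `figure5_selection` are hypothetical.

```python
import statistics


def figure5_selection(counties, min_population=75_000, top_n=50):
    """Counties above the median excess rate and the population cutoff, ranked by the
    share of excess deaths not assigned to COVID-19 (highest first).

    counties: list of dicts with hypothetical keys
              'name', 'population', 'excess_rate', 'covid_rate'.
    """
    median_excess = statistics.median([c["excess_rate"] for c in counties])
    eligible = [
        c for c in counties
        if c["excess_rate"] > max(median_excess, 0) and c["population"] > min_population
    ]

    def share_not_assigned(c):
        return 1.0 - c["covid_rate"] / c["excess_rate"]

    return sorted(eligible, key=share_not_assigned, reverse=True)[:top_n]
```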

## 5 Discussion

Prior research has documented the geographic impact of directly assigned COVID-19 mortality across the United States, including significant disparities in mortality between counties.[21, 22, 23] However, cause-of-death data for COVID-19 may not capture the full extent of the COVID-19 pandemic if COVID-19 deaths are underreported or indirect deaths have occurred as a result of the social and economic consequences of the pandemic. County-level measures of excess mortality may provide a more complete assessment of the effects of the COVID-19 pandemic on US mortality levels. Prior work has estimated excess mortality for US states, revealing significant differences in patterns of excess deaths relative to directly assigned deaths.[9, 20] Studies of excess deaths in California also found significant differences in excess deaths by race/ethnicity and education.[24, 25] In the present study, we contribute to this literature by constructing estimates of excess mortality at the county level for more than 3,100 counties in 2020. We also decompose the excess death rate into deaths assigned and not assigned to COVID-19 on the death certificate. The results indicate substantial heterogeneity in COVID-19-associated mortality across geographic regions of the US and identify clusters of counties with shared patterns of excess and COVID-19-specific mortality.

National estimates of excess mortality suggest that at least 20% of excess deaths in the US were not assigned to COVID-19. At the state level, the fraction not assigned to COVID-19 varied substantially.[10] Consistent with previous national- and state-level results, we find evidence of a gap between excess mortality and directly assigned COVID-19 mortality in many counties across the United States. Our county-level estimates also show wide variation, with some counties reporting that more than 50% of excess deaths were not directly assigned to COVID-19. There are several potential explanations for the discrepancy between excess mortality and directly assigned COVID-19 mortality. One is that the gap reflects underreporting of COVID-19 deaths. Especially early in the pandemic, testing was severely limited, which may have reduced the likelihood that COVID-19 was recorded on the death certificate. Underreporting may also have been related to a lack of awareness of the clinical manifestations of COVID-19 early in the pandemic, as well as to various social, health care, and political factors.[26, 11] In addition to underreporting, gaps between excess and direct mortality may in part be explained by the indirect effects of the pandemic on mortality levels. Indirect effects could relate to interruptions or delays in health care or to the broader social and economic upheaval caused by the pandemic, including loss of employment, social isolation and loneliness, and other factors.[27, 28, 29] Many states have reported increases in overdose deaths during the COVID-19 pandemic, and NCHS data suggest that approximately 19,000 more deaths from unintentional injuries occurred in 2020 than in 2019.[1]

We also observed a number of counties in which the direct COVID-19 death rate exceeded our estimate of excess mortality, especially in metro New England. This finding could have several explanations. First, increases in mortality in 2020 due to COVID-19 may have been offset by declines in deaths from other causes. Provisional cause-of-death data for 2020 indicate that influenza deaths declined significantly relative to prior years, which may explain all or part of the offset.[30] Other causes of death may also have declined in 2020. For example, NCHS data indicate that there were approximately 2,600 fewer suicide deaths in 2020 than in 2019.[1] Shelter-in-place policies may also have been associated with reductions in non-natural deaths.[31] Second, direct deaths may have exceeded our estimates of excess deaths if medical certifiers in a county over-assigned COVID-19 on death certificates. Third, the differences could at least partially reflect how directly assigned COVID-19 deaths were counted by NCHS: while COVID-19 was listed as the underlying cause in the vast majority of cases, in about 8% of cases COVID-19 was listed only as a contributing cause and was still counted as a directly assigned COVID-19 death. Finally, frailty selection may have occurred if COVID-19 deaths were concentrated among individuals who would otherwise have been likely to die from other causes, resulting in reductions in those causes of death. It is also important to note that although directly assigned COVID-19 deaths exceeded excess deaths in the overall population in New England, patterns could differ among population subgroups defined by age, race/ethnicity, and income.

Accurate county-level predictions of excess mortality, and their associated uncertainty, are an important addition to existing work on excess mortality at the state and national levels. County-level estimates enable researchers to examine geographic variation in excess mortality and to leverage county-level variation in sociodemographic, health, and structural factors to examine inequities in excess deaths. They may also be relevant to local public health departments, for use alongside their existing efforts to monitor deaths directly assigned to COVID-19. County-level estimates can also be aggregated to health service areas (HSAs), hospital referral regions (HRRs), states, and other units to allow for systems-level analyses of excess mortality. When aggregating from the county level, it is possible to exclude counties that report negative excess mortality as a result of data quality issues. State-level studies instead rely on strategies such as state completeness factors to account for differential reporting lags and thus cannot exclude these small areas with data quality issues. Including such areas is likely to lead to underestimation of excess mortality, since negative excess mortality arising from data quality issues offsets positive excess mortality elsewhere.
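A minimal sketch of why this exclusion matters when county estimates are summed to a larger unit such as a state. The county records, the state code `XX`, and the `flagged` indicator for data-quality problems are all hypothetical; how such counties are identified in practice is not specified here.

```python
from collections import defaultdict

# Hypothetical county records: (state, county, observed deaths, predicted deaths, flagged).
# `flagged` is an assumed indicator that a county's negative excess reflects a
# data-quality problem (e.g., a reporting lag) rather than a true mortality decline.
COUNTIES = [
    ("XX", "County A", 1_200, 1_000, False),
    ("XX", "County B", 900, 800, False),
    ("XX", "County C", 400, 650, True),
]


def state_excess(records, exclude_flagged):
    """Sum county excess deaths (observed minus predicted) to the state level."""
    totals = defaultdict(int)
    for state, _county, observed, predicted, flagged in records:
        if exclude_flagged and flagged:
            continue  # drop counties whose negative excess is an artifact
        totals[state] += observed - predicted
    return dict(totals)


print(state_excess(COUNTIES, exclude_flagged=False))  # {'XX': 50}
print(state_excess(COUNTIES, exclude_flagged=True))   # {'XX': 300}
```

With the artifactual county included, its -250 "excess" deaths cancel most of the +300 from the other two counties, which is the underestimation mechanism described above.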

Our work is broadly consistent with a prior study conducted at the county level.[11] In that study, county data were used as an input to modeling excess deaths and deaths not assigned to COVID-19 for different groupings of counties defined by county-level sociodemographic and health factors. The present study builds on that work by developing and validating a model for predicting excess death rates for individual county units rather than for groupings of counties.

This analysis had several limitations. First, unlike prior state-level analyses that leveraged weekly data on deaths, the present study used cumulative data on COVID-19 and all-cause mortality for all of 2020. Given this limitation in the available data, it was not possible to examine changes in excess mortality over time or trends in the proportion of excess deaths not assigned to COVID-19 at the county level. Examining trends in excess mortality using small-area data is a priority for future research and may help to distinguish the direct effects of the pandemic from indirect consequences associated with interruptions in health care and with the social and economic consequences of pandemic response measures. Second, the provisional county-level mortality files released by NCHS did not include information on cause of death, so it was not possible to disentangle the sources of excess deaths in 2020. Decomposing excess deaths by cause will be critical to understanding why some counties have a higher fraction of unassigned deaths than others and the extent to which the discrepancies are explained by undercounts of COVID-19 deaths versus indirect pandemic effects. Third, the data used in the present study are provisional and may be subject to further corrections by NCHS in the process of generating final estimates.

In conclusion, the present study builds on prior work by extending estimates of excess mortality and excess deaths not assigned to COVID-19 to US counties. The added geographic detail of these estimates, compared with prior studies, may facilitate further research on the causes and population health consequences of the COVID-19 pandemic and provide useful data for local health policy and planning.

## Data Availability

The present investigation used publicly available data from the National Center for Health Statistics and the U.S. Census Bureau.

[https://data.cdc.gov/NCHS/AH-County-of-Occurrence-Provisional-COVID-19-Death/6vqh-esgs](https://data.cdc.gov/NCHS/AH-County-of-Occurrence-Provisional-COVID-19-Death/6vqh-esgs) 

## 6 Appendix A: Tables and Figures

[Table A1:](http://medrxiv.org/content/early/2021/04/25/2021.04.23.21255564/T3)

Table A1: Prediction Accuracy of Different Models
*Notes*: Mean squared and mean absolute error of different statistical models of mortality. These metrics are computed using predicted values and actual mortality data for all counties for 2018 and 2019.
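The two accuracy metrics reported in Table A1 can be computed as below. This is a minimal sketch; the actual and predicted values for three counties are invented for illustration and are not taken from the table.

```python
def prediction_errors(actual, predicted):
    """Return (mean squared error, mean absolute error) for paired county values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    return mse, mae


# Hypothetical death rates per 1000 person-years for three counties in a held-out year.
actual = [10.0, 8.0, 12.0]
model_a = [9.0, 8.5, 13.0]
model_b = [10.0, 10.0, 12.0]

print(prediction_errors(actual, model_a))  # (0.75, 0.8333333333333334)
print(prediction_errors(actual, model_b))  # (1.3333333333333333, 0.6666666666666666)
```

Here model A has the lower mean squared error while model B has the lower mean absolute error, because squaring penalizes model B's single large miss more heavily; reporting both metrics guards against relying on one ranking.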

[Table A2:](http://medrxiv.org/content/early/2021/04/25/2021.04.23.21255564/T4)

Table A2: COVID-19 and Excess Deaths For the Counties with the Highest Rates of Each
*Notes*: Direct COVID-19 and excess death rates for the 50 highest-ranking counties on each metric. Rates are per 1000 person-years. COVID-19 mortality data are based on a provisional release from the National Center for Health Statistics (NCHS) by county of residence for January 1 to December 31, 2020, as reported by March 12, 2021. Includes only counties with population above 12,500. Excess deaths are defined as actual deaths less predicted deaths for a given county and year.
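The selection rule described in these notes (keep counties with population above 12,500, then take the 50 highest values of a given rate) can be sketched as follows. The record layout and field names are hypothetical, and using the population as the person-year denominator for a single year is a simplifying assumption made only for this illustration.

```python
import heapq

MIN_POPULATION = 12_500  # threshold stated in the notes to Table A2


def rate_per_1000(deaths, person_years):
    """Convert a death count into a rate per 1000 person-years."""
    return 1000.0 * deaths / person_years


def top_counties(records, key, n=50):
    """Return the n eligible counties with the highest value of `key`."""
    eligible = [r for r in records if r["population"] > MIN_POPULATION]
    return heapq.nlargest(n, eligible, key=lambda r: r[key])


# Hypothetical records; the small county B is excluded despite its high rate.
records = [
    {"county": "A", "population": 50_000, "excess_rate": rate_per_1000(105, 50_000)},
    {"county": "B", "population": 10_000, "excess_rate": rate_per_1000(50, 10_000)},
    {"county": "C", "population": 20_000, "excess_rate": rate_per_1000(68, 20_000)},
]
print([r["county"] for r in top_counties(records, "excess_rate", n=2)])  # ['C', 'A']
```

The population filter limits the ranking to counties where a handful of deaths cannot produce an extreme rate on its own.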

![Figure A1:](https://www.medrxiv.org/content/medrxiv/early/2021/04/25/2021.04.23.21255564/F6.medium.gif)


Figure A1: Actual and Predicted Deaths for Six Counties
*Notes*: Actual and predicted total deaths for six counties, selected randomly for exposition, from 2011-2020. Predicted death counts are generated by our estimated Poisson GLM using data from 2011-2019. The shaded area represents a 95% prediction region for each point. These regions are generated using a parametric bootstrap.
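One way to generate a parametric-bootstrap prediction region, in the spirit of Gelman and Hill (2006), is to simulate coefficient vectors from their estimated sampling distribution and then simulate death counts from the implied means. The sketch below is illustrative only: it assumes a log-link Poisson GLM with a log-population offset, and the function name, design row, coefficients, and covariance matrix are all invented. Because the model described in this paper allows for overdispersion, the plain Poisson draws used here would likely yield intervals that are too narrow for real county data.

```python
import numpy as np


def bootstrap_prediction_interval(x_new, log_exposure, beta_hat, cov_beta,
                                  n_sims=10_000, level=0.95, seed=0):
    """Parametric-bootstrap prediction interval for one county-year death count.

    Assumes a log-link Poisson GLM: E[deaths] = exp(x_new @ beta + log_exposure).
    """
    rng = np.random.default_rng(seed)
    # 1. Estimation uncertainty: draw coefficients from N(beta_hat, cov_beta).
    betas = rng.multivariate_normal(beta_hat, cov_beta, size=n_sims)
    # 2. Expected death count implied by each coefficient draw.
    mu = np.exp(betas @ x_new + log_exposure)
    # 3. Outcome uncertainty: draw a death count around each expected value.
    deaths = rng.poisson(mu)
    alpha = 1.0 - level
    lower, upper = np.quantile(deaths, [alpha / 2, 1 - alpha / 2])
    return float(mu.mean()), float(lower), float(upper)


# Hypothetical inputs: intercept plus a linear year trend, county of 100,000 people.
beta_hat = np.array([-4.6, -0.005])
cov_beta = np.diag([0.01 ** 2, 0.001 ** 2])
x_2020 = np.array([1.0, 9.0])  # year index 9 = 2020 if 2011 is coded 0
mean, lo, hi = bootstrap_prediction_interval(x_2020, np.log(100_000), beta_hat, cov_beta)
print(f"expected deaths {mean:.0f}, 95% interval [{lo:.0f}, {hi:.0f}]")
```

Replacing the Poisson draw with a negative binomial draw (for example `rng.negative_binomial`) is one way to reflect overdispersion; the dispersion parameter would then have to come from the fitted model.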

![Figure A2:](https://www.medrxiv.org/content/medrxiv/early/2021/04/25/2021.04.23.21255564/F7.medium.gif)


Figure A2: Actual and Predicted Deaths for Six Counties
*Notes*: Actual and predicted total deaths per 1000 person-years for six of the ten largest counties, 2011-2020. Predicted death counts are generated by our estimated Poisson GLM using data from 2011-2019. The shaded area represents a 95% prediction region for each point. These regions are generated using a parametric bootstrap.

![Figure A3:](https://www.medrxiv.org/content/medrxiv/early/2021/04/25/2021.04.23.21255564/F8.medium.gif)


Figure A3: Distribution of Deaths Per 1000 Person-Years in 2019 vs 2020
*Notes*: Distributions, for 2019 and 2020, of all-cause deaths per 1000 person-years across all 3109 counties in our data. Distributions are depicted using a histogram overlaid with a kernel density estimate. 2019 mortality data are from the CDC Wonder database. All-cause mortality estimates for 2020 are based on provisional data from the National Center for Health Statistics (NCHS).

![Figure A4:](https://www.medrxiv.org/content/medrxiv/early/2021/04/25/2021.04.23.21255564/F9.medium.gif)


Figure A4: Distribution of Excess Deaths in 2019 vs 2020
*Notes*: Distributions, for 2019 and 2020, of excess deaths per 1000 person-years across all 3109 counties in our data. Excess deaths are defined as actual deaths less predicted deaths for a given county and year. For 2019, predicted deaths are computed as an in-sample fitted value. For 2020, predicted deaths are computed as an out-of-sample fitted value. 2019 mortality data are from the CDC Wonder database. All-cause mortality estimates for 2020 are based on provisional data from the National Center for Health Statistics (NCHS).

![Figure A5:](https://www.medrxiv.org/content/medrxiv/early/2021/04/25/2021.04.23.21255564/F10.medium.gif)


Figure A5: Adjusted Excess Death Rate by County
*Notes*: Heat map of adjusted excess death rate by county. Adjusted excess death rate is defined as the excess death rate divided by the standard error of the prediction for a given county and year. Note that estimates for counties in North Carolina may be unreliable due to reporting lags.

![Figure A6:](https://www.medrxiv.org/content/medrxiv/early/2021/04/25/2021.04.23.21255564/F11.medium.gif)


Figure A6: Actual and Predicted Deaths for Six Large New England Counties
*Notes*: Actual and predicted total deaths for six large New England counties. Predicted death counts are generated by our estimated Poisson GLM using data from 2011-2019. The shaded area represents a 95% prediction region for each point. These regions are generated using a parametric bootstrap.

![Figure A7:](https://www.medrxiv.org/content/medrxiv/early/2021/04/25/2021.04.23.21255564/F12.medium.gif)


Figure A7: Actual and Predicted Deaths for Six Large East South Central Counties
*Notes*: Actual and predicted total deaths for six large East South Central counties. Predicted death counts are generated by our estimated Poisson GLM using data from 2011-2019. The shaded area represents a 95% prediction region for each point. These regions are generated using a parametric bootstrap.

## 7 Appendix B: All Estimates

Table B1 below includes all of our primary estimates for each county. All values other than the ratio are in units per 1000 person-years. All-cause and COVID-19 deaths are based on provisional data from the National Center for Health Statistics (NCHS). Lower CI and upper CI denote the approximate lower and upper bounds of the 95% prediction interval around predicted deaths in 2020. Excess is defined as the difference between all-cause and predicted deaths. Adjusted excess denotes the excess death rate divided by the standard error of the prediction. Residual denotes the difference between excess deaths and COVID-19 deaths. The ratio denotes the ratio of COVID-19 deaths to excess deaths.
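The derived columns of Table B1 follow directly from these definitions. A minimal sketch: the class name `CountyEstimate` and the input values are hypothetical, all rates are per 1000 person-years, and `pred_se` stands for the standard error of the prediction in the same units.

```python
import math
from dataclasses import dataclass


@dataclass
class CountyEstimate:
    all_cause: float  # observed all-cause death rate
    covid: float      # directly assigned COVID-19 death rate
    predicted: float  # predicted (expected) death rate for 2020
    pred_se: float    # standard error of the prediction

    @property
    def excess(self) -> float:
        return self.all_cause - self.predicted

    @property
    def adjusted_excess(self) -> float:
        return self.excess / self.pred_se

    @property
    def residual(self) -> float:
        # Excess deaths not directly assigned to COVID-19.
        return self.excess - self.covid

    @property
    def ratio(self) -> float:
        # COVID-19 deaths per excess death; above 1 when direct deaths exceed excess.
        # Returned as NaN when excess is exactly zero; negative when excess is negative.
        return self.covid / self.excess if self.excess != 0 else math.nan


est = CountyEstimate(all_cause=12.0, covid=1.5, predicted=10.0, pred_se=0.5)
print(est.excess, est.adjusted_excess, est.residual, est.ratio)  # 2.0 4.0 0.5 0.75
```

A ratio above 1, such as the 1.23 reported for New England, corresponds to a negative residual: more directly assigned COVID-19 deaths than excess deaths.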

[Table B1:](http://medrxiv.org/content/early/2021/04/25/2021.04.23.21255564/T5)

Table B1: Primary Estimates for Each County

## Footnotes

*   * The authors would like to thank Robert N. Anderson and Farida B. Ahmad from the National Center for Health Statistics, Kathy Hempstead from the Robert Wood Johnson Foundation (RWJF), and Abe Dunn from the Bureau of Economic Analysis (BEA) for their input and technical support. Stokes gratefully acknowledges financial support from the RWJF. The views expressed in this paper are those of the authors and not necessarily the views of the BEA or RWJF.

*   1 See, for example, McCullagh and Nelder (1989).[15]

*   2 It may be desirable to explicitly include measures of such information to the extent that they vary significantly within counties over time. At present, we exclude such variables because few official county-level estimates of this type of information are available for 2020.

*   3 There could be positive serial correlation due to persistent shocks, such as natural or other disasters, or there could be negative serial correlation due to survivorship bias. Our baseline estimate of *φ* is 0.011, which suggests that there is positive correlation on average.

*   4 We include trends at the state level, as opposed to the county level, to reduce overfitting. Empirically, adding county-level trends to this model reduces the out-of-sample predictive accuracy by a non-negligible amount. We have also explored allowing state-level trends to vary by urbanicity, but have found that this does not generally improve fit.

*   5 It is worth pointing out that, in addition to achieving the best predictive performance, the Poisson model has attractive robustness properties. Most importantly, achieving consistent parameter estimates does not depend on the assumption of a Poisson distribution. Moreover, our model allows for overdispersion or underdispersion of the outcome variable and there is no restriction on time dependence within clusters. See, for example, Gourieroux, Monfort, and Trognon (1984)[16] and Wooldridge (1999)[17] for details.

*   6 We selected 2011 as the first year in our sample by comparing the predictive performance of our baseline Poisson GLM on 2019 mortality using different starting windows from 2009 to 2015.

*   7 In the standard Poisson GLM, the variance of the outcome variable is assumed to be equal to the mean. In practice this is highly restrictive, so variance functions that permit overdispersion are common. Cameron and Trivedi (2013)[18] detail a number of such specifications.

*   8 This procedure is based on Gelman and Hill (2006)[19]. Woolf et al. (2020) use a similar procedure in an analogous setting at the state level to generate predictive intervals.[20]

*   Received April 23, 2021.
*   Revision received April 23, 2021.
*   Accepted April 25, 2021.


*   © 2021, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1.  [1]. FB Ahmad,  RN Anderson, The leading causes of death in the US for 2020. JAMA (2021).
    
    
2.  [2]. FB Ahmad, Provisional mortality data — united states, 2020. MMWR Morb. Mortal. Wkly. Rep. 70 (2021).
    
    
3.  [3]. MV Kiang,  RA Irizarry,  CO Buckee,  S Balsari, Every body counts: Measuring mortality from the COVID-19 pandemic. Ann. Intern. Med. 173, 1004–1007 (2020).
    
    
4.  [4]. AB Friedman, et al., Delayed emergencies: The composition and magnitude of non-respiratory emergency department visits during the COVID-19 pandemic. J Am Coll Emerg Physicians Open 2, e12349 (2021).
    
    
5.  [5]. KP Hartnett, et al., Impact of the COVID-19 pandemic on emergency department visits - united states, january 1, 2019-may 30, 2020. MMWR Morb. Mortal. Wkly. Rep. 69, 699–704 (2020).
    

6.  [6]. EC Matthay,  KA Duchowny,  AR Riley,  S Galea, Projected All-Cause deaths attributable to COVID-19-Related unemployment in the united states. Am. J. Public Health 111, 696–699 (2021).
    
    
7.  [7]. DA Leon, et al., COVID-19: a need for real-time monitoring of weekly excess deaths. Lancet 395, e81 (2020).
    
    
8.  [8]. LM Rossen,  AM Branum,  FB Ahmad,  P Sutton,  RN Anderson, Excess deaths associated with COVID-19, by age and race and ethnicity - united states, january 26-october 3, 2020. *MMWR Morb*. Mortal. Wkly. Rep. 69, 1522–1527 (2020).
    
    
9.  [9]. DM Weinberger, et al., Estimation of excess deaths associated with the COVID-19 pandemic in the united states, march to may 2020. JAMA Intern. Med. 180, 1336–1344 (2020).
    
    
10. [10]. SH Woolf,  DA Chapman,  RT Sabo,  EB Zimmerman, Excess deaths from COVID-19 and other causes in the US, march 1, 2020, to january 2, 2021. JAMA (2021).
    
    
11. [11]. AC Stokes, et al., Assessing the impact of the covid-19 pandemic on US mortality: A County-Level analysis. medRxiv (2020).
    
    
12. [12]. MR Spencer,  F Ahmad, Timeliness of death certificate data for mortality surveillance and provisional estimates. National Vital Statistics Rapid Release 001 (2016).
    
    
13. [13]. National Center for Health Statistics, Technical notes: Provisional death counts for coronavirus disease, [https://www.cdc.gov/nchs/nvss/vsrr/covid19/technotes.htm](https://www.cdc.gov/nchs/nvss/vsrr/covid19/technotes.htm) (2021). Accessed: 2021-03-29.
    
    
14. [14]. IT Elo,  AS Hendi,  JY Ho,  YC Vierboom,  SH Preston, Trends in Non-Hispanic white mortality in the united states by Metropolitan-Nonmetropolitan status and region, 1990-2016. Popul. Dev. Rev. 45, 549–583 (2019).
    
    
15. [15]. P McCullagh,  J Nelder, Binary data in Generalized linear models. (Springer), pp. 98–148 (1989).
    
    
16. [16]. C Gourieroux,  A Monfort,  A Trognon, Pseudo maximum likelihood methods: Theory. Econometrica: journal of the Econometric Society, 681–700 (1984).
    
    
17. [17]. JM Wooldridge, Distribution-free estimation of some nonlinear panel data models. Journal of Econometrics 90, 77–97 (1999).
    

18. [18]. AC Cameron,  PK Trivedi, Regression analysis of count data. (Cambridge university press) Vol. 53, (2013).
    
    
19. [19]. A Gelman,  J Hill, Data analysis using regression and multilevel/hierarchical models. (Cambridge university press), (2006).
    
    
20. [20]. SH Woolf, et al., Excess deaths from COVID-19 and other causes, March-July 2020. JAMA 324, 1562–1564 (2020).
    
    
21. [21]. JT Chen,  N Krieger, Revealing the unequal burden of COVID-19 by income, Race/Ethnicity, and household crowding: US county versus zip code analyses. J. Public Health Manag. Pract. 27 Suppl 1, COVID-19 and Public Health: Looking Back, Moving Forward, S43–S56 (2021).
    

22. [22]. GA Millett, et al., Assessing differential impacts of COVID-19 on black communities. Ann. Epidemiol. 47, 37–44 (2020).
    

23. [23]. SB Tan,  P deSouza,  M Raifman, Structural racism and COVID-19 in the USA: a County-Level empirical analysis. J Racial Ethn Health Disparities (2021).
    
    
24. [24]. AR Riley, et al., Excess death among latino people in california during the COVID-19 pandemic (2021).
    
    
25. [25]. YH Chen, et al., Excess mortality in california during the coronavirus disease 2019 pandemic, march to august 2020. JAMA Intern. Med. (2020).
    
    
26. [26]. M Boukhris, et al., Cardiovascular implications of the COVID-19 pandemic: A global perspective. Can. J. Cardiol. 36, 1068–1080 (2020).
    

27. [27]. JA Wolfson,  CW Leung, Food insecurity and COVID-19: Disparities in early effects for US adults. Nutrients 12 (2020).
    
    
28. [28]. SJ Lange, et al., Potential indirect effects of the COVID-19 pandemic on use of emergency departments for acute Life-Threatening conditions — united states, January–May 2020 (2020).
    
    
29. [29]. B Wu, Social isolation and loneliness among older adults in the context of COVID-19: a global challenge. Glob Health Res Policy 5, 27 (2020).
    
    
30. [30]. SJ Olsen, et al., Decreased influenza activity during the COVID-19 pandemic — united states, australia, chile, and south africa, 2020 (2020).
    
    
31. [31]. R Catalano,  M Maria Glymour,  YH Chen,  K Bibbins-Domingo, Sheltering in place and the likelihood of nonnatural death (2021).
