County-Level Estimates of Excess Mortality Associated with COVID-19 in the United States
========================================================================================

* Calvin A. Ackley
* Dielle J. Lundberg
* Lei Ma
* Irma T. Elo
* Samuel H. Preston
* Andrew C. Stokes

## Abstract

The COVID-19 pandemic in the US has been largely monitored on the basis of death certificates containing reference to COVID-19. However, prior analyses reveal that a significant percentage of excess deaths associated with the pandemic were not directly assigned to COVID-19 on death records. In the present study, we estimate a generalized linear model of expected mortality in 2020 based on historical trends in deaths by county of residence between 2011 and 2019. We use the results of the model to generate county estimates of excess mortality and excess deaths not assigned to COVID-19 for each county in the US. Overall, we estimate that 437,849 excess deaths occurred in 2020, among which 88% were assigned to COVID-19. The proportion assigned to COVID-19 was lower in the West (82%) and South (83%) as compared to counties in the Midwest (90%) and Northeast (97%). Urban counties (87%) reported lower proportions of excess deaths assigned to COVID-19 than rural areas (93%). The New England Census Division stood out as the only division where directly assigned COVID-19 deaths actually exceeded excess deaths, indicating that reductions in mortality from other causes of death may have occurred. Across individual counties, the percentage of excess deaths not assigned to COVID-19 varied substantially, with some counties’ direct COVID-19 tallies capturing only a small fraction of total excess deaths. Our findings suggest that consideration of excess deaths across counties is critical for a full accounting of geographic inequities in mortality during the pandemic.

## 1 Introduction

Estimates of excess deaths are critical to tracking the direct and indirect effects of the COVID-19 pandemic and for developing adequate and equitable policy responses.[1] Provisional estimates from the Centers for Disease Control and Prevention (CDC) indicate that between 545,600 and 660,200 excess deaths occurred in the United States from January 26, 2020 to February 27, 2021.[2] The CDC further estimates that between 75 and 88% of excess deaths were directly assigned to COVID-19 on death certificates, suggesting that between 12 and 25% of excess deaths were not assigned to COVID-19.[2] Other prior estimates of excess mortality have also found significant discrepancies between direct COVID-19 deaths and excess mortality.[2, 3, 4, 5] Excess deaths not assigned to COVID-19 may reflect a variety of factors, including COVID-19 deaths that were ascribed to other causes of death due to limited testing,[6] indirect deaths caused by interruptions in the provision of health care services,[7, 8] or indirect deaths caused by the broader social and economic consequences of the pandemic.[9, 10, 11]

Furthermore, the percent of excess deaths not assigned to COVID-19 varied significantly at the state level, suggesting that attribution of deaths to COVID-19 may not be uniform across the country. Prior studies have found that the proportion of excess deaths not assigned to COVID-19 also differed significantly by county-level sociodemographic and health care factors, suggesting that state estimates mask significant community-level heterogeneity.[5, 12] While these studies produced estimates of the proportion of excess deaths not assigned to COVID-19 for groups of counties, they did not report individual county estimates.

While prior estimates of excess mortality at the national and state levels are useful for understanding the impact of the pandemic on mortality levels broadly, estimation of excess mortality at the county level may be valuable for several reasons. First, prior studies of direct COVID-19 mortality demonstrate that states are heterogeneous units and that geographic differences in mortality are present within states.[13, 14] Second, deaths are registered at the county level.[15] Thus, it is reasonable to assume that administrative differences may exist between counties in the processing and assignment of deaths. For example, a county with a large medical examiner office may have more resources for death investigation than a small coroner’s office, resulting in differential assignment to COVID-19.

Another important reason to study excess mortality at the local level is that these data are essential for informing community and policy interventions. If a county’s direct COVID-19 tallies are substantially underestimated, measuring excess mortality at the county level may be an important step to appreciating the full burden of the COVID-19 pandemic in an area and allocating response resources appropriately. Additionally, providing accurate data to residents could result in a positive behavior feedback loop, which encourages residents to understand the extent of mortality in their area and take protective actions such as wearing masks and pursuing vaccination. In the event that direct tallies substantially underestimate excess mortality, residents may not accurately assess risk in their area and thus will be less likely to take these steps, resulting in a negative behavior loop that leads to further disease spread throughout their community.[16]

The objective of the present study is to generate valid estimates of excess mortality at the county level and examine geographic variation in excess mortality and the proportion of excess deaths not assigned to COVID-19.
Examining excess deaths at the county level has the potential to identify counties with high excess mortality but low directly assigned COVID-19 mortality, indicating that COVID-19 is underreported on death certificates or substantial numbers of indirect deaths have occurred in these areas. These counties could represent regions that have been especially hard hit by the COVID-19 pandemic but whose mortality impacts did not appear in direct COVID-19 tallies and whose excess mortality has thus been hidden. In estimating excess mortality at the county level, this study seeks to provide communities with estimates of the severity of the pandemic in their area which can be used to inform pandemic preparedness and response at the county, state, and national levels.

## 2 Data

We used provisional data from the National Center for Health Statistics (NCHS) on COVID-19 mortality and all-cause mortality by county of residence from January 1 to December 31, 2020 reported by June 3, 2021. We used data with a twenty-two-week lag (December 31, 2020 to June 3, 2021) to improve the completeness of data, since prior analyses of provisional NCHS vital statistics reveal low completeness within the month following a death but more than 75 percent completeness after eight weeks.[17] COVID-19 deaths were identified using the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) code U07.1 and include deaths assigned to COVID-19 as the underlying cause as well as deaths in which COVID-19 was reported as a cause that contributed to death on the death certificate. Prior reports indicate that COVID-19 was assigned as the underlying cause on the death certificate in 92% of deaths.[18] For our historical comparison period, we used CDC WONDER data on all-cause mortality by county of residence from 2011 to 2019. To compute death rates, we used data on estimated populations from the U.S. Census Bureau for the years 2011 through 2020.

To assess geographic patterns in mortality, we classify counties on the basis of urbanicity (urban/rural), metropolitan-nonmetropolitan categories (large central metro, large metro suburb, small/medium metro, and nonmetro)[19] and U.S. Census region (West, Northeast, South, and Midwest). The classifications for urbanicity were developed by the US Department of Agriculture (USDA) Economic Research Service (ERS) and modified by the National Center for Health Statistics. Large central metros include counties in metropolitan statistical areas with a population of more than 1 million. Large metro suburbs are counties that surround the large central metros. Small or medium metros include counties in metropolitan statistical areas with a population between 50,000 and 999,999. Nonmetropolitan areas include all other counties.

We also examine patterns by 10 modified U.S. Census Divisions. These include the 9 Census Divisions and Appalachia:

* New England (CT, ME, MA, NH, RI, VT)
* Middle Atlantic (NJ, NY, PA)
* East North Central (IL, IN, MI, OH, WI)
* West North Central (IA, KS, MN, MO, NE, ND, SD)
* South Atlantic (DE, DC, FL, GA, MD, NC, SC, VA)
* East South Central (AL, KY, MS, TN)
* West South Central (AR, LA, OK, TX)
* Mountain (AZ, CO, ID, MT, NV, NM, UT, WY)
* Pacific (AK, CA, HI, OR, WA)
* Appalachia, as defined by the Appalachian Regional Commission to include all of West Virginia and counties from 12 other states

We then stratified these divisions by metropolitan-nonmetropolitan categories to yield 40 distinct geographic units. Further details about these geographic units are provided by Elo et al.[19]

Our provisional data include 3,139 counties with reported all-cause mortality for 2020. We exclude counties for which we do not have mortality information over the entire period from 2011 to 2019, resulting in a final sample of 3,080 counties.
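
The sample restriction and the 40 geographic units described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout (a mapping from a county identifier to deaths by year), the helper names `has_complete_history` and `build_sample`, and the toy counties are hypothetical and are not taken from the study's replication code.

```python
from itertools import product

# Historical comparison period used in the paper: 2011 through 2019.
HISTORY_YEARS = range(2011, 2020)

def has_complete_history(deaths_by_year, years=HISTORY_YEARS):
    """Return True when a death count is present for every historical year."""
    return all(deaths_by_year.get(year) is not None for year in years)

def build_sample(counties):
    """Keep only counties with mortality data for the whole 2011-2019 period."""
    return {fips: d for fips, d in counties.items() if has_complete_history(d)}

# The 10 modified Census Divisions and 4 metro categories named in the text.
DIVISIONS = [
    "New England", "Middle Atlantic", "East North Central",
    "West North Central", "South Atlantic", "East South Central",
    "West South Central", "Mountain", "Pacific", "Appalachia",
]
METRO_CATEGORIES = [
    "large central metro", "large metro suburb",
    "small/medium metro", "nonmetro",
]
# Crossing divisions with metro categories gives the 40 geographic units.
GEOGRAPHIC_UNITS = list(product(DIVISIONS, METRO_CATEGORIES))

# Hypothetical toy data: the second county is missing 2011-2013.
toy_counties = {
    "00001": {year: 100 + (year - 2011) for year in HISTORY_YEARS},
    "00002": {year: 50 for year in range(2014, 2020)},
}
sample = build_sample(toy_counties)
print(sorted(sample), len(GEOGRAPHIC_UNITS))
```

Applied to the real files, a filter of this kind would correspond to the step that reduces 3,139 counties to 3,080.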

The present investigation relied on de-identified publicly available data and was therefore exempted from review by the Boston University Medical Center Institutional Review Board. Analyses were conducted using Stata 16 and R/RStudio. Additional details about the data along with programming code for replicating the analyses of the present study are available from the linked GitHub repository.

## 3 Methodology

### 3.1 General Model for County-Level Mortality

To generate a prediction of expected mortality in 2020, we estimate a statistical model of mortality using historical mortality data from 2011-2019. Specifically, we model mortality at the county-year level using a quasi-Poisson generalized linear model (QP-GLM) of the following form:<sup>1,2</sup>

$$E\left[Y_{it} \mid Y_{i,t-1}, t\right] = \exp\left(\alpha_i + \phi Y_{i,t-1} + \beta_i t\right)$$

Here, $Y_{it}$ denotes the number of all-cause deaths divided by the total population of county $i$ in year $t$. In the linear index, we include a county-specific intercept term, $\alpha_i$, which captures latent characteristics of each county that may be correlated with mortality. Importantly, this term picks up relevant information such as the distribution of age and health in each county.<sup>3</sup> We include one lag of the dependent variable, $Y_{i,t-1}$, with coefficient $\phi$, to capture potential serial correlation in mortality.<sup>4</sup> We include a time trend, $t$, and allow the time trend to vary across counties according to $\beta_i$. This accounts for demographic and other trends across granular geographies that may affect mortality. We use the model estimates described above to compute fitted values of total deaths and the death rate per 1000 person-years for each county and year from 2011-2020.<sup>5</sup>

### 3.2 Excess Deaths

We define excess deaths as the difference between the number of observed all-cause deaths in 2020 and the number of predicted all-cause deaths in 2020. For each county in our sample, we produced an excess death rate for 2020 as well as a ratio of observed to expected deaths.
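
The three county-level quantities just described can be illustrated with a short sketch. It takes excess deaths as observed minus expected deaths, so that positive values mean more deaths than the model predicted. The function name `excess_summary`, the `per` scaling argument, and the input numbers are hypothetical and serve only to show the arithmetic.

```python
def excess_summary(observed_deaths, expected_deaths, population, per=100_000):
    """Excess deaths, excess death rate, and observed-to-expected ratio.

    `expected_deaths` stands in for the model prediction for 2020; `per`
    sets the rate scale (deaths per `per` residents).
    """
    excess = observed_deaths - expected_deaths
    return {
        "excess_deaths": excess,
        "excess_rate": excess / population * per,
        "observed_to_expected": observed_deaths / expected_deaths,
    }

# Hypothetical county: 1,150 observed deaths, 1,000 expected, 100,000 residents.
example = excess_summary(observed_deaths=1_150, expected_deaths=1_000,
                         population=100_000)
print(example)
```

A ratio above 1 and a positive excess rate both indicate mortality above the level expected from historical trends.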
Additionally, we compute a z-score-like statistic, which we call adjusted excess deaths, by dividing the excess death estimate by the standard error of the prediction. This adjusts for county-level variation in the precision of expected death estimates.

### 3.3 Excess Deaths Not Assigned to COVID-19

We define excess deaths not assigned to COVID-19 as the difference between the number of excess deaths in 2020 and the number of observed directly assigned COVID-19 deaths in 2020. For each county in our sample, we decomposed the excess death rate into: (1) the observed death rate from COVID-19 and (2) the excess death rate not assigned to COVID-19. Next, we define the proportion of excess deaths assigned to COVID-19 as the ratio of the COVID-19 death rate to the excess death rate.

## 4 Results

Across 3,080 counties in the U.S., we estimated that 437,849 excess deaths occurred in 2020. **Figure 1** shows the distribution of all-cause mortality, excess mortality, and excess mortality not assigned to COVID-19 across counties comparing the year 2020 to the years 2011 through 2019. The distributions of all-cause mortality, excess mortality, and excess mortality not assigned to COVID-19 were all shifted to the right compared to previous years, indicating that mortality was higher in most counties in 2020 than in prior years.

[Figure 1](http://medrxiv.org/content/early/2021/07/21/2021.04.23.21255564/F1): Distribution of All-Cause Deaths, Excess Deaths, and Excess Deaths Excluding COVID-19 Deaths per 100,000 Person-Years Across 3,080 U.S. Counties

**Table 1** presents summary statistics of the number of excess deaths across U.S. Census Regions, U.S. Census Divisions, indicators of urbanicity, and metropolitan-nonmetropolitan categories.
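
The summary measures reported here rest on the definitions in Sections 3.2 and 3.3. A minimal sketch of that arithmetic follows. The function name `decompose_excess` and all input values are hypothetical. Returning `None` for the share when excess deaths are not positive is a choice made for this sketch, since the ratio has no clear interpretation in that case.

```python
def decompose_excess(excess_deaths, covid_deaths, prediction_se):
    """Split excess deaths into COVID-19-assigned and not-assigned parts.

    `prediction_se` is the standard error of the expected-death prediction,
    used for the z-score-like "adjusted excess deaths" statistic.
    """
    not_assigned = excess_deaths - covid_deaths
    # Share of excess deaths assigned to COVID-19; undefined here when
    # excess deaths are zero or negative (an assumption of this sketch).
    share_assigned = covid_deaths / excess_deaths if excess_deaths > 0 else None
    adjusted_excess = excess_deaths / prediction_se
    return {
        "not_assigned": not_assigned,
        "share_assigned": share_assigned,
        "adjusted_excess": adjusted_excess,
    }

# Hypothetical county where COVID-19 deaths cover most of the excess.
under_assigned = decompose_excess(excess_deaths=150, covid_deaths=120,
                                  prediction_se=30)
# Hypothetical county where COVID-19 deaths exceed excess deaths.
over_assigned = decompose_excess(excess_deaths=100, covid_deaths=129,
                                 prediction_se=25)
print(under_assigned)
print(over_assigned)
```

A share above 1, as in the second toy county, corresponds to areas where directly assigned COVID-19 deaths exceed estimated excess deaths.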
Among the total excess deaths, 151,160 excess deaths (34.5%) occurred in large central metro areas, 94,260 (21.5%) occurred in large fringe metros, 121,835 (27.8%) occurred in small or medium metro areas, and 70,594 (16.1%) occurred in nonmetro areas.

[Table 1](http://medrxiv.org/content/early/2021/07/21/2021.04.23.21255564/T1): Excess All-Cause Mortality and COVID-19 Mortality by US Census Region, Division, and Urbanicity

Excess death rates were highest in nonmetro areas (153.2 deaths per 100,000 residents) and large central metro areas (148.3 deaths per 100,000 residents) compared to small or medium metro areas (122.3 deaths per 100,000 residents) and large fringe metros (113.0 deaths per 100,000 residents). Across U.S. Census Divisions, excess mortality was highest in the Middle Atlantic (215.5 deaths per 100,000 residents), East South Central (152.7 deaths per 100,000 residents), East North Central (145.2 deaths per 100,000 residents) and West South Central (144.8 deaths per 100,000 residents). **Figure 2** shows observed deaths across counties in the U.S. as a percentage of expected deaths, highlighting counties with higher excess mortality in darker blue. This figure highlights the geographic dispersion of the COVID-19 pandemic, along with areas where excess mortality is notably higher such as the East South Central Division.

[Figure 2](http://medrxiv.org/content/early/2021/07/21/2021.04.23.21255564/F2): Observed All-Cause Deaths as a Percentage of Expected All-Cause Deaths across U.S. Counties

Across the metropolitan-nonmetropolitan categories, excess death rates were highest in metro areas in some U.S.
Census Divisions (Middle Atlantic, New England, East North Central, and Pacific) while excess death rates were highest in nonmetro areas in other Divisions (West South Central, East South Central, Appalachia, West North Central, South Atlantic, and Mountain). **Figure 3** shows excess death rates for each metropolitan-nonmetropolitan category across U.S. Census Divisions, highlighting the geographic variation in the urban-rural gradient. Metro areas in the Middle Atlantic Division reported the highest excess mortality, with 305 deaths per 100,000 residents. In contrast, nonmetro areas in New England and the Pacific Division had the lowest excess death rates, with 31 deaths per 100,000 residents and 33 deaths per 100,000 residents respectively.

[Figure 3](http://medrxiv.org/content/early/2021/07/21/2021.04.23.21255564/F3): Excess All-Cause Deaths per 100,000 Person-Years by U.S. Census Division and Metropolitan Status

There was also significant variation in the proportion of excess deaths assigned to COVID-19 across counties in the U.S. **Figure 4** shows the percentage of excess deaths assigned to COVID-19 across counties in the U.S., labeling counties with low assignment (between 25 and 100% of excess deaths not assigned to COVID-19), counties with moderate assignment (between 0 and 25% of excess deaths not assigned to COVID-19), and counties with high assignment (COVID-19 deaths exceed excess deaths). This map indicates that there is lower assignment of excess deaths to COVID-19 in specific U.S. Census Divisions, including East South Central (79.8% assigned to COVID-19), Mountain (81.7% assigned to COVID-19), Pacific (82.7% assigned to COVID-19), and West South Central (83.3% assigned to COVID-19).
There is also lower assignment of excess deaths to COVID-19 in large central metro areas (79.6% assigned to COVID-19) compared to nonmetro areas (92.6% assigned to COVID-19).

[Figure 4](http://medrxiv.org/content/early/2021/07/21/2021.04.23.21255564/F4): Percent of All-Cause Excess Deaths not Assigned to COVID-19 across U.S. Counties

Throughout the United States, there were also areas where COVID-19 deaths exceeded excess deaths, primarily in the New England Division. In this Division, 128.8% of the predicted excess deaths were assigned to COVID-19. **Appendix Figure A1** shows the proportion of excess deaths assigned to COVID-19 in New England across metropolitan-nonmetropolitan categories. Across all areas in New England, directly assigned COVID-19 deaths exceeded excess deaths. In nonmetro areas in New England, 146% of excess deaths were assigned to COVID-19, compared with 140% in large fringe metro areas, 128% in medium or small metros, and 112% in large central metro areas. **Figure 5** shows the counties in the U.S. with the highest excess mortality rates and the highest proportion of excess deaths not assigned to COVID-19.

[Figure 5](http://medrxiv.org/content/early/2021/07/21/2021.04.23.21255564/F5): Decomposition of Excess Deaths for U.S. Counties with High Excess Mortality Rates and Low COVID-19 to Excess Death Ratios

**Appendix Figure A2** and **Appendix Figure A3** display a time series of observed death rates from 2011 to 2019 and a comparison of the 2020 observed death rate and the 2020 expected death rate.
In the East South Central region where a relatively larger share of excess deaths were not assigned to COVID-19, the observed death rate in 2020 substantially exceeds the expected 2020 death rate and the expected 2020 death rate plus COVID-19 deaths. In New England, however, where direct COVID-19 deaths exceeded excess deaths, the opposite occurs, with the expected death rate plus COVID-19 deaths exceeding the observed death rate in 2020.

Few areas experienced negative excess mortality (where observed mortality was less than expected mortality). Even fewer counties had statistically significant negative excess death rates, meaning that the uncertainty intervals for their excess death rates did not overlap with 0. This suggests that most counties across the U.S. experienced positive excess mortality in 2020. **Appendix Table C1** provides estimates of excess death rates and their uncertainty intervals for 3,080 counties across the U.S. These data are available for download as excess death rates and counts in a .csv file format as **Appendix Spreadsheet C2**.

## 5 Discussion

In this study, we produced county-level estimates of excess mortality and the proportion of excess deaths not assigned to COVID-19 and examined geographic variation in mortality across 3,080 county units throughout the United States. Excess death rates in 2020 were highest in large central metro and nonmetro areas and in the Middle Atlantic, East South Central, East North Central, and West South Central Divisions. The proportion of excess deaths assigned to COVID-19 in 2020 was lowest in large central metro areas and in the East South Central, Mountain, Pacific, and West South Central Divisions. The New England Division was unique in reporting more direct COVID-19 deaths than excess deaths.

Across counties in the U.S., we estimated that 437,849 total excess deaths occurred in 2020. This estimate is similar to an estimate of 458,000 excess deaths in the U.S. during 2020 produced by Islam et al.
Recent estimates by Woolf et al. calculated 522,368 excess deaths between March 1, 2020, and January 2, 2021, which is higher than our estimate. Our estimate may be more conservative than that of Woolf et al. because we incorporated historical data from 2011 through 2019 for our calculations of expected mortality whereas Woolf et al. used trends from 2014 through 2019. Ahmed et al. found an increase of 503,976 deaths between 2019 and 2020, which is also higher than our estimate of excess mortality. However, these estimates were descriptive and did not control for recent mortality trends.

Our study also reveals substantial heterogeneity in excess deaths and the proportion of excess deaths not assigned to COVID-19 across counties, which are the administrative units for death registration. This highlights the value of studying excess mortality at the county level since state and national estimates mask significant variability within states. This finding is in line with studies of excess deaths in California that found significant differences in excess deaths by race/ethnicity and education, which vary by county.[21, 22]

Consistent with previous results, we find evidence of wide gaps between excess mortality and directly assigned COVID-19 mortality in many areas across the United States. There are several potential explanations for the discrepancy between excess mortality and directly assigned COVID-19 mortality. One explanation is that the gap reflects underreporting of COVID-19 deaths. Especially early in the pandemic, testing was severely limited, which may have reduced the likelihood of COVID-19 being assigned to the death certificate.
Underreporting may also have been related to a lack of awareness of the clinical manifestations of COVID-19 early in the pandemic as well as various social, health care, and political factors.[23, 24, 12] In addition to underreporting, gaps between excess and direct mortality may in part be explained by the indirect effects of the pandemic on mortality levels. Indirect effects could relate to interruptions or delays in health care or the broader social and economic upheaval caused by the pandemic, including loss of employment, social isolation and loneliness, and other factors.[25, 26, 27] Many states have reported increases in overdose deaths during the COVID-19 pandemic, and NCHS data suggest that approximately 19,000 more deaths from unintentional injuries occurred in 2020 than in 2019.[28]

Our study made use of data through December 31, 2020 reported by June 3, 2021, meaning that deaths could be reported for up to five months after they occurred. As a result, our estimate of the proportion of excess deaths not assigned to COVID-19 should be interpreted accordingly: after accounting for five months of potential reporting and processing delays, 12.4% of excess deaths were still not assigned to COVID-19. It is important to acknowledge that real-time COVID-19 mortality surveillance sources, such as the local health system dashboards through which the public views county-level data, likely underestimate excess mortality more substantially, because they do not account for significant delays in COVID-19 reporting. Similarly, our estimate of the proportion of excess deaths not assigned to COVID-19 (12%) is lower than that of a previous study using county-level data (17%), which only incorporated a two-and-a-half-month delay between occurrence and reporting.

We also observed a number of counties in which the direct COVID-19 death rate exceeded our estimates of excess mortality, especially in New England. This finding could have occurred for several reasons.
First, increases in mortality in 2020 due to COVID-19 may have been offset by declines in deaths from other causes. Provisional data on cause of death for 2020 indicate that flu deaths declined significantly relative to prior years, which may explain part of the offset.[29] Other causes of death may also have experienced reductions in 2020. For example, NCHS data indicate that there were approximately 2,600 fewer suicide deaths in 2020 relative to 2019.[28] Shelter-in-place policies may have also been associated with reductions in non-natural deaths.[30] Another reason direct deaths may have exceeded our estimates of excess deaths is that medical certifiers in a county may have over-assigned COVID-19 to death certificates. In addition, the differences could at least partially relate to how directly assigned COVID-19 deaths were counted by NCHS; while COVID-19 was listed as the underlying cause in the vast majority of cases, in about 8% of cases, COVID-19 was listed as a contributing cause and still directly assigned to COVID-19. Finally, frailty selection may have occurred if deaths from COVID-19 occurred among individuals who were likely to die from other causes, resulting in reductions in those causes of death. It is also important to note that in instances where directly assigned COVID-19 deaths exceeded excess deaths in the overall population, patterns could differ among population subgroups related to age, race/ethnicity, and income.

Accurate county-level predictions of excess mortality, and their associated levels of uncertainty, are an important addition to existing work on excess mortality at the state and national levels. County-level estimates can enable researchers to examine geographic variation in excess mortality and leverage county-level variation in sociodemographic, health, and structural factors to examine inequities in excess deaths.
They may also be relevant to local public health departments, to be used in conjunction with their existing efforts to monitor deaths directly assigned to COVID-19. Accurate death tallies at the county level may play an important role in motivating individual and community responses, including vaccine uptake. Our study indicates that directly assigned COVID-19 death rates have been less accurate measures of excess mortality in areas such as the East South Central and Mountain Divisions, which are also areas that are experiencing the slowest vaccine uptake.[31] This analysis had several limitations. First, unlike prior state-level analyses which leveraged weekly data on deaths, the present study used cumulative data on COVID-19 and all-cause mortality for all of 2020. Given this limitation in the available data, it was not possible to examine changes in excess mortality over time or trends in the proportion of deaths not assigned to COVID-19 at the county level. Examining trends in excess mortality using small-area data is a priority for future research which may help to distinguish the direct effects of the pandemic from indirect consequences associated with interruptions in health care and the social and economic consequences of pandemic response measures. Second, when considering patterns of excess mortality across the United States, an important caveat is that age structure differs across counties. Since COVID-19 mortality is more common in older populations, some of the patterns observed across counties may simply reflect differences in age structure. Thus, an important future direction for county-level analyses of excess mortality is to age standardize the estimates when age-specific mortality data become available. Third, the provisional county-level mortality files released by the NCHS did not include information on cause of death, and therefore it was not possible to disentangle the sources of excess deaths in 2020. 
Decomposing excess deaths by cause of death will be critical to understanding why some counties have a higher fraction of unassigned deaths than others and the extent to which the discrepancies are explained by COVID-19 death undercounts versus indirect pandemic effects. For example, such an analysis might partition natural from non-natural deaths under the assumption that non-natural causes of death are unlikely to represent misascribed COVID-19 deaths. Information on cause of death will also be valuable for understanding the extent to which declines in mortality from other causes have offset COVID-19 deaths, thereby leading to smaller estimates of both excess deaths and the percent of excess deaths that were not assigned to COVID-19. Given the potential for offset from other causes of death, it is likely our overall finding that 12% of excess deaths were not assigned to COVID-19 represents a lower bound on the percent unassigned. Finally, the data used in the present study are provisional in nature and may be subject to further corrections by the NCHS in the process of generating final death counts by cause of death for 2020.

In conclusion, the present study builds on prior work by extending estimates of excess mortality and excess deaths not assigned to COVID-19 to US counties. The added geographic detail of these estimates compared to prior studies may facilitate additional research on the causes and consequences of the COVID-19 pandemic on population health and provide useful data for local area health policy and planning.

## Data Availability

The present investigation used publicly available data from the National Center for Health Statistics and the U.S. Census Bureau.

[https://data.cdc.gov/NCHS/AH-County-of-Residence-COVID-19-Deaths-Counts-2020/75vb-d79q](https://data.cdc.gov/NCHS/AH-County-of-Residence-COVID-19-Deaths-Counts-2020/75vb-d79q)

## 6 Appendix A: Tables and Figures

[Figure A1](http://medrxiv.org/content/early/2021/07/21/2021.04.23.21255564/F6): Ratio of COVID-19 Deaths to Excess Deaths by Census Division and Metropolitan Status

[Figure A2](http://medrxiv.org/content/early/2021/07/21/2021.04.23.21255564/F7): Comparison of Observed and Expected Deaths in Large Counties, 2011-2020

[Figure A3](http://medrxiv.org/content/early/2021/07/21/2021.04.23.21255564/F8): Comparison of Observed and Expected Deaths in Large Counties, 2011-2020

## 7 Appendix B: Details on Model Selection and Estimation

To select our primary specification for a distribution and link function, we computed both in-sample and out-of-sample predictive accuracy for a number of candidate specifications. Specifically, we estimated three canonical GLM specifications for continuous and count data (Gaussian-identity, Poisson, and gamma models) using data from 2010-2018.<sup>6</sup> We then used the estimated model parameters to make out-of-sample predictions of 2019 mortality, and computed the mean-squared prediction error and mean absolute prediction error for each.

Some prior studies have used a simple average of mortality in prior years to compute excess mortality in 2020. This has the advantage of being much simpler than fitting a high-dimensional GLM, although it will fail to capture trends, and may overweight more recent years.
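
The two accuracy metrics and the simple prior-average approach described above can be sketched as follows. This is an illustration only: the helper names, the window length, and the toy death rates are hypothetical. The last-value forecast is included as a second naive benchmark, and no GLM is fitted here.

```python
def mean_squared_error(actual, predicted):
    """Mean squared prediction error across counties."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_error(actual, predicted):
    """Mean absolute prediction error across counties."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def prior_mean_forecast(history, window):
    """Forecast next year's rate as the average of the last `window` years."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def lagged_forecast(history):
    """Forecast next year's rate as the most recent observed rate."""
    return history[-1]

# Hypothetical death rates for two counties over six prior years,
# followed by the held-out rates they are evaluated against.
histories = [
    [10.0, 10.0, 10.0, 10.0, 10.0, 12.0],  # county with a late jump
    [8.0, 8.0, 8.0, 8.0, 8.0, 8.0],        # flat county
]
held_out = [12.0, 8.0]

mean_preds = [prior_mean_forecast(h, window=6) for h in histories]
lag_preds = [lagged_forecast(h) for h in histories]

for name, preds in [("prior mean", mean_preds), ("lagged value", lag_preds)]:
    print(name,
          mean_squared_error(held_out, preds),
          mean_absolute_error(held_out, preds))
```

In this toy case the prior mean lags behind the county whose rate has shifted, which illustrates why an average of past years can miss trends.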
To gauge the improvement (if any) of using a regression model rather than a prior mean to predict deaths, we compute the predictive accuracy using both a one-year lagged value as well as a six-year prior average. **Table B1** reports the mean squared and mean absolute prediction error, both in-sample and out-of-sample, associated with each model. Unsurprisingly, all of the GLMs provide a much better fit in terms of in-sample deviations than the lagged value or prior mean. The GLMs also perform better in predicting mortality rates in 2019. Overall, the Gaussian and Poisson GLMs perform similarly on both metrics, with the Poisson providing a slightly better in-sample fit and the Gaussian a slightly better forecast of 2019. Given these results, and the results of using alternative windows for estimation and prediction, we chose the Poisson specification as our primary specification.<sup>7,8</sup> To obtain predicted mortality in each county in 2020, we apply our estimated parametric conditional expectation function to 2020 data.

### 7.1 Uncertainty Intervals

We also construct a 95% prediction interval around the predicted death rates for 2020 to help identify counties in which 2020 mortality falls outside the normal range of year-to-year fluctuations. Importantly, while the point estimates of predicted mortality do not depend on the particular variance function assumed, the prediction intervals depend on both the variance of the estimated parameters and the variance of the outcome variable.<sup>9</sup> To compute the prediction intervals, we first compute cluster-robust standard errors (Wooldridge, 1999)[33] for the individual parameter estimates, and then perform a parametric bootstrap procedure over the parameter and outcome distributions.<sup>10</sup>

[Table B1](http://medrxiv.org/content/early/2021/07/21/2021.04.23.21255564/T2): Prediction Accuracy of Different Models

## 8 Appendix C: All Estimates

Table C1 below includes all of our primary estimates for each county.
Mortality rates are in units per 100,000 person-years. All-cause and COVID-19 deaths are based on provisional data from the National Center for Health Statistics. Lower CI and upper CI denote the approximate lower and upper bounds on the 95% prediction interval around expected and excess deaths in 2020. Note that the ratio of COVID-19 to excess deaths is negative in some cases, indicating that the point estimate of excess deaths is negative for the county. In these cases the ratio does not have a clear interpretation.

[Table C1:](http://medrxiv.org/content/early/2021/07/21/2021.04.23.21255564/T3) Primary Estimates for Each County

## Footnotes

* * The authors would like to thank Robert N. Anderson and Farida B. Ahmad from the National Center for Health Statistics, Katherine Hempstead from the Robert Wood Johnson Foundation (RWJF), and Abe Dunn from the Bureau of Economic Analysis (BEA) for their input and technical support. Stokes gratefully acknowledges financial support from the RWJF. The views expressed in this paper are those of the authors and not necessarily the views of the BEA or RWJF.
* 1 See, for example, McCullagh and Nelder (1989)[20].
* 2 Details on how our exact specification was chosen are given in Appendix B.
* 3 It may be desirable to explicitly include measures of such information to the extent that they vary significantly within counties over time. At present, we exclude such variables because few official county-level estimates of this type of information are available for 2020.
* 4 There could be positive serial correlation due to persistent shocks, such as natural or other disasters, or there could be negative serial correlation due to survivorship bias. Our baseline estimate of *φ* is 0.011, which suggests that there is positive correlation on average.
* 5 We selected 2011 as the first year in our sample by comparing the predictive performance of our baseline Poisson GLM on 2019 mortality using different starting windows from 2009 to 2015.
* 6 The negative binomial model was infeasible to estimate with our high-dimensional fixed effects.
* 7 Using alternative years to predict mortality results in the same general pattern of results, but with the Poisson and Gaussian specifications flip-flopping in terms of out-of-sample accuracy.
* 8 It is worth pointing out that, in addition to providing good fit and predictive performance, the Poisson model has attractive robustness properties. Most importantly, achieving consistent parameter estimates does not depend on the assumption of a Poisson distribution. Moreover, our model allows for overdispersion or underdispersion of the outcome variable and there is no restriction on time dependence within clusters. See, for example, Gourieroux, Monfort, and Trognon (1984)[32] and Wooldridge (1999)[33] for details.
* 9 In the standard Poisson GLM, the variance of the outcome variable is assumed to be equal to the mean. In practice this is highly restrictive, and so variance functions that permit overdispersion are common. Cameron and Trivedi (2013)[34] detail a number of such specifications.
* 10 This procedure is based on Gelman and Hill (2006)[35]. Woolf et al. (2020) use a similar procedure in an analogous setting at the state level to generate predictive intervals.[36]
* Received April 23, 2021.
* Revision received July 20, 2021.
* Accepted July 21, 2021.
* © 2021, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1. [1]. DA Leon, et al., COVID-19: a need for real-time monitoring of weekly excess deaths. Lancet 395, e81 (2020).
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0140-6736(20)30933-8&link_type=DOI)
2. [2]. LM Rossen, AM Branum, FB Ahmad, PD Sutton, RN Anderson, Notes from the field: Update on excess deaths associated with the COVID-19 pandemic - United States, January 26, 2020-February 27, 2021. MMWR Morb. Mortal. Wkly. Rep. 70, 570–571 (2021).
3. [3]. DM Weinberger, et al., Estimation of excess deaths associated with the COVID-19 pandemic in the United States, March to May 2020. JAMA Intern. Med. 180, 1336–1344 (2020).
4. [4]. SH Woolf, DA Chapman, RT Sabo, EB Zimmerman, Excess deaths from COVID-19 and other causes in the US, March 1, 2020, to January 2, 2021. JAMA (2021).
5. [5]. AC Stokes, et al., COVID-19 and excess mortality in the United States: A county-level analysis. PLoS Med. 18, e1003571 (2021).
6. [6]. MV Kiang, RA Irizarry, CO Buckee, S Balsari, Every body counts: Measuring mortality from the COVID-19 pandemic. Ann. Intern. Med. 173, 1004–1007 (2020).
7. [7]. AB Friedman, et al., Delayed emergencies: The composition and magnitude of non-respiratory emergency department visits during the COVID-19 pandemic. J Am Coll Emerg Physicians Open 2, e12349 (2021).
8. [8]. KP Hartnett, et al., Impact of the COVID-19 pandemic on emergency department visits - United States, January 1, 2019-May 30, 2020. MMWR Morb. Mortal. Wkly. Rep. 69, 699–704 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.15585/mmwr.mm6923e1&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32525856&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F21%2F2021.04.23.21255564.atom)
9. [9]. EC Matthay, KA Duchowny, AR Riley, S Galea, Projected All-Cause deaths attributable to COVID-19-Related unemployment in the United States. Am. J. Public Health 111, 696–699 (2021).
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2105/AJPH.2020.306095&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=33600244&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F07%2F21%2F2021.04.23.21255564.atom)
10. [10]. JS Faust, et al., Mortality from drug overdoses, homicides, unintentional injuries, motor vehicle crashes, and suicides during the pandemic, March-August 2020. JAMA 326, 84–86 (2021).
11. [11]. LE Egede, RJ Walker, Structural racism, social risk factors, and covid-19 - a dangerous convergence for Black Americans. N. Engl. J. Med. 383, e77 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMp2023616&link_type=DOI)
12. [12]. AC Stokes, et al., Association of health care factors with excess deaths not assigned to COVID-19. JAMA Network Open; Forthcoming (2021).
13. [13]. JT Chen, N Krieger, Revealing the unequal burden of COVID-19 by income, Race/Ethnicity, and household crowding: US county versus zip code analyses. J. Public Health Manag. Pract. 27 **Suppl 1**, **COVID-19 and Public Health: Looking Back, Moving Forward,** S43–S56 (2021). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1097/PHH.0000000000001263&link_type=DOI)
14. [14]. SB Tan, P deSouza, M Raifman, Structural racism and COVID-19 in the USA: a County-Level empirical analysis. J Racial Ethn Health Disparities (2021).
15. [15]. Institute of Medicine, Medicolegal Death Investigation System: Workshop Summary. (The National Academies Press, Washington, DC), (2003).
16. [16]. E Gutierrez, A Rubli, T Tavares, Information and behavioral responses during a pandemic: Evidence from delays in COVID-19 death reports. (2021).
17. [17]. MR Spencer, F Ahmad, Timeliness of death certificate data for mortality surveillance and provisional estimates.
National Vital Statistics Rapid Release 001 (2016).
18. [18]. National Center for Health Statistics, Technical notes: Provisional death counts for coronavirus disease ([https://www.cdc.gov/nchs/nvss/vsrr/covid19/technotes.htm](https://www.cdc.gov/nchs/nvss/vsrr/covid19/technotes.htm)) (2021) Accessed: 2021-3-29.
19. [19]. IT Elo, AS Hendi, JY Ho, YC Vierboom, SH Preston, Trends in Non-Hispanic white mortality in the United States by Metropolitan-Nonmetropolitan status and region, 1990-2016. Popul. Dev. Rev. 45, 549–583 (2019).
20. [20]. P McCullagh, J Nelder, Binary data in Generalized linear models. (Springer), pp. 98–148 (1989).
21. [21]. AR Riley, et al., Excess death among Latino people in California during the COVID-19 pandemic (2021).
22. [22]. YH Chen, et al., Excess mortality in California during the coronavirus disease 2019 pandemic, March to August 2020. JAMA Intern. Med. (2020).
23. [23]. M Boukhris, et al., Cardiovascular implications of the COVID-19 pandemic: A global perspective. Can. J. Cardiol. 36, 1068–1080 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.cjca.2020.05.018&link_type=DOI)
24. [24]. AC Stokes, et al., Assessing the impact of the covid-19 pandemic on US mortality: A County-Level analysis. *medRxiv* (2020).
25. [25]. JA Wolfson, CW Leung, Food insecurity and COVID-19: Disparities in early effects for US adults. Nutrients 12 (2020).
26. [26]. SJ Lange, et al., Potential indirect effects of the COVID-19 pandemic on use of emergency departments for acute Life-Threatening conditions — United States, January–May 2020 (2020).
27. [27]. B Wu, Social isolation and loneliness among older adults in the context of COVID-19: a global challenge. Glob Health Res Policy 5, 27 (2020).
28. [28]. FB Ahmad, RN Anderson, The leading causes of death in the US for 2020. JAMA (2021).
29. [29].
SJ Olsen, et al., Decreased influenza activity during the COVID-19 pandemic — United States, Australia, Chile, and South Africa, 2020 (2020).
30. [30]. R Catalano, M Maria Glymour, YH Chen, K Bibbins-Domingo, Sheltering in place and the likelihood of nonnatural death (2021).
31. [31]. U.S. Department of Health & Human Services, ASPE predictions of vaccine hesitancy for COVID-19 vaccines by geographic and sociodemographic features ([https://aspe.hhs.gov/pdf-report/vaccine-hesitancy](https://aspe.hhs.gov/pdf-report/vaccine-hesitancy)) (2021) Accessed: 2021-5-1.
32. [32]. C Gourieroux, A Monfort, A Trognon, Pseudo maximum likelihood methods: Theory. Econometrica: journal of the Econometric Society, 681–700 (1984).
33. [33]. JM Wooldridge, Distribution-free estimation of some nonlinear panel data models. Journal of Econometrics 90, 77–97 (1999). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0304-4076(98)00033-5&link_type=DOI) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000079082800004&link_type=ISI)
34. [34]. AC Cameron, PK Trivedi, Regression analysis of count data. (Cambridge University Press) Vol. 53, (2013).
35. [35]. A Gelman, J Hill, Data analysis using regression and multilevel/hierarchical models. (Cambridge University Press), (2006).
36. [36]. SH Woolf, et al., Excess deaths from COVID-19 and other causes, March-July 2020. JAMA 324, 1562–1564 (2020).