Abstract
Background There are significant gaps in our understanding of the mortality effects of COVID-19 due to evolving diagnosis criteria, shortages of testing supplies, and challenges faced by physicians in treating patients in crisis environments. Accurate information on the number of deaths caused by COVID-19, both population wide and for demographic subgroups, is vital for policy makers and health care providers.
Methods We performed a retrospective study of weekly data for Ohio, a large American state. To estimate expected mortality in 2020 we employed data from 2010 through 2019, adjusted for secular trends and seasonality. We estimated excess mortality as the number of observed deaths less the number of expected deaths. We conducted the analysis for the entire population and by gender, race, age, and county of residence.
Findings We estimated 1,485 (95% CI 680-2,345) excess all-cause deaths in Ohio from March 15, 2020 through May 23, 2020. When limited to deaths due to natural causes, the estimated excess number of deaths increased to 2,405 (95% CI 1,633-3,221), reflecting the countervailing effect of a decrease in deaths due to external causes. While the largest number of excess deaths was observed in the 80+ age group, excess deaths comprised 45.3% (95% CI 21.8-60.9) of observed deaths in the groups corresponding to ages between 20 and 49 years old. Our estimate of 729 (95% CI 355-966) excess deaths for this group substantially exceeds the reported number of COVID-19 deaths of 51. We found elevated excess deaths for older individuals, blacks, and males.
Interpretation Our methodology addressed some of the challenges of estimating the number of deaths caused by COVID-19. Our finding of high proportional levels of excess deaths among younger age groups suggests that increases in the infection rates for this cohort may have a greater mortality impact than expected.
Funding None.
Introduction
A primary challenge in understanding COVID-19 has been the difficulty in accurately determining whether a death was caused by the novel coronavirus SARS-CoV-2. Evolving diagnosis criteria,1 testing supply constraints and the prioritization of tests for living patients,2 and the “fog of war” present in burdened intensive care units3 can pose obstacles to physicians in correctly identifying COVID-19 deaths. Accurate mortality data are crucial for effective medical and policy responses to the virus.
A potentially insightful approach to estimating COVID-19 deaths is to use historical data aggregated across multiple causes of death to estimate the expected (or counterfactual) number of deaths. The number of excess deaths due to the virus can then be estimated as the observed number of deaths less the expected number. This approach abstracts from errors due to the incorrect coding of deaths if the reported cause of death falls in the same grouping as the true cause of death. This methodology has been used to estimate deaths following natural disasters4–7 and deaths due to COVID-19.8–15 The COVID-19 analyses found varying levels of excess mortality, but largely focused on overall mortality with the exceptions of a study of Massachusetts that found little difference in excess mortality by gender8 and other studies that found excess deaths increased with age.11,13,15 The authors are unaware of any studies that employ this technique to estimate excess deaths by race.
While the existing COVID-19 excess mortality retrospective studies provide important insight, they can have significant limitations. Their use of all-cause mortality likely understates the effects of COVID-19 by the inclusion of deaths by external causes, which were likely reduced during the outbreak due to reductions in accidents. Differential effects on excess deaths by race remain unknown, and little is known about how excess deaths compare to reported COVID-19 deaths by demographic group. Outside of studies of New York City,9,10 there have been no analyses of sub-state areas.
The goal of this study was to estimate excess mortality in Ohio, the seventh-largest U.S. state and a state that provides comprehensive health data online. In many ways the state is representative of the country, with a similar racial and age profile and comparable median income.16 The state has been aggressive in controlling the COVID-19 outbreak by being the first state in the country to close its schools17 and soon thereafter closing restaurants and bars.18 The state began allowing businesses to reopen on May 4, 2020 and for the next eight weeks experienced falling rates of new cases and COVID-19 hospitalizations.19 We analyzed excess overall mortality as well as by demographic group. We further compared our excess mortality estimates to the number of reported COVID-19 deaths.
Methods
Study design and data sources
We performed a weekly time series analysis in which ten years of historical data were used to estimate the expected number of deaths in 2020. We defined excess mortality as the number of observed deaths less the expected number. Our mortality data were obtained from the Ohio Public Health Information Warehouse published by the Ohio Department of Health (ODH). The data for 2010-2018 were finalized, while data for 2019-2020 were provisional. The population data employed were the 2018 vintage version of bridged-race population postcensal estimates published by the National Vital Statistics System branch of the U.S. Centers for Disease Control and Prevention. Population data were not available for 2019 and 2020. To approximate those values, for each subgroup we calculated the 2013-2018 compound annual growth rate and applied the rate to the 2018 value. We obtained the number of reported COVID-19 deaths from the ODH’s COVID-19 dashboard. These data are reported by county health departments to the state via the Ohio Disease Reporting System. For each death the county of residence, gender, and age are reported, as well as the dates of COVID-19 onset, hospital admission, and death. The study population was all Ohio residents from January 3, 2010 through May 23, 2020.
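The population extrapolation described above can be illustrated with a minimal sketch. The function name and the population counts are hypothetical placeholders rather than values from the bridged-race estimates, and the sketch assumes the growth rate is compounded once per year from the 2018 value.

```python
def extrapolate_population(pop_2013, pop_2018, target_year):
    """Project a subgroup's population past 2018 using its 2013-2018 CAGR."""
    years_of_growth = 2018 - 2013
    # Compound annual growth rate over the five-year span.
    cagr = (pop_2018 / pop_2013) ** (1 / years_of_growth) - 1
    # Apply the rate once for each year beyond 2018.
    return pop_2018 * (1 + cagr) ** (target_year - 2018)


# Hypothetical subgroup counts, for illustration only.
pop_2019 = extrapolate_population(1_000_000, 1_050_000, 2019)
pop_2020 = extrapolate_population(1_000_000, 1_050_000, 2020)
```

Because the rate is estimated separately for each subgroup, the projected subgroup values need not sum exactly to a projection made for the population as a whole.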
Statistical analysis
Our baseline period was the full set of epidemiological weeks for each year from 2010 through 2019. The analytic period begins with the week ending March 21, 2020, which was the week during which the first COVID-19 death was reported in Ohio, and ends with the week ending May 23, 2020. The observed number of deaths refers to the actual number of deaths in 2020, while the expected number of deaths corresponds to the estimate based on data from the baseline period. As the variance in the number of deaths is greater than the mean, an over-dispersed log-linear model was used to estimate the expected number of deaths. The exposure variable was the relevant population. Annual indicator variables were included to account for secular trends in the number of deaths, while harmonic variables consisting of four Fourier terms were used to adjust for seasonality. The model was estimated separately for all residents and by gender, age group, race, and county. The estimating equation took the form:
$$\log \mathrm{E}[\mathit{deaths}_{s,i,t}] = \log(\mathit{population}_{s,i}) + \alpha_s + \sum_{j} \beta_{s,j}\,\mathit{year}_{j,i} + \sum_{k=1}^{4} \gamma_{s,k}\,h_k(t)$$
where deaths is the number of deaths for subgroup s in epidemiological year i and week t, population is the exposure variable, year_j are the annual indicator variables, and h_k are the four Fourier terms. Three models were estimated for all residents: one for all-cause deaths, one for deaths due to natural causes, and one for deaths due to external causes. The models for the subgroups were based on deaths due to natural causes.
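As a rough illustration of how the expected number of deaths follows from such a log-linear model, the sketch below computes a predicted mean from given coefficients. It is not the fitted model: the coefficient values, the function names, the 52-week period, and the reading of "four Fourier terms" as two sine/cosine pairs are all assumptions made for the example.

```python
import math

WEEKS_PER_YEAR = 52  # assumption: the period of the harmonics is not stated


def fourier_terms(week, n_pairs=2):
    """Return sine/cosine pairs for one epidemiological week (2 pairs = 4 terms)."""
    terms = []
    for k in range(1, n_pairs + 1):
        angle = 2 * math.pi * k * week / WEEKS_PER_YEAR
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms


def expected_deaths(population, intercept, year_effect, harmonic_coefs, week):
    """Expected deaths from a log-linear model with population as the exposure."""
    seasonal = sum(c * f for c, f in zip(harmonic_coefs, fourier_terms(week)))
    # exp(log(population) + linear predictor) = population * exp(linear predictor)
    return population * math.exp(intercept + year_effect + seasonal)


# Hypothetical coefficients and counts, for illustration only (not fitted to Ohio data).
mu = expected_deaths(11_700_000, math.log(1.9e-4), 0.01, [0.03, 0.05, 0.0, 0.01], week=12)
excess = 2_500 - mu  # hypothetical observed deaths less expected deaths
```

Entering population as an exposure fixes its coefficient at one on the log scale, so the remaining terms describe the death rate rather than the death count.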
The variation in the expected number of deaths was modeled via parametric bootstrapping and follows an earlier study of COVID-19 excess mortality.9 From each regression we obtained the asymptotic covariance matrix, which was then used along with the estimated parameter values to specify a multivariate normal distribution to approximate the sampling distribution. One hundred parameter samples were drawn from this distribution, and the mean number of deaths was calculated for each. We then drew one hundred samples from the Poisson distribution of each of these means, resulting in 10,000 samples for each subgroup and week. The 95% confidence interval bounds were estimated as the 2.5th and 97.5th percentiles of the distribution. We also calculated the number of observed deaths less the number of reported COVID-19 deaths for each subgroup.
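The two-stage bootstrap described above can be sketched for a single subgroup and week as follows. This is a simplified illustration using only the Python standard library, not the code used in the study: the helper names, the one-parameter example, and the chunked Poisson sampler are assumptions, and the parameter estimates and covariance matrix are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / l[j][j]
    return l


def draw_mvn(mean, chol, rng):
    """One draw from a multivariate normal with the given mean and Cholesky factor."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(chol[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mean)]


def draw_poisson(lam, rng):
    """Poisson draw via Knuth's method, split into chunks to avoid underflow."""
    total, remaining = 0, lam
    while remaining > 0:
        chunk = min(remaining, 500.0)
        limit, k, p = math.exp(-chunk), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                break
            k += 1
        total += k
        remaining -= chunk
    return total


def bootstrap_interval(x_row, beta_hat, cov, n_param=100, n_pois=100, seed=0):
    """95% interval for expected deaths in one week: 100 x 100 = 10,000 samples."""
    rng = random.Random(seed)
    chol = cholesky(cov)
    samples = []
    for _ in range(n_param):
        beta = draw_mvn(beta_hat, chol, rng)              # stage 1: parameter draw
        mean = math.exp(sum(b * x for b, x in zip(beta, x_row)))
        samples.extend(draw_poisson(mean, rng) for _ in range(n_pois))  # stage 2
    samples.sort()
    return samples[int(0.025 * len(samples))], samples[int(0.975 * len(samples)) - 1]


# Hypothetical intercept-only model with an expected mean of about 50 deaths.
lower, upper = bootstrap_interval([1.0], [math.log(50)], [[0.0004]])
```

The first stage captures uncertainty in the estimated parameters, while the second adds the week-to-week sampling variation in death counts around each simulated mean.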
The cause of death was based on the primary underlying cause code recorded on the death certificate. Causes of death were grouped based on the 39 selected causes as described in the Supplementary Material. By using deaths due to natural causes as our primary outcome, we abstracted from potential indirect (and likely downward) effects on the deaths due to external causes that resulted from stay-at-home behaviors including a reduction in driving and work-related accidents.
Funding source
This study did not receive any external funding.
Results
During our analytic period the state of Ohio reported 2,167 COVID-19 deaths. Over those same weeks, there were 25,523 all-cause deaths and 24,359 deaths due to natural causes. For all residents, the excess numbers of all-cause deaths and deaths due to natural causes during the same period were 1,485 (95% CI 680-2,345) and 2,405 (95% CI 1,633-3,221), respectively (figure 1). The number of excess deaths due to natural causes is larger than the value across all causes because of the shortfall in the number of deaths due to external causes. The reported number of COVID-19 deaths appears to largely explain the excess deaths across all residents, as the observed number of deaths less the reported number of COVID-19 deaths falls in the 95% confidence interval for the expected number of deaths.
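The comparison in the preceding paragraph can be checked with simple arithmetic on the aggregate figures reported there. The sketch assumes that the interval for expected deaths is obtained by subtracting the excess-death interval from the observed count, and it works with period totals rather than the weekly values shown in the figure; the variable names are illustrative.

```python
observed_natural = 24_359   # observed deaths due to natural causes
reported_covid = 2_167      # reported COVID-19 deaths
excess, excess_lo, excess_hi = 2_405, 1_633, 3_221  # point estimate and 95% CI

# Expected deaths implied by excess = observed - expected
# (assumption: interval bounds carry over by the same subtraction).
expected = observed_natural - excess
expected_lo = observed_natural - excess_hi
expected_hi = observed_natural - excess_lo

# Observed deaths net of reported COVID-19 deaths, compared with the interval.
observed_less_covid = observed_natural - reported_covid
within_interval = expected_lo <= observed_less_covid <= expected_hi

# Excess deaths as a proportion of observed deaths.
excess_share = excess / observed_natural
```

Under these assumptions the implied expected number of deaths is 21,954 (interval 21,138 to 22,726), and the 22,192 observed deaths net of reported COVID-19 deaths fall inside that interval.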
The expected and observed numbers of deaths shown in figures 2 through 6 are based only on deaths due to natural causes. Except for the youngest, all of the age groups have at least one week in which the observed number of deaths exceeded the 95% confidence interval for the expected number of deaths (figure 2). The 80+ years old group had the most excess deaths at 986 (95% CI 497-1,529). While the number of deaths was substantially lower than for the older age groups, the 20-29, 30-39, and 40-49 groups had the largest proportions of excess deaths relative to actual deaths. The corresponding values were 59.3% (95% CI 33.9-79.8), 54.7% (95% CI 29.3-68.7), and 36.3% (95% CI 14.7-49.9). The number of excess deaths across these three groups was 729 (95% CI 355-966), yet the corresponding number of reported COVID-19 deaths was only 51. To better understand the causes underlying the excess deaths for those aged 20-49, we summarized the 2010-2019 average and 2020 number of deaths for the top ten causes of death (figure 3). The excess deaths appear to be largely driven by the cause “Symptoms, signs, and abnormal findings, not elsewhere classified.” Yet the ICD-10 code for deaths caused by COVID-19 (U07.1) is contained in the cause “All other diseases (Residual)”, which experienced a modest increase in 2020.
The estimated number of excess deaths is higher for males (1,404, 95% CI 881-1,966) than females (965, 95% CI 456-1,555) (figure 4). Blacks appear to have suffered a disproportionate number of excess deaths (figure 5). The excess deaths as a proportion of observed deaths for blacks was 18.7% (95% CI 10.2-26.3), which is significantly higher than the corresponding proportion for whites (8.5%, 95% CI 5.1-12.1). As noted above, the number of reported COVID-19 deaths is not available by race. The estimates for the ten most populated counties in Ohio, listed in descending population order, suggest that there is not a clear association between population and the rate of excess deaths (figure 6). The most populous county (Franklin) has the most excess deaths (453, 95% CI 250-681), but the number of excess deaths in each of the next four most-populated counties does not statistically differ from zero. In terms of excess deaths as a proportion of observed deaths, the tenth most-populated county (Mahoning) has the highest point estimate at 37.0% (95% CI 14.2-51.5) while the second highest is associated with the sixth most-populated county (Lucas) at 34.4% (95% CI 15.9-47.1). The number of reported COVID-19 deaths in Franklin county was approximately 57% of the estimated excess deaths.
Discussion
We analyzed administrative data and employed a rigorous methodology to provide insight into the difficult task of identifying deaths caused by COVID-19. We identified a greater number of excess deaths when we limited our estimate to deaths due to natural causes rather than deaths due to all causes, which include external causes. Our findings indicate potential disparities in the mortality effects of COVID-19, with excess mortality especially pronounced among those aged 80 and over, men, and blacks.
Our analysis of all residents demonstrates the magnitude of excess mortality due to COVID-19 and shows how the extent becomes more pronounced when the analysis is limited to deaths due to natural causes. The substantial shortfall in deaths due to external causes may be partly due to delays in determinations from the need for investigations.20 However, the timing of the shortfall coincides with school and restaurant closures and suggests stay-at-home behavior may have obscured the true impact of COVID-19 on excess mortality. Our investigation reveals, given the potential behavioral effects on deaths due to external causes, the value of examining deaths due to natural causes rather than exclusively relying on all-cause mortality.
Our finding of large mortality effects among older individuals is consistent with previous studies that employ alternative estimation approaches.21-23 However, the especially large proportional amount of excess mortality among those aged 20-49 years has not been observed elsewhere. Given that younger populations have experienced higher infection rates after the initial wave of COVID-19 in many geographical areas, our finding may be especially important as officials revise guidance and policy.
The higher proportional excess mortality that we observe for blacks is in agreement with an earlier study that found counties with higher proportions of black residents had higher relative risks of COVID-19 deaths.24 The disparity in risk of COVID-19 death echoes disparities blacks experience for many other health conditions.24 Given the scarce reporting of COVID-19 deaths by race, our approach can be a valuable tool to obtain insight into potential differential impacts. Our finding of increased excess deaths for males is consistent with prior studies that analyzed hospital records to assess the risk factors of COVID-19 death.25,26 The substantial variation across counties that we observed is consistent with the varying levels of infection within countries.
The number of reported COVID-19 deaths largely explains the levels of excess deaths that we estimate, but to varying degrees. The large gap between the estimated number of excess deaths and reported COVID-19 deaths for those aged 20-49 years suggests that COVID-19 deaths for this cohort may be under-reported by local health departments. Further research is warranted to determine whether this finding is specific to Ohio. If this phenomenon is more widespread, it could suggest substantial under-stating of mortality risk to younger individuals. Also worthy of further investigation is the apparent miscoding that we observed where presumably the vast majority of COVID-19 deaths in the 20-49 age group were not assigned the correct ICD-10 code (U07.1). Such miscoding can prevent analysts from correctly assessing the true mortality impact of COVID-19.
Our study has several limitations. Our analysis was limited to one, albeit relatively large, U.S. state. There has been significant variation in the effects of COVID-19 across and within countries, so our findings do not necessarily apply universally. Our data were limited to the first several months of the outbreak and thus our results may not pertain to later stages. The methodology we employ does not definitively identify excess deaths as being due to COVID-19. Stressful events such as COVID-19 can lead to negative health outcomes such as increased incidence of cardiovascular events,27 poor medication adherence,28 and hypertension.29 Potential increases in mortality due to these types of effects would have been included in our estimates of excess deaths.
Our mortality data were imperfect in that the 2019 and 2020 data were provisional and not finalized. However, as noted above, undercounting of deaths in those years should largely be limited to deaths due to external causes as these deaths often require lengthy investigation before being coded. Further, we pulled the data more than three weeks after May 23, 2020, which should have provided sufficient time for nearly all deaths due to natural causes to be properly recorded. As noted in the Supplementary Material, some deaths were excluded from the analysis due to unusable data. However, the number of lost observations was minimal. The mortality data were based on state and county of residence at the time of death, rather than the location of occurrence. This choice was made to align the data with the reported number of COVID-19 deaths, but insight could be gained by exploring potential differences from basing the data on where the death occurred. The number of reported COVID-19 deaths did not include information regarding the decedent’s race. As noted in the Supplementary Material, the date of death had to be approximated for six of the 2,167 reported COVID-19 deaths. Population data were only available through 2018, thus values for 2019 and 2020 had to be estimated.
The approach we employed indicated important differences in the effects of COVID-19 across demographic groups and identified potential shortcomings in published data. Our methodology can be applied in jurisdictions that provide timely access to death certificate data. While the results may not offer perfect insight into the precise causes of death, they can provide information to help craft effective policy and medical responses. Further, the methodology is robust to the imperfect coding of deaths that is common in crisis environments.
While our study is specific to Ohio, given the state’s large population and similarities with the U.S. as a whole, our results may improve the understanding of COVID-19 elsewhere. Policy makers, public health officials, and health care providers may be able to apply our findings to mitigate the destructive effects of COVID-19 and better prepare for future epidemics.
Data Availability
The data and code employed in the analysis are available at https://github.com/troyquast/covid19_ohio.
Supplementary Material
Data sources
The Ohio Public Health Information Warehouse is located at http://publicapps.odh.ohio.gov/EDW/DataCatalog. The data for the analyses of the aggregated number of all causes of death, deaths due to natural causes, and deaths due to external causes were obtained on June 13, 2020. The data for the specific causes of death for those aged 20-49 years were obtained on June 17, 2020.
The Ohio COVID-19 dashboard is located at https://coronavirus.ohio.gov/wps/portal/gov/covid-19/dashboards/overview. These data were obtained on June 15, 2020 and were limited to deaths that occurred through May 23, 2020. There were valid dates of death during the analytic period for 2,161 deaths. Twelve observations had a date of death listed as “Unknown”. Two of these observations had a valid hospital admission date. For these observations, the date of death was approximated as seven days after the admission date. Of these two observations, one had an approximated date that fell within the analytic period and was thus included in our data. The ten remaining observations did not have a valid hospital admission date but did have a valid onset date value. For these observations, the date of death was approximated as fourteen days after the onset date. Of these ten observations, five had an approximated date that fell within the analytic period and were thus included in the sample. The final dataset contained 2,167 deaths.
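The approximation rule for unknown dates of death can be expressed as a short sketch. The function names and the use of `None` for a missing date are assumptions for illustration, and the analytic period is taken to run from March 15, 2020 (the start of the week ending March 21) through May 23, 2020.

```python
from datetime import date, timedelta


def approximate_death_date(death, admission, onset):
    """Return the recorded date of death, or an approximation when it is unknown."""
    if death is not None:
        return death
    if admission is not None:
        return admission + timedelta(days=7)    # seven days after hospital admission
    if onset is not None:
        return onset + timedelta(days=14)       # fourteen days after symptom onset
    return None


def in_analytic_period(d, start=date(2020, 3, 15), end=date(2020, 5, 23)):
    """True when a (possibly approximated) date of death falls in the analytic period."""
    return d is not None and start <= d <= end


# Hypothetical record with an unknown date of death and a known admission date.
approx = approximate_death_date(None, date(2020, 4, 1), date(2020, 3, 20))
included = in_analytic_period(approx)

# Counts reported above: valid dates plus approximated dates within the period.
final_count = 2_161 + 1 + 5
```

The admission date takes precedence over the onset date, matching the order in which the two rules are applied above.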
Acknowledgements
We thank the Ohio Department of Health for publishing timely mortality and COVID-19 data. The Department specifically disclaims responsibility for any analyses, interpretations or conclusions.
Footnotes
Charles University and Motol University Hospital, Department of Neurology, Prague
International Clinical Research Center, St. Anne’s University Hospital, Brno