Abstract
This paper assesses the age specificity of the infection fatality rate (IFR) for COVID-19 using seroprevalence results from eight national studies and fifteen regional studies as well as five countries that have engaged in comprehensive tracing of COVID-19 infections. The estimated IFR is close to zero for children and younger adults but rises exponentially with age, reaching 0.4% at age 55, 1.3% at age 65, 4.5% at age 75, and 15% at age 85. We find that differences in the age structure of the population and the age-specific prevalence of COVID-19 explain 90% of the geographical variation in population IFR. Consequently, protecting vulnerable age groups could substantially reduce the incidence of mortality.
Objective Determine age-specific infection fatality rates for COVID-19 to inform public health policies and communications that help protect vulnerable age groups.
Methods Studies of COVID-19 prevalence were collected by conducting an online search of published articles, preprints, and government reports. A total of 89 studies were reviewed in depth and screened. Studies of 31 locations satisfied the inclusion criteria and were included in the meta-analysis. Age-specific IFRs were computed using the prevalence data in conjunction with reported fatalities four weeks after the midpoint date of the study, reflecting typical lags in fatalities and reporting. Meta-regression procedures in Stata were used to analyze IFR by age.
Results Our analysis finds an exponential relationship between age and IFR for COVID-19. The estimated age-specific IFRs are close to zero for children and younger adults but reach 0.4% at age 55, 1.3% at age 65, 4.5% at age 75, and 15% at age 85. We find that differences in the age structure of the population and the age-specific prevalence of COVID-19 explain 90% of the geographical variation in population IFR.
Discussion These results indicate that COVID-19 is hazardous not only for the elderly but also for middle-aged adults, for whom the infection fatality rate is two orders of magnitude greater than the annualized risk of a fatal automobile accident. Moreover, the overall IFR for COVID-19 should not be viewed as a fixed parameter but as intrinsically linked to the age-specific pattern of infections. Consequently, individual and collective efforts that minimize infections in older adults could substantially decrease total deaths.
Introduction
As the COVID-19 pandemic has spread across the globe, some fundamental issues have remained unclear: How dangerous is COVID-19? And to whom? Answering these questions will help inform appropriate decision-making by individuals, families, and communities.
The case fatality rate (CFR), the ratio of deaths to reported cases, is commonly used in gauging disease severity. However, this measure can be highly misleading for SARS-CoV-2, the virus that causes COVID-19, because a high proportion of infections are asymptomatic or mildly symptomatic (especially for younger people) and may not be included in official case reports.1,2 Consequently, the infection fatality rate (IFR), the ratio of fatalities to infections, is a more reliable metric than the CFR in assessing the hazards of COVID-19.
Assessing the IFR for COVID-19 is difficult. As shown in Table 1, a recent seroprevalence study by the New York Department of Health estimated ~1·6 million infections among the 8 million residents of NYC, but only one-tenth of those infections were captured in reported COVID-19 cases.3,4 About one-fourth of reported cases were severe enough to require hospitalization, many of whom succumbed to the disease. All told, fatalities represented a tenth of reported cases but only a hundredth of all infections.
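The NYC figures quoted above give a compact illustration of why the CFR and the IFR diverge. The following is a minimal sketch that uses only the rounded ratios stated in the text; the function name and the derived counts are illustrative and are not taken from Table 1.

```python
def fatality_ratios(infections, reported_share, deaths_per_reported_case):
    """Return (reported_cases, deaths, cfr, ifr) from rounded inputs."""
    reported_cases = infections * reported_share
    deaths = reported_cases * deaths_per_reported_case
    cfr = deaths / reported_cases  # deaths relative to reported cases
    ifr = deaths / infections      # deaths relative to all infections
    return reported_cases, deaths, cfr, ifr


# Rounded values from the text: ~1.6 million infections, about one-tenth
# captured in reported cases, fatalities about one-tenth of reported cases.
cases, deaths, cfr, ifr = fatality_ratios(1.6e6, 0.10, 0.10)
print(f"CFR = {cfr:.0%}, IFR = {ifr:.0%}")
```

The tenfold gap between the two ratios comes entirely from the share of infections that never appear in case reports.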
While the NYC data indicate an IFR of ~1%, analyses of other locations have produced a wide array of IFR estimates, e.g., 0·6% in Geneva, 1·5% in England, and 2·3% in Italy. Indeed, a recent meta-analysis noted the high degree of heterogeneity across aggregate estimates of IFR and concluded that research on age-stratified IFR is “urgently needed to inform policymaking.”5
In this paper, we consider the hypothesis that the observed variation in IFR across locations may primarily reflect the age specificity of COVID-19 infections and fatalities. Consequently, this paper reports on a systematic review and meta-analysis of age-specific IFRs for COVID-19. Based on our findings, we are able to assess and contextualize the severity of COVID-19 and examine how age-specific prevalence affects population IFR and the total incidence of fatalities.
Methodology
To perform the present meta-analysis, we collected published papers and preprints on the seroprevalence and/or infection fatality rate of COVID-19 that were publicly disseminated prior to 13 August 2020. As described in Supplementary Appendix B, we systematically performed online searches in MedRxiv, Medline, PubMed, Google Scholar, and EMBASE, and we identified other studies listed in reports by government institutions such as the U.K. Parliament Office.6 Data were extracted from studies by three authors and verified prior to inclusion.
We restricted our meta-analysis to studies of advanced economies, based on current membership in the Organization for Economic Cooperation and Development (OECD), in light of the distinct challenges of health care provision and reporting of fatalities in developing economies.7 We also excluded studies aimed at measuring prevalence in specific groups such as health care workers.
Our meta-analysis encompasses two distinct approaches for assessing the prevalence of COVID-19: (1) seroprevalence studies that test for antibodies produced in response to the virus, and (2) comprehensive tracing programs using extensive live-virus testing of everyone who has had contact with a potentially infected individual. Seroprevalence estimates are associated with uncertainty related to the sensitivity and specificity of the test method and the extent to which the sampling frame provides an accurate representation of prevalence in the general population; see Supplementary Appendix C. Prevalence measures from comprehensive tracing programs are associated with uncertainty about the extent of inclusion of infected individuals, especially those who are asymptomatic.
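One standard way to account for imperfect test sensitivity and specificity is the Rogan-Gladen correction. The sketch below is illustrative only: it is not necessarily the adjustment applied in the underlying studies or in Supplementary Appendix C, and the example inputs are hypothetical.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Adjust raw test positivity for test error (Rogan-Gladen estimator)."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must perform better than chance")
    adjusted = (apparent_prevalence + specificity - 1.0) / denom
    # Clip to the valid range; raw positivity below the false-positive
    # rate would otherwise give a negative prevalence.
    return min(max(adjusted, 0.0), 1.0)


# Hypothetical inputs: 5% raw positivity, 90% sensitivity, 99% specificity.
print(round(rogan_gladen(0.05, 0.90, 0.99), 4))
```

The correction matters most when prevalence is low, because false positives then make up a large share of all positive results.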
Sampling frame
To assess prevalence in the general population, a study should be specifically designed to utilize a random sample using standard survey procedures such as stratification and weighting by demographic characteristics. Other sampling frames may be useful for specific purposes such as sentinel surveillance but are not well suited for assessing prevalence due to substantial risk of systematic bias. Consequently, our meta-analysis excludes the following types of studies:
Blood donor studies. Only a small fraction of blood donors are ages 60 and above—a fundamental limitation in assessing COVID-19 prevalence and IFRs for older age groups—and the social behavior of blood donors may be systematically different from their peers.8,9 These concerns can be directly investigated by comparing alternative seroprevalence surveys of the same geographical location. As of early June, Public Health England (PHE) reported seroprevalence of 8·5% based on specimens from blood donors, whereas the U.K. Office of National Statistics (ONS) reported markedly lower seroprevalence of 5·4% (CI: 4·3–6·5%) based on its monitoring of a representative sample of the English population.10,11
Hospitals and Urgent Care Clinics. Estimates of seroprevalence among current medical patients are subject to substantial bias, as evident from a pair of studies conducted in Tokyo, Japan: One study found 41 positive cases among 1071 urgent care clinic patients, whereas the other study found only two confirmed positive results in a random sample of nearly 2000 Tokyo residents (seroprevalence estimates of 3·8% vs. 0·1%).12,13
Active Recruitment. Soliciting participants is particularly problematic in contexts of low prevalence, because seroprevalence can be markedly affected by a few individuals who volunteer due to concerns about prior exposure. For example, a Luxembourg study obtained positive antibody results for 35 out of 1,807 participants, but nearly half of those individuals (15 of 35) had previously had a positive live virus test, were residing in a household with someone who had a confirmed positive test, or had direct contact with someone else who had been infected.14
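The counts cited above can be turned into point estimates with approximate uncertainty bands. The following sketch uses the Wilson score interval, which is a choice made here for illustration and not necessarily the method used by the cited studies. The Tokyo random-sample study is omitted because its exact sample size is not given in the text.

```python
import math


def wilson_interval(positives, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    p = positives / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, center - half, center + half


for label, k, n in [("Tokyo urgent care clinics", 41, 1071),
                    ("Luxembourg volunteers", 35, 1807)]:
    p, lo, hi = wilson_interval(k, n)
    print(f"{label}: {p:.1%} (approx. 95% interval {lo:.1%} to {hi:.1%})")
```

Note that such intervals capture sampling error only. They say nothing about the selection effects discussed above, which is why a narrow interval from a non-representative sample can still be far from the population value.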
Our critical review has also underscored the pitfalls of seroprevalence studies based on “convenience samples” of residual sera collected for other purposes. For example, two studies assessed seroprevalence of Utah residents during spring 2020. The first study analyzed residual sera from two commercial laboratories and obtained a prevalence estimate of 2·2% (CI: 1·2–3·4%), whereas the second study collected specimens from a representative sample and obtained a markedly lower prevalence estimate of 0·96% (CI: 0·4–1·8%).15,16 In light of these issues, our meta-analysis includes residual serum studies but we flag such studies as having an elevated risk of bias.
Comprehensive Tracing Programs
Our meta-analysis incorporates data on COVID-19 prevalence and fatalities in countries that have consistently maintained comprehensive tracing programs since the early stages of the pandemic. Such a program was only feasible in places where public health officials could conduct repeated tests of potentially infected individuals and trace those with whom they had direct contact. We identify such countries using a threshold of 300 for the ratio of cumulative tests to reported cases as of 30 April 2020.17 That threshold was chosen based on comparisons of prevalence estimates and reported cases in the Czech Republic vs. Iceland; see Supplementary Appendix D. Studies of Iceland and Korea found that estimated prevalence was moderately higher than the number of reported cases, especially for younger age groups; see Supplementary Appendix E.18-20 Consequently, we make corresponding adjustments in inferring prevalence from reported cases for other countries with comprehensive tracing programs, and we identify these estimates as subject to an elevated risk of bias.
Measurement of fatalities
Accurately measuring total deaths is a substantial issue in assessing IFR due to time lags from onset of symptoms to death and from death to official reporting. Symptoms typically develop within 6 days after exposure but may develop as early as 2 days or as late as 14 days.1,21 More than 95% of symptomatic COVID patients have positive antibody (IgG) titres within 17-19 days of symptom onset, and those antibodies remain elevated over a sustained period.22-25 The mean time interval from symptom onset to death is 15 days for ages 18–64 and 12 days for ages 65+, with interquartile ranges of 9–24 days and 7–19 days, respectively, while the mean interval from date of death to the reporting of that person’s death is ~7 days with an IQR of 2–19 days; thus, the upper bound of the 95% confidence interval between symptom onset and reporting of fatalities is about six weeks (41 days).26
Figure 1 illustrates these findings in a hypothetical scenario where the pandemic was curtailed two weeks prior to the date of the seroprevalence study. This figure shows the results of a simulation calibrated to reflect the estimated distribution for time lags between symptom onset, death, and inclusion in official fatality reports. The histogram shows the frequency of deaths and reported fatalities associated with the infections that occurred on the last day prior to full containment. Consistent with the confidence intervals noted above, 95% of cumulative fatalities are reported within roughly four weeks of the date of the seroprevalence study.
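A simulation of this kind can be sketched as follows. The gamma distributions and their parameters are assumptions chosen here only to match the means quoted above (about 15 days from symptom onset to death and about 7 days from death to reporting). They are not the calibration used for Figure 1, so the resulting percentiles will not reproduce the figure exactly.

```python
import random


def simulate_onset_to_report(n=100_000, seed=0):
    """Draw total lags (days) from symptom onset to official death report."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n):
        # gammavariate(shape, scale); the mean is shape * scale.
        onset_to_death = rng.gammavariate(2.0, 7.5)   # assumed: mean 15 days
        death_to_report = rng.gammavariate(1.0, 7.0)  # assumed: mean 7 days
        totals.append(onset_to_death + death_to_report)
    totals.sort()
    return totals


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list (0 < q <= 100)."""
    idx = max(0, int(round(q / 100 * len(sorted_values))) - 1)
    return sorted_values[idx]


lags = simulate_onset_to_report()
print("mean lag (days):", sum(lags) / len(lags))
print("median lag (days):", percentile(lags, 50))
print("95th percentile (days):", percentile(lags, 95))
```

The upper percentiles of the combined lag are what determine how long after a seroprevalence study the cumulative death count should be read.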
As shown in Table 2, the precise timing of the count of cumulative fatalities is relatively innocuous in locations where the outbreak had been contained for more than a month prior to the date of the seroprevalence study. By contrast, in instances where the outbreak had only recently been contained, the death count continued rising markedly for several more weeks after the midpoint of the seroprevalence study.
Therefore, we construct age-specific IFRs using the seroprevalence data in conjunction with cumulative fatalities four weeks after the midpoint date of each study; see Supplementary Appendix F. We have also conducted sensitivity analysis using cumulative fatalities five weeks after the midpoint date, and we flag studies as having an elevated risk of bias if the change in cumulative fatalities between weeks 4 and 5 exceeds 10%.
By contrast, matching prevalence estimates with subsequent fatalities is not feasible if a seroprevalence study was conducted in the midst of an accelerating outbreak. Therefore, our meta-analysis excludes seroprevalence studies for which the change in cumulative fatalities from week 0 to week 4 exceeds 200%.
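The two timing rules described above can be written as a small decision function. This is a sketch under the assumption that "change" means the percentage increase in cumulative fatalities between the stated weeks; the function name, labels, and example counts are hypothetical.

```python
def classify_study(deaths_week0, deaths_week4, deaths_week5):
    """Apply the timing rules to cumulative death counts.

    Returns "exclude", "elevated risk of bias", or "include".
    """
    if deaths_week0 <= 0 or deaths_week4 <= 0:
        raise ValueError("cumulative deaths must be positive")
    growth_0_to_4 = (deaths_week4 - deaths_week0) / deaths_week0
    growth_4_to_5 = (deaths_week5 - deaths_week4) / deaths_week4
    if growth_0_to_4 > 2.00:   # more than a 200% rise: outbreak accelerating
        return "exclude"
    if growth_4_to_5 > 0.10:   # deaths still rising by more than 10%
        return "elevated risk of bias"
    return "include"


# Hypothetical cumulative death counts at weeks 0, 4, and 5.
print(classify_study(100, 350, 360))
print(classify_study(100, 150, 170))
print(classify_study(100, 120, 125))
```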
Metaregression procedure
To analyze IFR by age, we use meta-regression with random effects, using the meta regress procedure in Stata v16.27,28 We used a random-effects procedure to allow for residual heterogeneity between studies and across age groups by assuming that these divergences are drawn from a Gaussian distribution. Publication bias was assessed using Egger’s regression and the trim-and-fill method. See Supplementary Appendix G for further details of these procedures.
Results
After an initial screening of 962 studies, we reviewed the full texts of 89 studies, of which 43 studies were excluded due to lack of age-specific data on COVID-19 prevalence or fatalities.11-13,46-85 Seroprevalence estimates for two locations were excluded because the outbreak was still accelerating during the period when the specimens were being collected.15,86 Studies of non-representative samples were excluded as follows: 10 studies of blood donors, 4 studies of patients of hospitals and outpatient clinics, 4 studies with active recruitment of participants, and 2 studies of elementary schools.10,13,14,86-102 Supplementary Appendix H lists all excluded studies.
Consequently, our metaregression analyzes IFR data from 26 locations, which can be classified into three distinct groups:
Representative samples from 7 national studies (England, Hungary, Italy, Netherlands, Portugal, Spain, and Sweden) and from regional studies of Geneva, Switzerland and 4 U.S. locations (Atlanta, Indiana, New York, and Salt Lake City).16,29-39
Convenience samples for Belgium and 8 U.S. locations (Connecticut, Louisiana, Miami, Minneapolis, Missouri, Philadelphia, San Francisco, and Seattle).15,40
Comprehensive tracing programs for 5 countries (Australia, Iceland, Korea, Lithuania, and New Zealand).41-45
The metaregression includes results from the very large REACT-2 seroprevalence study of the English population.29 Thus, to avoid pitfalls of nested or overlapping samples, two other somewhat smaller studies conducted by U.K. Biobank and the U.K. Office of National Statistics are not included in the metaregression but are instead used in out-of-sample analysis of the metaregression results.11,103 Similarly, the metaregression includes a large representative sample from Salt Lake City, and hence a smaller convenience sample of Utah residents is included in the out-of-sample analysis along with two other small-scale studies.15,16,104,105 Supplementary Appendices I and J assess the risk of bias for each individual study and the risk that our metaregression results could be influenced by publication bias, respectively.
We obtain the following meta-regression results: where the standard error for each estimated coefficient is given in parentheses. These estimates are highly significant with t-statistics of −44·8 and 40·5, respectively, and p-values below 0·0001. The residual heterogeneity τ² = 0·353 (p-value < 0·0001) and I² = 96·8, confirming that the random effects are essential for capturing unexplained variations across studies and age groups. The adjusted R² is 95·0%.
As noted above, the validity of this meta-regression rests on the condition that the data are consistent with a Gaussian distribution. The validity of that assumption is evident in Figure 3: Nearly all of the observations fall within the 95% prediction interval of the metaregression, and the remainder are moderate outliers.
Figure 4 depicts the exponential relationship between age and the level of IFR in percent. Evidently, the SARS-CoV-2 virus poses a substantial mortality risk for middle-aged adults and even higher risks for elderly people: The IFR is close to zero for younger adults but rises to 0·4% at age 55, 1·3% at age 65, 4·5% at age 75, 15% at age 85, and exceeds 25% for ages 90 and above. These metaregression predictions are well aligned with the out-of-sample IFRs; see Supplementary Appendix K.
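An exponential age profile means that the logarithm of IFR is linear in age. As a rough check, the sketch below fits a straight line to the base-10 logarithm of the four point estimates quoted above, using ordinary least squares. This is only a back-of-the-envelope reconstruction from rounded values. It is not the random-effects meta-regression itself, and the fitted numbers should not be read as the paper's coefficients.

```python
import math

ages = [55, 65, 75, 85]
ifr_percent = [0.4, 1.3, 4.5, 15.0]  # rounded values quoted in the text


def fit_log_linear(x, y):
    """Ordinary least squares fit of log10(y) = intercept + slope * x."""
    logs = [math.log10(v) for v in y]
    n = len(x)
    mean_x = sum(x) / n
    mean_log = sum(logs) / n
    sxy = sum((a - mean_x) * (b - mean_log) for a, b in zip(x, logs))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_log - slope * mean_x, slope


intercept, slope = fit_log_linear(ages, ifr_percent)
doubling_years = math.log10(2) / slope  # years of age per doubling of IFR
print(f"slope per year of age: {slope:.4f}")
print(f"IFR doubles roughly every {doubling_years:.1f} years of age")
```

Expressing the slope as a doubling interval gives a more intuitive reading of how steeply the risk rises with age.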
As shown in Figure 5, the metaregression explains 90% of the geographical variation in population IFR, which ranges from ~0·5% in Salt Lake City and Geneva to 1·5% in Australia and England and 2·2% in Italy. The metaregression explains this variation in terms of differences in the age structure of the population and age-specific prevalence of COVID-19.
No publication bias was found using Egger’s test (p > 0.10), and the trim-and-fill method produced precisely the same estimate as obtained from the metaregression.
Discussion
Our meta-analysis indicates that COVID-19 poses a low risk for children and younger adults but is hazardous for middle-aged adults and extremely dangerous for older adults. Table 4 contextualizes these risks by comparing the age-specific IFRs from our meta-regression analysis to the annualized risks of fatal automobile accidents or other unintentional injuries in England and in the United States.106,107 For example, an English person aged 55–64 years who gets infected with SARS-CoV-2 faces a fatality risk that is more than 200 times higher than the annual risk of dying in a car accident.
This analysis also confirms that COVID-19 is far more deadly than seasonal flu. For example, during the influenza season of winter 2018–19 the U.S. population had ~63 million infections and 34 thousand fatalities, implying a population IFR of 0·05%, an order of magnitude lower than that of COVID-19; see Supplementary Appendix L.
These results indicate that the population IFR should not be interpreted as a fixed parameter of COVID-19 but as an outcome that depends on the extent to which public health measures and communications limit the incidence of infections among vulnerable age groups. To illustrate these considerations, we have constructed three scenarios for the U.S. trajectory of COVID-19 infections and fatalities; see Supplementary Appendix M. Each scenario assumes that U.S. prevalence rises to a plateau of around 20% but with alternative patterns of age-specific prevalence. In particular, if the prevalence becomes uniform across all age groups, this analysis projects that total U.S. fatalities would exceed 500 thousand and that the population IFR would converge to around 0·8%. By contrast, a scenario with relatively low incidence of new infections among vulnerable age groups would be associated with less than half as many deaths and a much lower population IFR of ~0·3%.
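The dependence of population IFR on the age pattern of infections can be made concrete with a weighted average. In the sketch below, the IFRs for ages 55 to 85 are the rounded values reported above, while the IFR assigned to those under 50 and both sets of infection shares are hypothetical placeholders rather than data from any study or from Supplementary Appendix M.

```python
# Age-band IFRs in percent; each band is approximated by one representative age.
ifr_by_band = {
    "under 50": 0.02,  # hypothetical placeholder for "close to zero"
    "50-59": 0.4,      # reported value at age 55
    "60-69": 1.3,      # reported value at age 65
    "70-79": 4.5,      # reported value at age 75
    "80+": 15.0,       # reported value at age 85
}


def population_ifr(infection_shares, ifr=ifr_by_band):
    """Infection-weighted average IFR in percent; shares must sum to 1."""
    if abs(sum(infection_shares.values()) - 1.0) > 1e-9:
        raise ValueError("infection shares must sum to 1")
    return sum(infection_shares[band] * ifr[band] for band in ifr)


# Two hypothetical age patterns of infection.
older_exposed = {"under 50": 0.60, "50-59": 0.15, "60-69": 0.12,
                 "70-79": 0.08, "80+": 0.05}
older_shielded = {"under 50": 0.75, "50-59": 0.15, "60-69": 0.06,
                  "70-79": 0.03, "80+": 0.01}

print(f"older adults exposed:  {population_ifr(older_exposed):.2f}%")
print(f"older adults shielded: {population_ifr(older_shielded):.2f}%")
```

Because the oldest bands carry IFRs that are orders of magnitude higher, small shifts in their share of infections dominate the population average.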
Our critical review underscores the substantial benefits of assessing prevalence using large-scale studies of representative samples of the general population (rather than convenience samples of blood donors or medical patients). Conducting such studies on an ongoing basis will enable public health officials to monitor changes in prevalence among vulnerable age groups and gauge the efficacy of public policy measures. Moreover, such studies will enable researchers to assess the extent to which antibodies to SARS-CoV-2 may gradually diminish over time as well as the extent to which advances in treatment facilitate the reduction of age-specific IFRs.
Our metaregression results are broadly consistent with the pathbreaking study of Ferguson et al. (2020), which was completed at a very early stage of the COVID-19 pandemic and characterized an exponential pattern of age-specific IFRs that was close to zero for children and much higher for middle-aged and older adults.108 Our results are also well-aligned with a more recent meta-analysis of population IFR; indeed, our age-specific analysis explains a very high proportion of the dispersion in population IFRs highlighted by that study.5 In contrast, our findings are markedly different from those of an earlier review of population IFR, mostly due to differences in selection criteria.109 Finally, the exponential pattern of our age-specific IFR estimates is qualitatively similar to that of age-specific CFRs but the magnitudes are systematically different, as shown in Supplementary Appendix N.
This meta-analysis has focused on the role of age in determining the IFR of COVID-19 but has not incorporated other factors that may have significant effects on IFR. For example, a recent U.K. study found that mortality outcomes are strongly linked to specific comorbidities such as diabetes and obesity but did not resolve the question of whether those links reflect differences in prevalence or causal effects on IFR.110 See Supplementary Appendix O for additional evidence. Likewise, we have not considered the extent to which IFRs may vary with other demographic factors such as race and ethnicity.46,77 Further research on these issues is clearly warranted.
It should also be noted that our analysis has focused exclusively on the incidence of fatalities but has not captured the full spectrum of adverse health consequences of COVID-19, some of which may be severe and persistent. Further research is needed to assess age-stratified rates of hospitalization as well as longer-term sequelae attributable to SARS-CoV-2 infections.
In summary, our meta-analysis demonstrates that COVID-19 is not just dangerous for the elderly and infirm but also for healthy middle-aged adults. The metaregression explains 90% of the geographical variation in population IFR, indicating that the population IFR is intrinsically linked to the age-specific pattern of infections. Consequently, public health measures to protect vulnerable age groups could substantially reduce the incidence of mortality.
Data Availability
This study is a meta-analysis using information from published articles, preprints, and government reports; all sources are listed in the bibliography with active URLs. The data and Stata code used in performing the meta-regression analysis are provided as Supplementary Materials.
Affiliations
Levin is a professor of economics at Dartmouth College, research associate of the NBER, and international research fellow of the Centre for Economic Policy Research. Meyerowitz-Katz is an epidemiologist at the University of Wollongong and research monitoring and surveillance coordinator at the Western Sydney Local Health District. Owusu-Boaitey has a Ph.D. in immunology and is currently engaged in graduate work in medicine and bioethics at the Case Western Reserve University School of Medicine. Cochran and Walsh are recent graduates of Dartmouth College.
Declaration
The authors have no financial interests nor any other conflicts of interest related to this study. No funding was received for conducting this study. The views expressed here are solely those of the authors and do not represent the views of any other person or institution.