Abstract
A variety of predisposing factors have been associated with serious illness and death from COVID-19. Understanding the distribution of risks associated with these factors across local communities can provide important opportunities for targeting interventions. We characterize the distribution of risk for COVID-19 mortality in the general population across 442 US cities by utilizing recently published estimates of risk associated with age, gender, ethnicity, social deprivation and 12 health conditions from a very large UK-based study, combined with information on the prevalence and co-occurrence of these factors in the US from a variety of population-based public databases. We estimate that, across all the cities, an underlying weighted risk-score can identify a total of approximately 12.65 million, 4.09 million and 1.34 million individuals who are at 2-, 5- and 10-fold higher risk, respectively, compared to the average risk for the US population. The percentage of the population that exceeds the respective risk thresholds varies across the cities in the range (1st-99th percentile) of 3.6%–20.1%, 0.7%–8.0% and 0.1%–3.2%, respectively. The percentage of deaths within a city that are expected to occur above these risk-thresholds varies in the range of 20.1%–53.5%, 8.5%–38.2% and 2.9%–25.4%, respectively. Our analysis can provide guidance to national and local policy makers regarding the resources needed to protect the most vulnerable populations in these communities, and how much utility such interventions may have in reducing the total population burden of death.
Introduction
The first case of SARS-CoV-2 infection in the US was reported on January 20th, 2020, in the state of Washington1,2, and to date the pandemic has led to nearly 100,000 COVID-19 deaths, making the US by far the most affected country globally. There is, however, major variation in rates of infections and underlying deaths across US states, counties and cities. Various local population characteristics, such as mitigation measures3,4, population density and mobility patterns5,6, define background risks of illness and death across the regions. Further, epidemiologic studies7–16 are providing evidence for predisposing factors that can put individuals at differential risks of serious illness and mortality.
In the US, both the number of reported daily infections and the number of reported daily deaths have recently reached a peak, but the post-peak decline of these numbers has been slow17. During the first phase of the pandemic, the US and other countries have relied on broad and strict intervention measures, such as country/state-wide lockdowns and travel restrictions. However, as it becomes evident that the pandemic is likely to last for months and possibly years to come, mitigation efforts in the future will rely on both broad but more relaxed measures, such as social distancing, and stricter interventions targeted towards high-risk populations and individuals. Clearly, a large fraction of deaths has occurred among individuals of old age, and in the US and other western countries, community living in nursing home settings has been a major source of risk for these individuals. Further, serious illness and death have been shown to be more common among males, various minority populations, and individuals with selected health conditions10,14. As lockdown and travel restrictions are lifted, measures will need to stay in place to protect these high-risk individuals through “shielding”18 and prioritization for scarce preventive resources19,20. As future planning for such efforts requires understanding the size of “high-risk” populations, a few studies have now emerged to provide such information for the UK21, the US22 and globally by nations and regions23. All of these studies, however, define the high-risk group in a broad fashion based on risk-factor prevalence, without a specific definition of the level of the underlying risk.
In this article we report results from our study estimating the size of general populations who are at various levels of risk for COVID-19 mortality due to predisposing factors across a large number of US cities. We use recently published results from a large UK-based study on the risk of mortality associated with a variety of predisposing factors, which could influence risk of infection or fatality or both14. We define a risk-score based on multivariate adjusted risk estimates and combine it with information on the prevalence and co-occurrence of these factors from data sources available from various national agencies. We use a series of novel methods to obtain estimates of the proportion of individuals within each city who exceed different risk-thresholds. We also provide projections for the number of deaths that are expected to arise within the defined high-risk groups, as a percentage of the total number of deaths in the underlying city populations.
Results
We observe wide variation in the underlying risk-score values across individuals who participated in the National Health Interview Survey (NHIS) (Supplementary Figure 1). The values of the risk-score at the 99th and 1st percentiles of the distribution correspond to a risk ratio of approximately 8-fold among the age group 18–39, and 305-fold among the age group 40+. Overall, we observe that 12.3%, 4.4% and 1.4% of individuals are at or above the risk-thresholds associated with the elevated (>2-fold), high (>5-fold) and very-high (>10-fold) risk categories, respectively (Table 1). A small, but not negligible, fraction of the population exceeds the threshold for extremely high risk (25-fold). The percentage of the population exceeding these thresholds varies strongly by age. In particular, only a small fraction (<3%) of individuals who are younger than 70 exceed the threshold for high risk. In contrast, a majority of the people who are 80 years or older are at high risk, and a quarter of them are at very high risk. We further examine the distribution of various other risk factors among individuals in the defined high-risk groups (Supplementary Figures 2–3). As expected from the nature of the risk-factor associations, males, Hispanics and African Americans, and individuals with obesity and various health conditions, are more common in the different risk groups compared to the general NHIS population. In addition, some factors, such as former smoking and hypertension, which were not identified as strong risk-factors in the UK study, appear to be more prevalent in the high-risk groups because of their association with strong risk-factors, such as age and type-2 diabetes.
We observe substantial variation in the prevalence of all the risk-factors across the different cities (Supplementary Figures 4–5). We evaluate the Index of Excess Risk for COVID-19 (IER-C19) mortality, a measure of aggregated risk in a locality associated with the prevalence of the underlying factors (Figure 1). We observe an almost 8-fold risk ratio between the highest and the lowest ranked cities according to this index. The five cities with the highest values of the index are Hemet (CA), Detroit (MI), Youngstown (OH), Shreveport (LA) and Deerfield Beach (FL), and the five cities with the lowest values of the index are Provo (UT), Frisco (TX), Fishers (IN), West Jordan (UT) and Allen (TX). A number of large cities on the US east coast tend to rank high in this index. Notably, Detroit and Ann Arbor, two cities separated by only 43.4 miles within Michigan, rank at the two extremes, with an almost 4-fold difference in average risk.
The proportion of individuals in different risk categories varies widely across cities (Figure 2, Supplementary Figure 6, Supplementary Table 1). Our analysis identifies 93 cities which have at least 5% of individuals in the high-risk category (>5-fold risk). These same cities have at least 14.2% and 1.6% of individuals who could be classified in the elevated (>2-fold) and very-high (>10-fold) risk groups, respectively. The number of individuals in the high-risk categories depends heavily on the underlying population sizes. In New York City, we estimate that there are 1228K, 413K and 139K individuals who exceed the risk-thresholds for the elevated, high and very high-risk categories, respectively. A number of other major population centers, including Los Angeles, Chicago, Philadelphia, Houston, San Antonio and Detroit, have at least 64K people, in each city, who can be classified in the >5-fold risk category. The number of individuals who are in the elevated and very high-risk categories across these cities varies in the range of 150K-517K and 27K-53K, respectively.
Our analysis also identifies 78 cities which have less than 2% of individuals in the high-risk category (>5-fold). A number of these cities are known for large university-associated populations, including Cambridge (MA), College Station (TX), Iowa City (IA), Ann Arbor (MI) and Madison (WI). Other notable cities included in the list are Boulder (CO), Columbia (MO), Fort Collins (CO), Provo (UT) and Salt Lake City (UT). In this list, the large cities that have a population size > 400K include Seattle (WA), Austin (TX), Raleigh (NC), Colorado Springs (CO) and San Jose (CA). The estimated number of individuals in these cities who exceed the risk-thresholds for the elevated, high and very high-risk categories ranges between 29K-79K, 7K-19K and 2K-5K, respectively.
The projected number of deaths that are expected to arise at or above various risk thresholds shows how deaths in the underlying populations arise disproportionately from a relatively small fraction of the population (Figure 3, Supplementary Figure 7, Supplementary Table 2). There are 94 cities, including major cities like New York, where we estimate that more than 25% of the total deaths are expected to have arisen from a relatively small fraction (<5%) of high-risk individuals. We estimate that in New York City, 43%, 27% and 16% of the deaths occurred within the 14.2%, 4.8% and 1.6% of the population at the highest risk, respectively. Based on estimates of the total number of excess deaths due to COVID-19 in NYC until May 224, we project that the absolute numbers of deaths attributable to these high-risk categories are 10358, 6637 and 3859, respectively.
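The relationship between a risk threshold, the share of the population above it, and the share of deaths expected among those individuals can be illustrated with a minimal Python sketch. It assumes that all individuals in a city share the same baseline rate, so that expected deaths are proportional to each individual's relative risk. The function name `high_risk_summary`, the toy list of relative risks and the reference mean are hypothetical and are not taken from the study.

```python
def high_risk_summary(risks, reference_mean, fold):
    """Return (population share, expected share of deaths) at or above a threshold.

    risks          : relative risks of individuals in one city (hypothetical data)
    reference_mean : average relative risk of the reference population
    fold           : threshold expressed as a multiple of the reference mean
    """
    cutoff = fold * reference_mean
    above = [r for r in risks if r >= cutoff]
    population_share = len(above) / len(risks)
    # With a common baseline rate, expected deaths are proportional to risk.
    death_share = sum(above) / sum(risks)
    return population_share, death_share


# Toy city of ten individuals (illustrative values only).
risks = [1.0] * 8 + [4.0, 12.0]
reference_mean = 2.0  # assumed average risk of the reference population

pop_share, death_share = high_risk_summary(risks, reference_mean, fold=5)
print(pop_share, death_share)
```

In this toy example a single individual out of ten exceeds the 5-fold threshold yet carries half of the total risk, which mirrors, in exaggerated form, the disproportion described above.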
Discussion
In this article, we have characterized the distribution of risk associated with a set of predisposing factors for COVID-19 death across a large number of US cities. We have utilized recently published estimates of the risk of mortality associated with these factors from a large UK study14, the prevalence of the same factors from multiple population-based data sources, individual-level data from a nationally representative study, and novel statistical methods to estimate the size of populations exceeding precisely defined risk-thresholds. Our results identify cities, including major metropolitan hubs, that have a concentration of high-risk individuals. These results can provide guidance to local and national agencies for planning more targeted intervention efforts for high-risk individuals.
Mitigation efforts for the pandemic in most countries to date have focused on broad and strict intervention measures through a series of lockdowns and travel restrictions. Additional efforts for targeting high-risk individuals have been generally limited. In England, about 1.5 million individuals who are at extremely high risk due to selected conditions were identified based on national health records, and were provided with government assistance for food delivery and medicine services21. In California, local and state governments developed Project Roomkey25 to provide free hotel rooms, meals and other services to asymptomatic homeless people who are at high risk due to their age and/or underlying conditions. In the future, as the statewide lockdowns are lifted, more initiatives for shielding high-risk individuals, starting with those who may be particularly susceptible to exposures, such as front-line workers and older populations living in community settings, will be needed.
A few recent studies have investigated the proportions of “high-risk” individuals for COVID-19 related serious illness or mortality in the UK, the US and across nations globally21–23. Further, the New York Times has recently produced a county-level map for the US to describe the prevalence of some of these risk-factors26. These studies have defined high-risk individuals based on the prevalence of one or more risk-factors, without taking into account the relative contribution of these factors. Further, because of the broad definition used, they estimate that a very large fraction of populations, 20% in the UK and 16–31% across nations globally, are at “high-risk”. In contrast, we have defined different risk-categories based on an underlying score that allows one to assign a more precise magnitude of risk to these categories. As a result, we have been able to show that it is possible to identify smaller groups of high-risk individuals who account for a disproportionately large number of deaths across different US cities. Efforts for any targeted interventions, such as government assistance for “shielding”, may not be economically viable if the definition of the high-risk group becomes too broad.
Our analysis also shows that a large fraction of total deaths will occur outside of small high-risk groups. In NYC, for example, we estimate that 43% and 27% of deaths are expected to arise from the 14.2% and 4.8% of the population who are at the highest risk. The estimate implies that a majority of deaths will occur outside of these risk groups. In particular, we observe that the current set of risk-factors has very limited ability to place individuals who are younger than 60 in high-risk groups (see Table 1), and yet current data suggest that a substantial fraction of deaths will arise from such younger age groups. Thus, targeted intervention for elevated and high-risk individuals, through shielding and other efforts, cannot be a substitute for broader community-level intervention through social distancing and other measures. Further, research is urgently needed to identify additional risk-factors, including genetic predisposition and other biomarkers, which can better identify younger individuals who are likely to face serious illness and mortality.
In this article, we investigate the potential excess risks faced by cities, and individuals within cities, due to various predisposing factors. The absolute risks of these communities and individuals, however, depend heavily on the underlying local characteristics of the epidemic, driven by key factors such as population density, mobility patterns and social distancing. Estimates available based on excess deaths, for example, indicate a COVID-19 mortality rate for NYC of about 283 per 100K individuals during the period of March 13-May 227,28. According to our estimate, the rate of death in the high-risk group (>5-fold) is expected to be about 1620 per 100K individuals. Now, consider a hypothetical scenario where the pandemic returns with double its intensity later this year. Over a similar period of time, such a resurgence would lead to a death rate due to COVID-19 of 566 and 3240 per 100K individuals in the overall city and in the high-risk group, respectively. The increase in absolute risk due to doubling the intensity of the pandemic in these two groups would be 283 vs 1620 per 100K individuals, indicating a much more adverse impact on the individuals in the high-risk group. In general, our framework can be used to model the absolute risk of different risk-groups under various types of pandemic scenarios typically evaluated by forecasting models29.
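The arithmetic of this scenario can be reproduced with a short Python sketch. It assumes, as the hypothetical scenario does, that a change in pandemic intensity scales the death rate of every group by the same multiplier. The function name `scenario_rates` is illustrative. The final check uses the rounded NYC percentages quoted earlier (27% of deaths arising from 4.8% of the population), so it only roughly recovers the reported figure of about 1620 per 100K; the gap presumably reflects rounding of those percentages.

```python
def scenario_rates(baseline_rate, intensity_multiplier):
    """Return (new rate, absolute increase), both per 100K individuals."""
    new_rate = baseline_rate * intensity_multiplier
    return new_rate, new_rate - baseline_rate


# Rates per 100K individuals quoted in the text for NYC.
overall_rate = 283
high_risk_rate = 1620

overall = scenario_rates(overall_rate, 2)      # overall city population
high_risk = scenario_rates(high_risk_rate, 2)  # >5-fold risk group

# Rough consistency check with rounded inputs: rate in a group equals
# overall rate * (share of deaths) / (share of population).
implied_high_risk_rate = overall_rate * 0.27 / 0.048

print(overall, high_risk, round(implied_high_risk_rate))
```

The same function can be applied to any other multiplier, for example a value below 1 to represent a milder resurgence.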
While we present the most sophisticated analysis of its kind, our study has several limitations as well. We lacked individual-level data at the level of cities and thus proposed a series of approximations to estimate the distribution of risk. We estimate co-occurrence rates of various risk-factors based on underlying prevalence and odds-ratio measures of aggregation estimated from the nationally representative NHIS. Further, we use the individual-level data available from the NHIS study to evaluate the accuracy of the mixture normal approximation for estimating the proportion of high-risk individuals (Supplementary Figure 1). In the future, accuracy of the approximation may be further improved by using alternative distributional assumptions.
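As a rough illustration of how a mixture normal approximation yields the proportion of individuals above a risk threshold, the Python sketch below sums weighted normal tail probabilities. The component weights, means and standard deviations are hypothetical, and treating the components as population strata (for instance age groups) is an assumption made here for illustration only; the components actually used are described in the Supplementary Notes.

```python
from statistics import NormalDist


def mixture_tail_probability(components, threshold):
    """P(score > threshold) when the score follows a mixture of normals.

    components : list of (weight, mean, sd) tuples whose weights sum to 1
    threshold  : cutoff on the risk-score (log relative risk) scale
    """
    return sum(
        weight * (1.0 - NormalDist(mean, sd).cdf(threshold))
        for weight, mean, sd in components
    )


# Two hypothetical strata of equal size with different mean risk-scores.
components = [(0.5, 0.0, 1.0), (0.5, 2.0, 1.0)]
tail = mixture_tail_probability(components, threshold=1.0)
print(tail)
```

Comparing such an approximate tail probability with the empirical proportion in individual-level data is one way to gauge the accuracy of the approximation, which is the role the NHIS data play above.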
We assumed that the degree of association of COVID-19 death with various predisposing factors observed in the large UK study will be generalizable to the US population30–33. While a number of US-based studies12,27 using case series have reported overrepresentation of many of these factors among patients with severe illness, no large-scale population-based epidemiologic studies are available to report the precise risk associated with these factors in the US setting. In general, relative risks associated with major predisposing factors for various outcomes, including communicable33,34 and non-communicable diseases32, tend to be similar between the US and the UK. The New York City Health Department publishes population-based estimates of the rate of hospitalization and death by age, gender and ethnic groups35. We found that the crude (unadjusted) rate ratios for deaths reported in NYC with these factors are fairly consistent with those reported in the UK study. In our analysis, we consider a risk-score defined by the predisposing factors with weights obtained from the fully adjusted model published by the UK study. The risk-score, however, does not consider potential interactions between various predisposing factors and thus may over- or under-estimate risk for certain combinations of these factors. In the future, as results from more complex models that include additional risk-factors and their interactions become available, our estimates can be further refined within the framework we have defined.
The ethnic characteristics of the UK and US populations are substantially different. We observed that the crude ratio of the COVID-19 death rate for blacks compared to whites in the UK is very similar to that observed for the African American population compared to non-Hispanic whites within NYC. The UK study further reports an increased risk for Asians or British Asians. In contrast, in NYC, the Asian population appears to be at a comparable risk to non-Hispanic whites. The difference is likely to be due to different countries of origin and socioeconomic conditions for these groups across the two countries. In our analysis, we assigned the risk of Asians in the US population to be the same as that of non-Hispanic whites. For the Hispanic population, which is absent in the UK, we obtained the age-adjusted rate ratio for death compared to non-Hispanic whites based on data available from NYC36,37, and included an additional component of risk due to Hispanic origin. We could not find comparable risk estimates for other minority populations such as American Indians, Asian Indians and mixed races, and thus could not include a component of risk due to such ethnic origins. Nevertheless, it is likely that other predisposing conditions, such as age, gender and various health conditions, will have a similar link with risk of death in these populations.
The UK study reported a strong gradient of risk of COVID-19 death associated with the Index of Multiple Deprivation (IMD), an area-level measure of social deprivation. The study noted that the association of COVID-19 death with IMD remains strong (a risk ratio of 1.70 between the 5th vs 1st quintile) even after adjusting for ethnicity and the known comorbidity conditions. In our analysis, we used an alternative county-level measure, the Social Deprivation Index (SDI), that is available in the US setting and assigned each US city the SDI measure of the county to which the city belongs. We assigned the same degree of risk across the different quintiles of SDI as those observed for IMD in the UK study. Both IMD and SDI capture the same major components of deprivation, namely income, education, employment and housing conditions. Some of these characteristics are known to confer similar risks across the UK and the US for broad health outcomes such as disability-adjusted life years38. The two variables, however, have some unique components, such as ownership of cars in SDI, that may lead to additional risks for COVID-19 death. Future population-based epidemiologic studies are urgently needed to characterize the risk of COVID-19 infection, serious COVID-19 illness and mortality in relationship to various ethnic groups and social deprivation in the US setting.
In summary, in spite of some limitations, we present a very comprehensive and rigorous analysis of the distribution of risk for COVID-19 death across a large number of US cities. While these projections can be further refined as better models and data become available in the future, the current results can provide guidance to national and local policy makers regarding the size of high-risk populations who may benefit most from more targeted intervention efforts. In addition, the novel methodological framework we develop and the open-source code we make available will allow similar rigorous analysis of risk across other countries using relevant datasets.
Methods
Definition of COVID-19 risk-score
The risk-score for an individual is defined as a weighted combination of various sociodemographic characteristics and predisposing health conditions, with weights defined by the relative magnitude of the contribution of these factors to the risk of death due to COVID-19. We define the risk-score primarily using information from a very large UK-based study involving a population of > 17 million individuals among whom more than 5000 COVID-19 deaths were reported14. The risk-factors included age, gender, ethnicity, an area-wide measure of social deprivation and 12 different health conditions. We define the COVID-19 death risk-score for an individual as $\sum_{k=1}^{K} \beta_k X_{ik}$, where the $X_{ik}$ denote binary variables indicating the categories the $i$th individual belongs to across the different risk-factors and the $\beta_k$ denote the corresponding weights. We use information available from Table A1 of the paper from the UK study14 to define the levels of the different risk-factors and extract the corresponding log-hazard ratio values from the fully adjusted model to define the weights. We, however, adjust the risk-score to account for the different ethnic composition of the US and UK populations and account for a component of risk for the Hispanic population using information on the age-adjusted mortality rate available from NYC36,37. We note that in this definition, the “risk of mortality” refers to that of the general population, and not of the infected population. Thus, the predisposing factors can increase the risk of COVID-19 death due to their effect on the rate of infection and/or the rate of death among infected individuals. More details on the definition of the risk-score can be found in Section 1 of the Supplementary Notes.
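A minimal Python sketch of such a risk-score is given below: a weighted sum of binary indicators, with log-hazard ratios as weights, whose exponential gives the risk relative to an individual in the reference category of every factor. The factor names and weight values are hypothetical placeholders chosen for easy arithmetic; they are not the published estimates from the UK study.

```python
import math

# Hypothetical log-hazard-ratio weights (illustrative values only,
# NOT the published estimates).
WEIGHTS = {
    "age_70_79": math.log(4.0),
    "male": math.log(2.0),
    "diabetes": math.log(1.5),
}


def risk_score(indicators, weights=WEIGHTS):
    """Weighted sum of binary indicators: sum over k of beta_k * X_ik."""
    return sum(
        beta * (1 if indicators.get(name, 0) else 0)
        for name, beta in weights.items()
    )


def relative_risk(indicators, weights=WEIGHTS):
    """Risk relative to the reference category of every factor."""
    return math.exp(risk_score(indicators, weights))


# A hypothetical individual: male, aged 70-79, without diabetes.
person = {"age_70_79": 1, "male": 1, "diabetes": 0}
score = risk_score(person)
rr = relative_risk(person)
print(score, rr)
```

Because the weights are on the log scale, the relative risks of the individual factors multiply: with these placeholder values the example individual is at 4 × 2 = 8 times the risk of the reference individual.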
Data sources for obtaining prevalence and joint distribution of the risk-factors
US Census Bureau – American Community Survey
The American Community Survey (ACS) is an annual survey that collects information on demographic, social, economic, and housing topics throughout the United States and Puerto Rico39. We use it to obtain the prevalence of demographic variables across cities. Specifically, we extract information on age and gender from the 2017 table40, and the latest information available on ethnicity from the 2018 table41.
Behavioral Risk Factor Surveillance System (BRFSS)
The US Centers for Disease Control and Prevention (CDC) developed the BRFSS to conduct telephone surveys collecting data on various health-related factors for US residents across states, cities, and Metropolitan/Micropolitan Areas. We use the BRFSS “500 Cities: Local Data for Better Health, 2019 release”42 to extract the prevalence of behavioral risk factors, including obesity, smoking status and high blood pressure, and of chronic health indicators, including diabetes, asthma, chronic heart disease, stroke/dementia, kidney disease and rheumatoid/lupus/psoriasis. The 2019 release is based on the 2017 questionnaire data.
United States Cancer Statistics
The statistics are based on data collected from different cancer registries by the Centers for Disease Control and Prevention (CDC) and the National Cancer Institute (NCI)43. We use the 2012–2016 data to obtain 5-year incidence rates at the county level and overall 5-year survival rates for different cancer sites. In our study, the cancer site-specific prevalence is calculated from the incidence rate after adjusting for the survival rate. We assume the cancer prevalence in each city to be the same as that of the county to which the city belongs.
Robert Graham Center and American Community Survey
As a proxy for the Index of Multiple Deprivation (IMD) used in the UK study, we consider an analogous measure, Social Deprivation Index (SDI), used in the US setting. SDI is an area wide measure of 7 demographic characteristics, including the indicators for less than 12 years schooling, crowding, no car, non-employed, poverty, renter occupied, and single-parent family. The measure is derived by Robert Graham Center using 5-year estimates based on 2011–2015 data from the American Community Survey (ACS)44.
National Health Interview Survey (NHIS)
We accessed individual-level data from the NHIS of the CDC. The study collects yearly cross-sectional questionnaire-based information on various health-related factors for a representative sample of the population of the United States45. We extracted risk-factor information on about 20,000 adults from the 2017 NHIS data. All of the required variables, except SDI, were available for individuals in NHIS. We use the NHIS data to investigate the distribution of the risk-score (excluding SDI) across the general US population, estimate the co-occurrence of pairs of factors using the underlying odds-ratio parameters, and evaluate the accuracy of the mixture normal approximation for the risk-score distribution.
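One standard way to combine marginal prevalences with an odds ratio to obtain the co-occurrence rate of two binary factors is to solve the quadratic equation implied by the definition of the odds ratio. The Python sketch below illustrates this calculation; it is not necessarily the exact procedure used in the study (which is described in the Supplementary Notes), and the function names and numerical inputs are hypothetical.

```python
import math


def joint_prevalence(p1, p2, odds_ratio):
    """P(both factors present) given marginal prevalences and an odds ratio.

    Solves OR = p11 * p00 / (p10 * p01) for p11, where
    p10 = p1 - p11, p01 = p2 - p11 and p00 = 1 - p1 - p2 + p11.
    """
    if abs(odds_ratio - 1.0) < 1e-12:
        return p1 * p2  # an odds ratio of 1 corresponds to independence
    a = 1.0 + (p1 + p2) * (odds_ratio - 1.0)
    disc = a * a - 4.0 * odds_ratio * (odds_ratio - 1.0) * p1 * p2
    return (a - math.sqrt(disc)) / (2.0 * (odds_ratio - 1.0))


def implied_odds_ratio(p1, p2, p11):
    """Odds ratio implied by the marginals and the joint prevalence."""
    p10, p01 = p1 - p11, p2 - p11
    p00 = 1.0 - p1 - p2 + p11
    return (p11 * p00) / (p10 * p01)


# Hypothetical city-level prevalences with a nationally estimated odds ratio.
p11 = joint_prevalence(0.20, 0.10, 3.0)
print(p11, implied_odds_ratio(0.20, 0.10, p11))
```

The appeal of this construction is that the odds ratio can be estimated once from national individual-level data while the marginal prevalences vary by city, which matches how the NHIS and city-level sources are combined above.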
Statistical Models and Methods
Similar to the UK study, in our analysis we assume that the risk of COVID-19 death at time $t$ for an individual $i$ residing in location $l$, e.g. a city, can be described by the proportional risk model $\lambda_{il}(t) = \lambda_l(t) \exp\left(\sum_{k=1}^{K} \beta_k X_{ik}\right)$, where $\lambda_l(t)$ denotes the baseline risk for location $l$ due to underlying pandemic characteristics.
Here $t$ refers to calendar time since some landmark, such as the day when cumulative deaths reach some minimum threshold. The average risk of the population at location $l$ can be defined as $\lambda_l(t)\, E_l\left[\exp\left(\sum_{k=1}^{K} \beta_k X_k\right)\right]$, where $E_l$ denotes the expectation (average) with respect to the distribution of the risk factors in location $l$.
We define the quantity $R_l(\beta) = E_l\left[\exp\left(\sum_{k=1}^{K} \beta_k X_k\right)\right]$ as an Index of Excess Risk (IER). If two locations have the same baseline rate of deaths, then the ratio of this index across them will correspond to their rate ratio associated with death, and a value of IER>1 will correspond to excess death due to the difference in risk-factor distribution across the two places. In our analysis, we present the scaled version of IER as $R_l(\beta)/\bar{R}(\beta)$, where $\bar{R}(\beta)$ denotes the weighted average of $R_l(\beta)$ across cities with population sizes as the weights. Further, we examine the distribution of the individual-level risk $R_{il}(\beta)$ across individuals within a location to identify the size of the underlying most “vulnerable” populations. For these evaluations, ideally one would require individual-level data for the set of risk-factors $X = \{X_1, \ldots, X_K\}$ for a representative sample of individuals from each city. However, in the absence of such data, we develop a framework to approximate the distributions using city-specific information on prevalence, and individual-level data from a representative sample of the whole US population available from the NHIS study. Specifically, we use data from NHIS to estimate the degree of co-occurrence of the different risk-factors and to evaluate the accuracy of the mixture normal approximation for tail probability calculations (see Supplementary Figure 1). Further details of the methods can be found in Section 2 of the Supplementary Notes.
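The Index of Excess Risk and its scaled version can be sketched in Python as follows. The sketch assumes that individual-level risk-scores are available for a sample from each city, which is precisely the idealized situation that the study has to approximate; the city names, scores and population sizes are hypothetical.

```python
import math


def index_of_excess_risk(scores):
    """Average of exp(risk-score) over individuals sampled from one location."""
    return sum(math.exp(s) for s in scores) / len(scores)


def scaled_ier(ier_by_city, population_by_city):
    """Divide each city's IER by the population-weighted average across cities."""
    total_population = sum(population_by_city.values())
    weighted_average = sum(
        ier_by_city[city] * population_by_city[city] for city in ier_by_city
    ) / total_population
    return {city: ier / weighted_average for city, ier in ier_by_city.items()}


# Hypothetical risk-scores (log relative risks) for two small samples.
scores = {
    "city_a": [0.0, math.log(2.0), math.log(4.0)],
    "city_b": [0.0, 0.0, math.log(2.0)],
}
population = {"city_a": 100_000, "city_b": 300_000}

ier = {city: index_of_excess_risk(s) for city, s in scores.items()}
scaled = scaled_ier(ier, population)
print(ier, scaled)
```

By construction the population-weighted average of the scaled index equals 1, so a scaled value above 1 marks a city whose risk-factor profile implies higher risk than the average across the cities considered.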
Code Availability
All code for data management and the analyses in this article can be accessed at https://github.com/nchatterjeelab/COVID19Risk.
Data Availability
All data used in the manuscript are publicly available. All code for data management and analysis can be accessed at https://github.com/nchatterjeelab/COVID19Risk.
https://github.com/nchatterjeelab/COVID19Risk
https://www.census.gov/programs-surveys/acs
https://www1.nyc.gov/assets/doh/downloads/pdf/imm/covid-19-deaths-race-ethnicity-04162020-1.pdf
https://chronicdata.cdc.gov/500-Cities/500-Cities-Local-Data-for-Better-Health-2019-relea/6vp6-wxuq
https://www.cdc.gov/cancer/uscs/dataviz/download_data.htm
https://www.graham-center.org/rgc/maps-data-tools/sdi/social-deprivation-index.html
Acknowledgements
We thank Dr. Allison Meisner from the Johns Hopkins University, Biostatistics Department and Dr. Montserrat García-Closas, Division of Cancer Epidemiology and Genetics at National Cancer Center for their comments on a previous version of the manuscript. The funding for this research came from the Bloomberg Distinguished Professorship endowment.