Abstract
This paper uncovers the socioeconomic and health/lifestyle factors that can explain the differential impact of the coronavirus pandemic on different parts of the United States. Using a dynamic panel model with the daily reported number of cases for US counties over a 20-day period, the paper develops a Vulnerability Index for each county from an epidemiological model of disease spread. County-level economic, demographic, and health factors are used to explain the differences in the values of this index and thereby the transmission and concentration of the disease across the country. These factors are also used in a zero-inflated negative binomial pooled model to examine the number of reported deaths. The paper finds that counties with high per capita personal income have a high incidence of both reported cases and deaths. The coefficient on the unemployment rate is negative for deaths, implying that places with low unemployment rates or higher economic activity have higher reported deaths. Counties with higher income inequality as measured by the Gini coefficient experienced more deaths and reported more cases. There is a remarkable similarity between the distribution of cases across the country and the distribution of distance-weighted international passengers served by the top international airports. Counties with high concentrations of non-Hispanic Blacks, Native Americans, and immigrant populations have a higher incidence of both cases and deaths. Health risk factors such as obesity, diabetes, and smoking are found to be particularly significant in explaining the differences in mortality across counties. Counties with higher numbers of primary care physicians have fewer deaths, and so do places with fewer hospital stays for preventable causes. The stay-at-home orders are found to be associated with places of higher cases and deaths, implying that they were perhaps imposed far too late to have contained the virus in the places with high-risk populations.
It is hoped that research such as this will help policymakers to develop risk factors for each region of the country to better contain the spread of infectious diseases in the future.
1 Introduction
The novel coronavirus, also known as COVID-19, has brought the global economy to a screeching halt. It is sweeping through the United States and the country has now taken a lead in not only the total number of positive cases but in terms of the number of reported deaths as well. New York’s total number of cases exceeds the total number reported for any country, including China.
The virus is believed to have originated in Wuhan, China in late 2019. As the virus spread through Wuhan and the rest of China, it raised alarms across scientific communities and governments around the world. With every passing day the virus continued to spread exponentially. The impact of the virus in the United States started to grab public attention in late February, and many states started imposing stay-at-home orders in mid-March 2020. The dramatic increase in the number of infected patients in a nursing home in Seattle, Washington stunned the nation and made evident the contagiousness of the virus and its lethality. While the national attention was focused on Seattle, the virus was taking a deadly hold in New York City and its surroundings. With every passing day, one state after another announced its first reported cases. No US state has been spared. However, the spread of the virus has been anything but uniform. Figure 1 shows the geographical distribution of reported cases across US counties in April 2020.
This paper attempts to uncover the socioeconomic conditions that are dominant in the areas with high numbers of cases and deaths. The literature on the transmission of infectious diseases often finds that the highest impact areas have low income, poor sanitary conditions, and poor health care conditions, because it focuses on viruses that have significantly impacted developing countries (Campos et al. (2018) for Zika and Redding et al. (2019) for Ebola are recent examples). Moore et al. (2017) used Ebola to develop an Infectious Disease Vulnerability Index for countries in Africa. The literature on the socioeconomic determinants of the spread of infectious diseases in developed countries is not extensive - Adda (2016) is an exception. Using data from France, it offers an extensive analysis of the transmission of three viruses - influenza, gastroenteritis, and chickenpox. The paper asks the important questions of whether viruses spread more rapidly during periods of economic growth and whether their spread follows a “gradient determined by economic factors.” Adda (2016) finds that the viruses studied indeed propagated faster during times of economic boom due to increased economic activity and contact between people. Qiu (2020) conducted a similar analysis for Wuhan, China. Both papers find a positive relationship between the spread of the virus and economic activity. Avery et al. (2020) offer a list of resources, in terms of both relevant research and data sources, for researchers.
Unlike some of the literature cited above that concentrates on the impact of mitigation and/or containment strategies along with economic conditions such as GDP and employment, and weather-related factors such as temperature and pollution (Wu (2020)), this paper focuses on economic, demographic, and health conditions in explaining the number of cases and deaths in the US. Figure 1 clearly indicates that the spread of the virus has been in regions of high economic activity on the two coasts. The virus arrived on US shores through international travel. While the initial spread of the virus is expected to be triggered by international travel and economic activity, it is important to understand whether its continued spread and concentration are restricted to such places. As the lockdown continues and the medical profession tries to understand the susceptibility of individuals to contracting the disease, this paper attempts to understand the underlying socioeconomic conditions of the geographic regions around the US that make them susceptible to becoming hotspots. This is related to the question about the factors that determine the gradient followed by the virus as it spreads through the country.
Introducing heterogeneity that captures region-specific uniqueness in an epidemiological model of disease spread, the paper develops a Vulnerability Index for the counties included in the study. These indexes capture the underlying factors that impact the vulnerability of a region to the virus. Economic, demographic, and health/lifestyle factors are used to explain the observed differences in the vulnerability index. These factors are also used to explain the differences in the number of deaths reported across the counties. The results indicate that the underlying demographic and health/lifestyle factors have a more significant impact in explaining deaths than disease spread. This is not a surprising result since the spread of the virus does not depend on a person’s ethnicity or education status. Once people contract the disease, however, the health outcome depends on a multitude of factors that go beyond an individual’s control.
The paper uses available county-level data to identify economic, demographic, and health/lifestyle risk factors for different parts of the US. The paper finds that people in regions of high economic activity and economic inequality are at particularly elevated risk of both disease spread and mortality. There is a remarkable parallel between the spread of the disease and the distance-weighted distribution of passengers arriving at international airports. The demographic distribution in terms of race shows higher vulnerability, in terms of both disease contraction and death, for non-Hispanic Blacks, Native Americans, and immigrants. Counties with higher numbers of primary care physicians per 1000 individuals have fewer deaths, and so do places with fewer preventable hospital stays. Some of the high risk factors, such as obesity and diabetes, are found to have more mixed results. This can be partly explained by the fact that many of these risk factors have a high degree of concentration in many of the southern states. These states have not reported as many cases or deaths as the regions around New York, Detroit, Chicago, and the western states of California and Washington. The paper includes the number of days since the onset of stay-at-home orders issued by governors at the state level across the country. This variable is not found to be statistically significant in influencing the vulnerability index in the spread of the virus. Regions that have had longer stay-at-home orders have experienced a higher number of deaths. These regions would have experienced a much higher number of cases and deaths without those orders. It is likely that they were imposed too late to have been successful in containing the virus.
This paper has identified socioeconomic and health/lifestyle factors that have played a critical role in helping the virus to develop a stronghold in certain parts of the country and cause high fatalities. It is true that a single gathering of individuals can lead to a spike in the number of cases and large number of deaths in a region. The members of the Coronavirus Task Force are monitoring where a sudden spike is occurring. As data on cases and deaths are collected, it is important to be able to better predict if the population of a certain area is particularly vulnerable to the disease. This paper shows that it is possible to develop a vulnerability index both for disease spread and deaths based on the socioeconomic composition of the population and their health/lifestyle choices. Developing such a profile will be particularly important as the various parts of the country contemplate lifting stay-at-home orders before the invention of a therapeutic or a vaccine.
Recent experience suggests that infectious diseases are a major threat to both the health and economic well-being of people around the world. In spite of the experience with H1N1, SARS, and Ebola, countries such as the United States did not develop a coherent infrastructure or strategy to determine which parts of the country are at particularly high risk of disease transmission. This paper shows that it is possible to utilize the economic, demographic, and lifestyle profiles of regions to develop a risk factor for each geographical area so that when the next epidemic arises, public officials are better prepared to anticipate where the hotspots are likely to arise and take the necessary containment steps. The experience with COVID-19 shows how rapidly an infectious disease can bring an economy down. Without advance preparation the next disease will be just as difficult to contain as this one. The large differences within state boundaries show the importance of developing more local strategies that take into consideration a multitude of factors.
2 Methodology and Data
The coronavirus pandemic has impacted all 50 states in the United States. The experience of each state, county, and city has been anything but homogeneous. To understand this differential effect across counties in the US, we consider two sets of factors. Epidemiological models explain how an infectious disease evolves in a region based on population and the size of the pool of infected individuals. We will use epidemiological models such as the SIR model to determine the fundamental differences in cases based on population size and number of infections. These factors alone cannot explain the entire heterogeneous outcomes across the country. We expect differences in types and amounts of economic activities, living conditions, demographic makeups, and lifestyle choices to determine the vulnerabilities of communities in the spread of a highly contagious virus such as the coronavirus.
We will conduct this analysis in two steps. In the first step an epidemiological model of disease spread will be used to generate estimates of a vulnerability index for each county once population and infections are accounted for. In the second step we will use county level economic, demographic, and health data to explain differences in the vulnerability indexes across counties.
Epidemiological models of the SIR type such as in Blackwood et al. (2018) describe disease spread dynamics based on three main factors - the size of the population, the number of susceptible individuals, and the number of infected individuals. With a population of size N, if I denotes the number of infected individuals, the number of individuals susceptible to the disease is given by S = N – I. At each time t, the number of new infections will depend on the interactions of the susceptible (S) and infected (I) individuals. The infected individuals are non-infectious during the latent period and asymptomatic but infectious from the end of the latent period to the end of the incubation period and infectious with symptoms after the end of the incubation period. If j denotes the number of days it takes to become infectious, at time t the interactions of susceptible people with people infected t – j days earlier will lead to new cases.
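The interaction described above can be illustrated with a minimal sketch. It assumes the bookkeeping S = N – I from the text and a mass-action contact term; the function name, the division by N, and the example county are hypothetical choices made for illustration and are not taken from the paper.

```python
def new_case_pressure(population, infected_by_day, t, lag):
    """Interaction of today's susceptible pool with those infected `lag` days earlier.

    Uses S = N - I from the text. The mass-action form S_t * I_{t-lag} / N,
    including the division by N, is an illustrative assumption.
    """
    if t - lag < 0:
        raise ValueError("not enough history for the requested lag")
    susceptible_now = population - infected_by_day[t]
    infectious_then = infected_by_day[t - lag]
    return susceptible_now * infectious_then / population


# Hypothetical county: 100,000 residents and cumulative infections by day.
cumulative_infected = [10, 14, 20, 28, 40, 55, 75, 100, 130, 170]
pressure = new_case_pressure(100_000, cumulative_infected, t=9, lag=7)
```

A larger lag ties today's new cases to an earlier and, during an outbreak, smaller pool of infected individuals, which is why the choice of j matters for the estimates.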
Using daily reports of coronavirus cases for counties across the United States, we generate a panel dataset of US counties over a 20-day period from March 30 to April 18. The panel data approach to estimating the growth of the virus in different parts of the US allows us to introduce county-specific fixed effects in the estimation. The panel model estimates the number of cases as a function of the potential pool of susceptible and infected individuals as well as time and county-specific fixed effects, and is referred to as Equation (1). In this equation, $C_{it}$ denotes the number of reported cases in county i at time t, $\gamma_i$ gives the fixed effect parameter for county i, $\delta$ is the parameter for the time variable, and $u_{it}$ is the error for county i at time t. The lagged value of the cases shows that the number reported on any day depends on the number reported the previous day.
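A specification consistent with this description can be sketched as follows. Only $C_{it}$, $\gamma_i$, $\delta$, and $u_{it}$ are named in the text; the coefficient symbols $\alpha$ and $\beta$, the linear time trend, and the normalisation of the interaction term by $N_i$ are assumptions made here for illustration, so this should be read as one plausible form rather than the exact estimating equation.

```latex
C_{it} = \alpha\, C_{i,t-1}
       + \beta\, \frac{S_{it}\, I_{i,t-j}}{N_i}
       + \delta\, t + \gamma_i + u_{it},
\qquad S_{it} = N_i - I_{it}
```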
Estimation of the above regression will generate parameter values, γ, for each county. These values will reflect the county-specific fixed effects that influence the vulnerability of each county to the virus. From these fixed effects we generate a vulnerability index for each county. This approach is similar to the one used by Mukherji and Silberman (2013) in studying patent citations between metro areas in the US. In the second step of the analysis, we use county-level economic, demographic, and health care factors to explain the vulnerability index for each county. The factors that may explain the county vulnerability index are classified into three groups. The first group relates to economic conditions and includes factors such as per capita personal income, the unemployment rate, the level of income inequality, poverty, access to housing, and the concentration of different types of industries such as manufacturing, mining, and others. The second group consists of demographic factors, including the size of the population and its density, the racial profile of the counties, the age distribution of the population, and the percentage of the population that was born outside the United States. The third group includes health or lifestyle related factors such as the number of primary care physicians per capita, the percentage of the population with obesity and diabetes, the percentage of the population that smokes and drinks, and the percentage of the population with inactive lifestyles.
In addition to the county level economic, demographic, and health data, spatial factors are considered as well. The contagious nature of the disease compels one to consider the spillover effects to neighboring counties. We introduce inverse-distance weighted values of the number of international passengers served by the top 46 international airports in the contiguous US. Since the virus is presumed to have originated in China and then spread to other parts of the world including Europe before taking a hold in the United States, international passenger data is introduced to examine if proximity to international airports is related to the concentration of confirmed cases. While international passengers often arrive at a particular airport and then use domestic airlines to travel to other parts of the country, the locations of the international airports are closely tied to areas with concentrations of activities that are globally oriented. Consequently, a large number of the international passengers served by these airports are expected to interact in the regions around these airports. Using a 300-mile radius around each county where the airports are located, an inverse-distance matrix is used to assign the number of international passengers in the areas surrounding the airports. The bottom part of Figure 1 displays the weighted distribution of international passengers. While this data is unrelated to the number of confirmed COVID-19 cases, the spatial distribution of the passenger data is similar to the spatial distribution of confirmed COVID-19 cases.
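The inverse-distance weighting with a 300-mile cutoff can be sketched as below. This is a minimal illustration under stated assumptions: the weight is taken to be 1/distance, distances are great-circle distances between points, and a one-mile floor avoids division by zero for a county that contains an airport. The function names, coordinates, and passenger counts are hypothetical, and the paper's exact weighting scheme may differ.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def weighted_passengers(county, airports, cutoff_miles=300.0, min_miles=1.0):
    """Sum airport passengers weighted by 1/distance, skipping airports beyond the cutoff.

    `min_miles` floors the distance so that a county containing an airport
    does not divide by zero; the floor is an assumption of this sketch.
    """
    total = 0.0
    for lat, lon, passengers in airports:
        d = haversine_miles(county[0], county[1], lat, lon)
        if d <= cutoff_miles:
            total += passengers / max(d, min_miles)
    return total


# Hypothetical inputs: (latitude, longitude, annual international passengers).
airports = [(40.64, -73.78, 30_000_000), (33.94, -118.41, 25_000_000)]
county_centroid = (40.73, -73.94)
score = weighted_passengers(county_centroid, airports)
```

In this example only the nearby airport contributes, because the second airport lies far beyond the 300-mile radius.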
The estimation of the impact of these regional factors in explaining differences in vulnerabilities to the disease will be based on Equation (2).
In Equation (2), $V_i$ represents the vulnerability index of county i, and $e_{ki}$ represents the set of k economic variables that make a county susceptible to the spread of the disease due to enhanced interactions between people and working in close proximity. Although the economic activity of a county changes with time, the general distribution of such activities across the country remains relatively stable within short periods of time. $d_m$ represents the demographic factors and $h_n$ represents the health-care factors discussed above. This equation includes a spatially weighted number of international passengers in the region, obtained by multiplying an inverse distance-weighted matrix W with the number of international passengers, I, served by an international airport in the neighborhood of county i.
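A linear specification consistent with these definitions can be sketched as follows. The symbols for the vulnerability index, the three variable groups, W, and I come from the text; the intercept $\beta_0$, the coefficient names $\beta_k$, $\theta_m$, $\phi_n$, and $\lambda$, and the error term $\varepsilon_i$ are assumptions introduced here for illustration.

```latex
V_i = \beta_0
    + \sum_{k} \beta_k\, e_{ki}
    + \sum_{m} \theta_m\, d_{mi}
    + \sum_{n} \phi_n\, h_{ni}
    + \lambda\, (W I)_i
    + \varepsilon_i
```

Here $(W I)_i$ denotes the i-th element of the product of the inverse-distance weight matrix and the vector of international passenger counts.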
This paper uses county-level data for the United States. The data on COVID-19 cases and deaths is obtained from the COVID tracking data provided by the New York Times and Johns Hopkins University. Figure 1 displays the distribution of cases in the 2512 counties.
Data sources for the various demographic and economic variables, such as population distribution by ethnicity and population density, are listed in Table 1. While many of the data listed in the table are obtained from the USDA’s Atlas of Rural and Small Town America and the Federal Communications Commission, the original data sources are the Census Bureau and the American Medical Association. Some of the demographic data, such as the distribution of the population by race and education, are from the 2010 census. The total population, per capita personal income, and unemployment data are from 2018. The percentages of the population with various health-related factors such as obesity and diabetes, and with lifestyle habits such as smoking and drinking, are available for the 2014-15 period. Data on international air passengers was obtained from the Bureau of Transportation Statistics. This source provides the number of international passengers served by the top 50 international airports in the United States. Using airports in the contiguous United States only, data for 46 of the 50 airports were used. The total number of passengers on international flights is over 109 million for 2018. In order to account for local spillover effects of the virus in the form of increased susceptibility due to higher prevalence of cases, an inverse distance weighted matrix was created with positive weights assigned up to a 300 mile radius around a county. This radius is just large enough to ensure that each county in the study had at least one other county in the study as a neighbor.
3 Estimation
3.1 Estimation of Cases
The previous section explained that the foundation of the analysis of the socioeconomic factors that can contribute to the spread and concentration of the coronavirus in the various parts of the country lies in the epidemiological model of disease transmission. The first step is to generate county-level vulnerability measures from an estimation of equation (1). The daily coronavirus data is available for over 2500 counties. To manage the computational load of estimating a panel that large, we restrict our analysis to counties that reported an average of 30 cases per day from March 30 through April 19. This generates a panel of 771 counties covering all 50 states. Each of the counties reported at least one confirmed case during the period of analysis, resulting in a balanced panel. Equation (1) includes a lagged value of infections in determining the proportion of the population that is susceptible at any time t. The incubation period for this virus is estimated to be anywhere from 2 to 14 days. People are infectious a few days before they develop symptoms and after they develop symptoms. We assume a 7 day lag for the results reported in the paper. Sensitivity analysis was conducted for different lag lengths.
Equation (1) shows that cases in period t depend on the number of cases in period t – 1 and also on the number of susceptible and infected people, whose values depend on the number of cases in previous periods. The inclusion of the lagged dependent variable makes this a dynamic panel and requires the use of dynamic panel estimation methods. A model with a small number of time periods (20) and a large number of counties (771) with a lagged dependent variable is expected to suffer from the Nickell bias (Nickell (1981)). A difference GMM estimation is found to be the best option for the data. The Arellano-Bond estimation method (Arellano and Bond (1991)), which uses lagged values as instruments, was used as implemented by Roodman (2006). Results are reported in Table 2. The results show that although autocorrelation of the first order exists, there is no second order autocorrelation. The Sargan and Hansen tests of overidentifying restrictions do not reject the validity of the instruments, and the F statistic shows that the model fits the data well. The table shows that the one period lagged number of cases has a significant impact on the number of cases reported on any day. The interaction of the infected and susceptible population is also significant and positive.
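The logic of the difference GMM approach can be written out for a generic dynamic panel. The notation below ($y_{it}$, $\rho$, $a_i$, $e_{it}$) is a stylised stand-in for Equation (1) rather than the paper's full specification. First differencing removes the fixed effect $a_i$, but the differenced lag is correlated with the differenced error through $e_{i,t-1}$, so levels dated $t-2$ and earlier serve as instruments:

```latex
y_{it} = \rho\, y_{i,t-1} + a_i + e_{it}
\;\;\Longrightarrow\;\;
\Delta y_{it} = \rho\, \Delta y_{i,t-1} + \Delta e_{it},
\qquad
\mathbb{E}\!\left[\, y_{i,t-s}\, \Delta e_{it} \,\right] = 0 \quad \text{for } s \ge 2 .
```

These moment conditions hold only if $e_{it}$ is not serially correlated. First order autocorrelation in $\Delta e_{it}$ arises by construction, which is why the absence of second order autocorrelation is the relevant diagnostic.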
One of the key objectives of this regression is to obtain a set of estimates for the county-level fixed effects. The method of dynamic panel estimation that utilizes first differencing removes the impact of time-invariant variables such as the time-invariant fixed effects. These are, however, recoverable from the residuals. It is to be noted that for a dynamic panel model of the form $y_{it} = \rho y_{i,t-1} + a_i + e_{it}$, the residual computed in levels with the estimated coefficient, $y_{it} - \hat{\rho} y_{i,t-1}$, still contains the fixed effect $a_i$. The average of this residual over time for each county, $\bar{e}_i$, can be used as an estimator of the fixed effects to analyze how the underlying conditions in the various counties impact the fixed effects, as long as those factors are uncorrelated with the $e_{it}$. That condition is satisfied, with the average $e_{it}$ equalling −7.00e-09 for the results of the regression of equation 1. The plot of the fitted and observed values in Figure 5 shows the distance between the observed values and the fitted line; these distances will be the county-level fixed effects.
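The recovery of fixed effects from averaged residuals can be illustrated with a small self-contained sketch. It assumes the stylised model $y_{it} = \rho y_{i,t-1} + a_i + e_{it}$ with a single regressor and takes the estimate of $\rho$ as given from the GMM step; the function names and the simulated panel are hypothetical.

```python
def recover_fixed_effects(panel, rho_hat):
    """Average the level residuals y_it - rho_hat * y_i,t-1 over time for each unit."""
    effects = []
    for series in panel:
        residuals = [series[t] - rho_hat * series[t - 1] for t in range(1, len(series))]
        effects.append(sum(residuals) / len(residuals))
    return effects


def simulate_unit(a_i, rho, shocks, y0=0.0):
    """Generate y_t = rho * y_{t-1} + a_i + e_t for a given list of shocks e_t."""
    series = [y0]
    for e in shocks:
        series.append(rho * series[-1] + a_i + e)
    return series


# Hypothetical panel: three units with known effects and shocks that average to zero.
rho = 0.6
true_effects = [-1.0, 0.5, 2.0]
shocks = [0.3, -0.3, 0.1, -0.1, 0.2, -0.2]
panel = [simulate_unit(a, rho, shocks) for a in true_effects]
estimated = recover_fixed_effects(panel, rho_hat=rho)
```

Because the shocks in this example average to zero, the averaged residuals reproduce the true effects up to floating-point error. With real data the average retains sampling noise that shrinks as the number of time periods grows.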
3.2 Estimation of the Vulnerability Index
The estimates of the fixed effects derived from the dynamic panel regression of cases are converted to an index by transforming the mean value to 100; the result is termed the Vulnerability Index. High values of the index indicate that the counties are more susceptible to the growth of the disease. The values of the index range from a low of 63 for Lincoln, Arkansas to a high of 229 for New York City, New York. Table 3 offers a list of the 20 lowest and highest values of the index. The results show that the higher values are in the so-called “hot spots”. The table lists the region codes and Urban Influence Codes (UIC) used by the USDA to distinguish between rural and urban areas. Codes 1 and 2 are for metro areas; codes 11 and 12 are for non-core areas that are not adjacent to any metro area. The table shows the concentration of the high index areas in the northeast and in large metro areas. The bottom values are found in counties mainly outside the northeast. There is a large difference between the population densities of the places with high values of the index and those with the smallest values. The table shows that there are differences in both location and type of county that distinguish areas with high values of infections from places with smaller outbreaks. We attempt to introduce additional factors that can shed light on why some places experienced significantly higher infection rates than others after controlling for the pool of susceptible individuals.
The values of the vulnerability index are used to estimate equation (2). Descriptive statistics of the variables are reported in Table 2 while the results are reported in Table 4. The differences in the three sets of results are based on the inclusion of population and population density in the regression. These two variables have a correlation of 0.76. As discussed in the previous section, the independent variables are classified into three broad groups - economic, demographic, and health/lifestyle. The results show that in the economic group, per capita income has a positive and significant effect, showing that places of high income have higher vulnerability. The Gini coefficient measuring the degree of income inequality and severe housing problems are positive and significant if only population density is included. Another measure of economic hardship, the degree of food insecurity, has a significant and negative effect. This is consistent with the result on income. Figure 2 shows that the largest concentrations of counties with the highest levels of food insecurity are in the southern states of Georgia, Mississippi, Arkansas, and Alabama - places that have not reported as many cases as some of the hot spot counties in the northeast and west. The unemployment rate and the indicator of deep poverty are not found to be significant variables. The results also show that places of severe housing shortage have higher vulnerability indexes only when population is not included. Together these results show that counties with higher vulnerability have higher economic activity. The measures of income inequality and severe housing problems have a positive impact on the vulnerability index but they are only significant when population is not included.
Figure 1 showed that the locations of the international airports are close to the regions of high infection, and the results show that the distance-weighted number of international passengers served by these airports is positive and significant. Since the source of the virus is traced outside the United States and is expected to have spread here through people traveling from outside the United States, this result is not surprising. The results for the number of international passengers served by the airports measure the impact of the passengers in the counties in which the airports are located, and this variable is positive and significant in most models.
The most significant variables in the demographic and health related groups relate to the racial profiles of the counties and some lifestyle choices such as the percentage of the population that drives alone to work and the percentage that is physically active. The results show that counties with higher concentrations of non-Hispanic Blacks, Native Americans, and immigrants have higher infections. The foreign born or immigrant variable is significant only in the model with no population. It is not surprising that the other factors such as the age distribution or health indicators are not significant, since anyone regardless of age and other health conditions can get infected. The economic indicators are significant because they determine the type of interactions people have that make them vulnerable to coming in contact with other carriers of the disease. The variable on driving alone to work is negative and highly significant. This is consistent with the notion that driving alone causes less exposure to others and can serve as a protection against getting infected. Population size is a highly significant indicator and so is density as long as population is not included.
3.3 Estimation of Deaths
While the age distribution and health indicators are not significant in explaining the differences in the number of cases across the counties, it is well established at the individual patient level that those are important factors. The daily data provided by the New York Times and Johns Hopkins University report the number of deaths as well. Table 5 displays the results of an estimation of the deaths based on variables similar to those used for the cases reported in Table 4. Unlike the regression related to the analysis of the number of confirmed cases, there are many instances of zero values of the dependent variable for the regression on deaths. The number of zero values reported for this sample is 1545 out of a total of 9984 observations. The zero-inflated negative binomial distribution is preferred to the negative binomial when excessive zeros are present. Comparison of the model fit in terms of AIC and BIC values shows that the zero-inflated negative binomial better fits the data. Due to the very large number of observations, county-level fixed effects and a panel approach are computationally difficult. A pooled model with indicator variables for the days for which the data is analyzed is used for the analysis.
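The way a zero-inflated negative binomial accommodates excess zeros can be seen from its probability mass function. The sketch below assumes the common NB2 parameterisation (mean mu, dispersion alpha) mixed with a point mass at zero; the function names and parameter values are hypothetical and are not estimates from the paper.

```python
import math


def nb_pmf(k, mu, alpha):
    """Negative binomial (NB2) probability of count k with mean mu and dispersion alpha."""
    r = 1.0 / alpha
    p = r / (r + mu)
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def zinb_pmf(k, mu, alpha, pi_zero):
    """Zero-inflated NB: with probability pi_zero the count is a structural zero."""
    base = (1.0 - pi_zero) * nb_pmf(k, mu, alpha)
    return pi_zero + base if k == 0 else base


# Hypothetical parameters: mean of 3 deaths, dispersion 0.8, 15% structural zeros.
p_zero_nb = nb_pmf(0, mu=3.0, alpha=0.8)
p_zero_zinb = zinb_pmf(0, mu=3.0, alpha=0.8, pi_zero=0.15)
```

The zero-inflated version places more probability on zero than the plain negative binomial with the same mean and dispersion, which is the feature that the AIC and BIC comparison rewards when many county-days report no deaths.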
The coefficients of the regressors are reported as incidence rate ratios to help in the interpretation of the values. Unlike the estimation of cases reported in Table 3, a window of 14 days is used from infection to death. The results of the pooled zero-inflated negative binomial model show that the number of deaths is positively related to the number of cases reported 14 days prior and the size of the population. An increase of 1 in reported cases increases the death rate by 0.023%. The indicator variables for the days of the analysis show that relative to the 20th day, days 15 through 17 had significantly fewer deaths. This is to be expected since the death counts have been rising during this period. The lack of significance in the values for days 18 and 19 relative to day 20 may indicate some slowing of the rise in the death counts after April 16. The results of this regression as they relate to the economic variables are very similar to what was reported for the cases. Counties with higher personal income, higher inequality in terms of income distribution (Gini coefficient), and more severe housing shortages have higher numbers of reported deaths even after controlling for the number of cases. Consistent with the income result, the unemployment coefficient is negative.
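The link between a count-model coefficient, its incidence rate ratio, and the percentage effect quoted above can be made explicit. For a regressor $x$ with coefficient $\beta$ in a log-link count model, a one-unit increase in $x$ multiplies the expected count by the incidence rate ratio (IRR). Assuming the 0.023% figure is read directly off the reported ratio, the implied values are:

```latex
\mathrm{IRR} = e^{\beta},
\qquad
\%\Delta = 100\,(\mathrm{IRR} - 1),
\qquad
0.023 = 100\,(\mathrm{IRR} - 1)
\;\Rightarrow\;
\mathrm{IRR} = 1.00023,\;\; \beta = \ln(1.00023) \approx 0.00023 .
```

By the same reading, an incidence rate ratio below 1 indicates that the expected number of deaths falls as the regressor rises.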
While the demographic and health related factors were largely not significant in the regression for cases, except for the racial distribution of the population, in this regression of the number of reported deaths the demographic factors are more impactful. The results show that counties with a higher percentage of the population with less than a college education have higher deaths and so do counties with a higher percentage of females. Counties with higher percentages of non-Hispanic Blacks, Native Americans, and immigrants have higher deaths, while counties with higher percentages of Hispanics, Asian Americans, and multi-racial populations have lower values relative to the excluded category of non-Hispanic Whites.
On the health related factors, counties with more primary care physicians have fewer reported deaths. The remaining results related to health indicators are as follows - places with more preventable hospital stays and higher percentages of the population that have diabetes, have HIV, or are physically inactive have higher reported deaths. These results are not surprising since people with underlying health risks are expected to experience more severe reactions to the infection. Counties with higher obesity and a higher percentage of the population that engages in excessive drinking have fewer deaths.
Obesity is a personal medical risk factor for morbidity. The county-level result reported in Table 6 is inconsistent with that. This is also true of the results for gender and age. Numerous variations in the choice of variables and regression techniques show that when a large number of variables is considered, the signs and significance of the variables are not all consistent with the known risk factors at the individual level. When principal component analysis is used as an alternative method to address the correlations between variables, the factor loadings are consistent with the results reported in Table 6. This suggests that when a region’s vulnerability to an infectious disease such as the coronavirus is assessed and multiple factors need to be taken into account, aggregated regional statistics that mask patient-level data may not be fully consistent with patient-level risk factors. In preparation for future epidemics and pandemics, this is an issue that needs more attention.
The coefficients of region codes 2-4 are less than 1, indicating that relative to the excluded region, the Northeast, the other regions had a lower incidence of death.
The results show that the economic factors are important for explaining the differential impacts experienced by counties across the country in terms of both confirmed cases and reported deaths. The demographic and health related factors are more pronounced in the estimation of deaths than in that of reported cases. This is not surprising since the virus does not discriminate based on any factor other than immunity, but the severity of the disease that can lead to a fatal outcome depends on underlying health and demographic factors.
4 Conclusion
This paper has examined the differential experience of infections and deaths across the United States due to the COVID-19 pandemic. Daily reported counts of confirmed cases and deaths were examined over a 20-day period from March 30 through April 19, 2020. Although data are available for over 2700 counties, this paper focused on 771 counties that reported an average of 30 cases over the 20-day period. The counties not included in the study had far fewer cases and reported deaths. The counties that remain in the sample form a highly diverse set. The excluded counties are largely similar to one another in their small numbers of cases and reported deaths, and including them added significant computational cost without adding much value.
The analysis of the number of cases is based on an epidemiological model in which we included a county fixed effect. This is a novel way to introduce heterogeneity in such a model. As noted by Avery et al. (2020), epidemiological models do not include the heterogeneity that economic models require. A dynamic panel regression of the number of cases included the potential number of interactions between susceptible and infected individuals as a proportion of the population, along with county fixed effects. The results of the model were used to construct a Vulnerability Index for each county. Economic, demographic, and health/lifestyle factors were used to explain the differences in the Vulnerability Index across counties. The results show that counties with higher economic activity have higher vulnerability. Regions around international airports experienced higher numbers of cases than those more than 300 miles away. This is consistent with the fact that the virus arrived in the United States through travelers from abroad. Places with higher vulnerability also have a higher proportion of the population that does not use public transportation to go to work. Counties with more non-Hispanic Black, Native American, and immigrant residents are more vulnerable. The remaining demographic and health variables were largely insignificant.
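As a rough illustration of the interaction regressor described above, the Python sketch below computes a standard SIR-style contact term, S × I / N, for one county over several days. It is not the paper's exact specification. It assumes that infected individuals are proxied by cumulative reported cases and that susceptibles are the rest of the population. The county figures and the function name are hypothetical.

```python
from typing import List


def interaction_terms(cumulative_cases: List[int], population: int) -> List[float]:
    """Return the SIR-style contact term S_t * I_t / N for each day.

    Assumptions (not from the paper): I_t is proxied by cumulative reported
    cases, and S_t = N - I_t.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    terms = []
    for infected in cumulative_cases:
        if infected < 0 or infected > population:
            raise ValueError("cases must lie between 0 and the population")
        susceptible = population - infected
        terms.append(susceptible * infected / population)
    return terms


# Hypothetical county: four days of cumulative cases, 100,000 residents.
county_cases = [30, 45, 70, 110]
county_population = 100_000
terms = interaction_terms(county_cases, county_population)

for day, value in enumerate(terms, start=1):
    print(f"day {day}: S*I/N = {value:.3f}")
```

In a panel regression, a term of this kind would enter alongside a county-specific intercept, and it is that intercept which captures the unobserved heterogeneity across counties.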
Because many counties reported zero deaths on many of the days in the sample, a zero-inflated negative binomial pooled regression was used to analyze how economic, demographic, and health conditions affect the severity of the infection experienced by the counties. The results show that the economic factors have a similar impact on deaths. That is, counties with higher income and more cases also experienced more deaths. Counties with higher income inequality and more severe housing shortages also experienced more deaths. In contrast to the results for reported cases, this regression showed that not only do counties with higher percentages of non-Hispanic Blacks, Native Americans, and immigrants have more deaths relative to non-Hispanic Whites, but so do counties with a higher concentration of people with less than a college education. Counties with more primary care physicians per capita experienced fewer deaths, and so did counties with lower percentages of residents with diabetes, fewer smokers, and fewer preventable hospital stays. Counties with higher rates of obesity, HIV, and drinking are associated with fewer deaths. It is to be noted that the results here are based on reported deaths at the county level and do not include any patient-level information.
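The Python sketch below writes out the zero-inflated negative binomial probability function to show why such a model suits data with many zero-death days. It uses the common NB2 parameterization, in which the variance is mu + alpha × mu². The parameter values are hypothetical and are not estimates from this paper.

```python
import math


def nb2_pmf(y: int, mu: float, alpha: float) -> float:
    """Negative binomial (NB2) probability of count y, with mean mu and dispersion alpha."""
    r = 1.0 / alpha              # "size" parameter
    p = r / (r + mu)             # success probability
    log_pmf = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(p) + y * math.log1p(-p)
    )
    return math.exp(log_pmf)


def zinb_pmf(y: int, mu: float, alpha: float, pi_zero: float) -> float:
    """Zero-inflated NB: with probability pi_zero the count is a structural zero."""
    base = (1.0 - pi_zero) * nb2_pmf(y, mu, alpha)
    return pi_zero + base if y == 0 else base


# Hypothetical parameters: mean 2 deaths, dispersion 0.5, 30% structural zeros.
mu, alpha, pi_zero = 2.0, 0.5, 0.3

p_zero_nb = nb2_pmf(0, mu, alpha)
p_zero_zinb = zinb_pmf(0, mu, alpha, pi_zero)
total_mass = sum(zinb_pmf(y, mu, alpha, pi_zero) for y in range(2000))
mean_count = sum(y * zinb_pmf(y, mu, alpha, pi_zero) for y in range(2000))

print(f"P(0) under NB:   {p_zero_nb:.3f}")
print(f"P(0) under ZINB: {p_zero_zinb:.3f}")
print(f"mean under ZINB: {mean_count:.3f}")
```

The extra mass at zero lets the model separate county-days with no reported deaths from the count process that governs the remaining observations.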
The coronavirus pandemic has demonstrated how quickly a highly contagious respiratory illness can bring the global economy to a standstill. There have been several such infections in the last ten years although none of them had the virulence or lethality of this virus. Most of them spread to a few countries and then disappeared. The developed world remained largely unaffected by most of them and the experience of this pandemic has laid bare the lack of infrastructure to respond to such an incident. The economics literature is not extensive in the area of pandemics and epidemics in developed countries. The contribution of this study is to understand the various socioeconomic conditions that can make a county or region more vulnerable to both disease spread and severity of cases. A national strategy to prepare the infrastructure for controlling the spread of infectious diseases should consider these factors and develop Vulnerability Indexes for each region.
Data Availability
All data used in the project are available at the link below.
https://github.com/nivedita-mukherji/Covid-socioeconomic-research-project
Footnotes
1 The diagonal values of the weight matrix used for the calculation of the weighted international passengers are zeros. Consequently, the weighted values measure the impact in the surrounding areas only.