A Multiple Linear Regression Analysis of Rural-Urban COVID-19 Risk Disparities in Texas ======================================================================================= * Amber K. Luo * Sophia Zhong * Charles Sun * Jasmine Wang * Alexander White ## Abstract As the number of COVID-19 cases in the U.S. rises, the differential impact of the pandemic in urban and rural regions becomes more pronounced, and the major factors relating to this difference remain unclear. Using the 254 counties of Texas as units of analysis, we utilized multiple linear regression to investigate the influence of 83 county-level predictor variables including race demographics, age demographics, healthcare and financial status, and prevalence of and mortality rate from COVID-19 risk factors on the incidence rate and case fatality rate from COVID-19 in Texas on September 15, 2020. Here, we report that urban counties experience, on average, 41.1% higher incidence rates from COVID-19 than rural counties and 34.7% lower case fatality rates. Through comparisons between our models, we found that this difference was largely attributable to four major predictor variables: namely, the proportion of elderly residents, African American residents, and Hispanic residents, and the presence of large nursing homes. According to our models, counties with high incidence rates of COVID-19 are predicted to have high proportions of African American residents and Hispanic residents coupled with low proportions of elderly residents. Furthermore, we found that counties with the highest case fatality rates are predicted to have high proportions of elderly residents, obese residents, and Hispanic residents, coupled with low proportions of residents ages 20-39 and residents who report smoking cigarettes. In our study, major variables and their effects on COVID-19 risk are quantified, highlighting the most vulnerable populations and regions of Texas. Keywords * COVID-19 * multiple linear regression * incidence rate * case fatality rate * rural-urban health disparities ## Introduction The COVID-19 (Coronavirus Disease 2019) pandemic continues to exert major effects on healthcare systems and on the well-being of residents in both urban and rural regions. Texas is one of the hardest hit states by COVID-19, experiencing alarming surges in confirmed cases, hospitalizations, and deaths starting in mid June of 2020. As of September 15, 2020, there have been 15,785 deaths and 668,746 confirmed cases of COVID-19 across the 254 counties in Texas [1]. Previous studies have shown that urban counties in Texas experience significantly higher incidence rates and lower case fatality rates from COVID-19 when compared to rural counties [2]. Rural communities experience lower testing rates and higher positive test rates for COVID-19 [3], along with fewer financial resources [4], limited access to healthcare [5], and higher all-cause mortality rates in comparison to urban communities [6]. Furthermore, many hospitals in rural communities lack the ICU beds and primary care physicians necessary to deal with a local COVID-19 outbreak [7]. Rural counties also have higher prevalences of CDC (Centers for Disease Control and Prevention)-defined COVID-19 risk factors [8], including cardiovascular disease, chronic respiratory disease, obesity, and smoking, as well as higher rates of poverty and lower levels of physical activity [9]. Of those hospitalized with COVID-19, 75% have some underlying medical condition, regardless of age, such as diabetes, chronic respiratory diseases, and cardiovascular diseases [10]. Furthermore, rural and urban counties have many demographic differences, including a higher proportion of elderly and black residents in rural counties, populations that are thought to be especially vulnerable to COVID-19 [11]. These health and demographic disparities, among others, hold the key to understanding the differential impact of COVID-19 on urban and rural counties. Identifying the variables responsible for these differences and quantifying their effects on measures of COVID-19 risk will allow policy makers and healthcare professionals to determine the most suitable plans for COVID-19 based on the predicted vulnerability of a region. Currently, few studies have investigated the relation of demographic differences in rural and urban counties to COVID-19 risk. Our study is the first to use multiple linear regression to definitively quantify the relationship of population characteristics to urban-rural COVID-19 disparities by creating models that explain the effect of various characteristics (predictor variables) on a single measure of COVID-19 risk (response variable). These multiple linear regression models were run for on two response variables: incidence rate from COVID-19 and case fatality rate from COVID-19. The county-level predictor variables we analyzed include, but are not limited to, prevalence of and mortality rate from CDC-defined COVID-19 risk factors, age demographics, race demographics, financial and living conditions, political leaning, and potential super-spreading sites. Our study highlights the most significant county-level variables in predicting COVID-19 incidence rate and case fatality rate and identifies the major underlying variables responsible for urban-rural disparities in COVID-19 risk. ## Data and Methods ### Data Sources The units of analysis for this study are N = 254 counties in Texas. The COVID-19 Data Hub 2.2.0, accessed through the COVID19 Package in R, provided cumulative county-level reports of COVID-19 confirmed cases and deaths. Data was filtered to a 7 day window around September 15 (9/12 - 9/19), and the 7 day rolling averages of COVID-19 incidence rate and case fatality rate were calculated from this window. The 83 predictor variables and their sources can be grouped into the following four categories: 1. Population demographics: County-level data on percent population by five-year age strata from 0-85 years of age, race, education status, health insurance status, sex, unemployment status, and poverty status, as well as data on median income, population density per square mile, and the percentage of healthcare workers per county were obtained from the 2018 American Community Survey and the 2020 County Health Rankings [12, 13]. 2. Health-related risk factors: County-level data on the prevalence of diabetes, cigarette smoking, obesity, and hypertension, as well as the mortality rate per 100,000 population from chronic respiratory disease, cardiovascular disease, alcohol use disorders, and drug use disorders were obtained from the IHME (Institute for Health Metrics and Evaluation) GHDx (Global Health Data Exchange) US Health Datasets. Data on survey responses on mask use, where participants from all counties in the US were asked to self-report how often they wear masks in public from 7/2/2020 - 7/14/2020, was obtained from the New York Times [14, 15]. 3. Potential superspreading sites: Three discrete variables were created to examine the true effect of sites generally thought to have a higher incidence of COVID-19 on COVID-19 risk. These variables represented the presence of one or more meatpacking facilities, the presence of one or more prisons, and the presence of one or more large nursing homes (population *>*100) by county from Niche Meat Processing, Texas Almanac, and Texas Health and Human Services, respectively [16, 17, 18]. 4. Political leaning: Due to the heavily politicized climate surrounding the COVID-19 pandemic, several measures of political leaning were collected. Data on the percentage of voters for Trump/Clinton in the 2016 presidential election and the percentage of voters for O’Rourke/Cruz in the 2018 Texas Senate election was obtained from the New York Times to estimate the true political leanings of the counties in Texas [19, 20]. All statistical analyses, models, and mapping of data were done using R version 1.3.959. A complete list of the predictor and response variables analyzed and their sources and dates can be found in Table 1. ### Statistical Analysis Prior to model creation, counties were grouped into quintiles (1-5) by measures of risk: COVID-19 incidence rate per 100,000 population and case fatality rate, both calculated from 7-day rolling averages. Medians and IQRs were calculated for select predictor variables by quintile, and their significance in relation to COVID-19 risk was determined using Kruskal-Wallis tests, with a *p*-value of *<* 0.05 considered significant. These tables of COVID-19 risk quintiles (Table 2, 4) can be compared with a study by Khose, Moore, and Wang [2] using data from April 8, 2020. ### Models Multiple Linear Regression (MLR) was used to model the effect of different subsets of predictor variables on each of the response variables. To investigate the difference between urban and rural counties, all models include the *Urban* variable, which denotes the urban-rural status of the county. ![Formula][1] *y**i* represents either incidence rate or case fatality rate from COVID-19 for the *i*th county, and *x**ij* represents the value of the *j*th predictor variable for the *i*th county. The errors, *ϵ**i* ∼ *N* (0, *σ*2), are assumed independent. The best subset of predictors for each response variable was chosen in two phases. In phase one, using the LEAPS package in R, all possible subsets of our 83 predictor variables were compared using the Bayesian information criteria. The optimal Box-Cox transformation was then fit to the best four models. In each case, the 95% confidence interval for the optimal power included 0, which suggested the natural log transformation. In phase two, the LEAPS package in R was used once more with the transformed response variable. The best four models for each response were then compared for strength of prediction, as determined by adjusted *R*2 and *F*-statistic; validity of prediction, as determined by normally distributed residual plots; and significance, as determined by the *p*-values of the variables included in the model and ANOVA tables. To assess possible multicollinearity, VIF (variance of inflation) factors were calculated for all predictors in our MLR models using the car package in R. In each case, the VIF never exceeded 2.744, indicating a lack of variable redundancy (threshold: VIF= 5). ## Results ### Incidence Rate Analysis To elucidate the basic nature of the predictor variables in relation to COVID-19 risk and to provide for a deeper understanding of the coefficients in our MLR models, we analyzed the medians and IQRs of certain predictor variables by COVID-19 risk quintiles. Table 2 presents the distribution of county-level demographics and health outcomes by quintiles of COVID-19 incidence rates in Texas on September 15, 2020. Counties with higher incidence rates had generally higher median percentages of residents with diabetes (*p*-value: *<* .001), African American residents (*p*-value: .002), Hispanic residents (*p*-value: *<* .001), obese residents (*p*-value: *<* .001), residents 20-39 years of age (*p*-value: *<* .001), unemployed residents (*p*-value: .078) and generally higher population density per square mile (*p*-value: *<* .001). These counties also had generally lower median percentages of Caucasian American residents (*p*-value: .002) and elderly residents (*p*-value: *<* .001, *>* 70 years of age). In particular, the fifth quintile of COVID-19 incidence rate (*>* 2816.3 per 100,000 population) had the highest median percentage of residents beneath the poverty line (median: 16.6%; *p*-value: .002) and the lowest percentage of residents with health insurance (median: 97.7%; *p*-value: .012). These data identify significant variables in relation to COVID-19 incidence rate and illustrate the trend of these variables as the COVID-19 incidence rate increases. To further analyze these trends, multiple linear regression models were run. For each response variable (incidence rate — IR; case fatality rate — CFR), a base model was run: a simple linear regression of the form ![Formula][2] where the discrete variable *Urban* is 1 for urban counties and 0 for rural counties, in which the urban/rural status of counties is defined by the US Office of Management and Budget [21]. Table 3 presents the base, full, and rural-only incidence rate models. Three counties (Fig. 1, left map; Loving: left; Borden: center; King: right) with no confirmed cases were removed to allow for the natural log transformations. The base model regresses ln(*IR*) on *Urban*, providing a control to mathematically establish both the significance and nature of a county’s urban-rural status when *Urban* is the only predictor in consideration. Changes in the *p*-value of *Urban* as a result of adding more predictor variables to the base model highlight the importance of certain predictor variables in determining urban-rural COVID-19 risk disparities. ![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/01/08/2021.01.05.20248921/F1.medium.gif) [Figure 1.](http://medrxiv.org/content/early/2021/01/08/2021.01.05.20248921/F1) Figure 1. Residuals of the full incidence rate model (left) and full case fatality rate model (right). Counties with no confirmed cases are displayed in gray on the left, and counties with no deaths are displayed in gray on the right. The scales for the map on the left and the map on the right refer to ln (*IR*) and ln (*CFR*), respectively. The IR base model predicts an incidence rate of 0.0151 in a rural county and 0.0213 in an urban county, not accounting for any other variables: a 41.1% increase. Of note, *Urban* is a highly significant variable in the base model (*p*-value: *<* .001). According to our full incidence rate model, counties with high percentages of Hispanic and/or African American residents are expected to experience significantly higher incidence rates of COVID-19, as are counties with one or more large nursing homes (*>* 100 residents). *Hispanic* is the most significant variable in the model, with a very small *p*-value (*p*-value: *<* .001) and the largest impact on the *R*2 upon removal (Table 9; −19.3%). Counties with one or more large nursing homes are particularly at risk for COVID-19; specifically, a county with one or more large nursing homes (*Nursing Homes* = 1) is expected to experience a 40.6% higher incidence rate from COVID-19 than a similar county without any large nursing homes. These data underscore the importance of outbreaks in nursing homes, establishing the presence of large nursing homes as a major factor affecting the incidence rate of COVID-19 in a particular county. *Elderly* is the only predictor variable in the model with a negative coefficient, indicating that counties with a higher percentage of elderly (*>* 70 years of age) residents are expected to have lower COVID-19 incidence rates. *Black* is the variable with the largest positive coefficient; however, the standard deviation of the percentage of black residents (Table 7; 0.066) is lower than the standard deviation of Hispanic residents (Table 7; 0.233), so that *Black* has a lesser impact on ln ![Graphic][3] than *Hispanic*. The objective of these MLR models is to analyze urban-rural differences, hence the presence of the variable *Urban* in both the base and full models. The *p*-value of *Urban* was significant in the base model and insignificant in the full model; adding the variables *Elderly, Nursing Homes, Hispanic* and *Black* increased the *p*-value of *Urban* from *<* .001 to 0.957, implying that the increased incidence rate of COVID-19 in urban counties is largely attributable to differences in the percentages of elderly, Hispanic, and black residents, and by the presence of large nursing homes. Adding the four aforementioned variables also hugely improved the adjusted *R*2 (0.052 to 0.448) and *F* statistic (14.59 to 41.61). Full models were filtered to include only rural counties to observe any unusual shifts in coefficients or *p*-values of the variables in the full model. As expected, little difference was observed between the full and rural IR models. *Elderly* is slightly less significant in the rural model (.020 vs. .046) and *Black* has a considerably larger coefficient (2.553 vs. 3.261). No variables changed signs and there were no major changes in significance, as is consistent with the insignificance of *Urban* as a predictor variable in the full model (*p*-value: .957; Table 9; −.001%). ### Case Fatality Rate Analysis Table 4 presents the distribution of county-level demographics and health outcomes by quintiles of COVID-19 case fatality rate in Texas on September 15, 2020. The median percentages of obese residents, unemployed residents, and diabetic residents follow generally positive trends, with medians increasing as they approach the third quintile of case fatality rate (1.9% − 2.9%) and deviating as they approach the fourth and fifth quintiles (*>* 2.9%). The percentage of elderly residents follows a strictly positive trend from the second quintile to the fifth quintile, with the first quintile deviating from this pattern. The fifth quintile of case fatality rate (*>* 4.2%) has the highest median percentage of elderly residents (median: 14.3%; *p*-value: *<* .001), lowest median percentage of Hispanic residents (median: 23.5%; *p*-value: .079), and lowest median percentage of residents ages 20-39 (median: 21.2%; *p*-value: *<* .001), in contrast with the low median percentage of elderly residents (median: 9.3%; *p*-value: *<* .001), high median percentage of Hispanic residents (median: 50.2; *p*-value: *<* .001), and high median percentage of residents ages 20-39 (median: 27.2; *p*-value: *<* .001) in the fifth quintile of COVID-19 incidence rate. Table 5 presents the base, full, and rural-only CFR models. 28 counties with no deaths as of September 15, 2020 were not included in the CFR models to allow for a natural log transformation on the response variable. The immediate takeaway from the CFR base model is that rural counties experience higher case fatality rates from COVID-19 as opposed to lower incidence rates from COVID-19. The base model predicts a case fatality rate of 1.94% in a urban county and 2.97% in an rural county: a 53.1% increase. Once again, *Urban* is highly significant in the base model (*p*-value: *<* .001). From our full CFR model, counties with a high percentage of elderly residents are predicted to experience a higher case fatality rate from COVID-19, as are counties with a high percentage of obese residents. Counties with a high percentage of residents ages 20-39 and, interestingly, counties with a high percentage of smokers are predicted to have lower case fatality rates. The coefficient of *Nursing Homes* is negative (−0.210), indicating that a county with one or more large nursing homes is expected to experience a 18.9% lower case fatality rate from COVID-19 than a similar county without any large nursing homes. To provide a basis for direct comparison between case fatality rate and incidence rate of COVID-19, a model was created that regressed ln ![Graphic][4] on the predictor variables from the full incidence rate model (*Urban, Elderly, Nursing Homes, Hispanic, Black*; Table 6). The following changes were observed: 1) The coefficient of *Elderly* reversed signs, increased considerably in magnitude (IR: −2.324 to CFR: 8.079), and had a much larger impact on the *R*2 upon removal (Table 9; IR: −1.21% vs. CFR: −11.7%), 2) The coefficient of *Nursing Homes* reversed signs (IR: 0.343 vs. CFR: −0.251), 3) *Hispanic* became a much less significant variable (Table 9; IR: −19.3% vs. CFR: −3.49%), and 4) *Black* was not significant in the case fatality rate model (*p*-value: .527). Of note, the coefficient of *Urban* remained high (IR: 0.957 vs. CFR: 0.945), indicating that both the difference in incidence rate and the difference in case fatality rate between urban and rural counties is largely attributable to these four variables. To visualize model performance and identify region-specific variations in prediction, we created residual maps. Fig. 1 presents the residuals of the 251 counties in the full incidence rate model (left) and the 226 counties in the full case fatality rate model (right). Of note, the full IR model generally overestimates the incidence rate of COVID-19 in the urbanized regions of the Coastal Plains, while the full CFR model underestimates the case fatality rate in these same regions. The opposite is true for counties in the Mountains and Basins region, for which the full IR model underestimates the incidence rate and the full CFR model overestimates the case fatality rate. The full CFR model was filtered to include only rural counties to observe any unexpected differences. Similar to the rural-only incidence rate model, no variables changed signs or shifted considerably in magnitude, but *Elderly* (*p*-value: .069) and *Nursing Homes* (*p*-value: .058) were no longer significant when including only rural counties in the CFR model. ## Discussion Our study highlights a concerning disparity between urban and rural counties, consistent with several other studies, wherein urban counties experience higher incidence rates, yet rural counties experience higher case fatality rates [2, 22]. Moreover, our study is the first to identify Hispanic & African American populations, elderly populations, and large nursing homes as major contributing factors to this urban-rural disparity in Texas. Knowing these important determinants of county health and their quantitative effect on COVID-19 incidence rate and case fatality rate makes it possible to predict which counties are at the highest risk for serious effects from COVID-19. As the counties with the highest levels of variables that positively impact the predicted COVID-19 case fatality rate are also those with lesser health resources and higher prevalence of health-related COVID-19 risk factors, it is imperative that traditionally vulnerable populations in such counties, including the elderly and immunocompromised, receive adequate essential resources from local providers and health agencies. Furthermore, the alarming positive effect of elderly populations on case fatality rate stresses the need to insulate and closely monitor counties with high percentages of elderly people. Equally concerning is the observation that rural counties, which are predicted to experience higher case fatality rates due to their high percentage of elderly residents and obese residents, lack the health resources and infrastructure of urban counties, exacerbating these effects. Streamlining hospital transportation, providing at-home medical services, and stockpiling resources for rural hospitals can lessen COVID-19 mortality in rural regions. Urban counties face a different problem, having higher incidence rates and lower case fatality rates. While urban counties have more widely available healthcare services, they also have denser populations. These denser populations, coupled with higher predicted incidence rates, have the potential to strain hospital capacities, as demonstrated in mid-July. High incidence rates in urban counties are linked to lower percentages of elderly residents, higher percentages of black residents, and presence of large nursing homes. Of these, the variable that can most reasonably be addressed is nursing homes. The significance of COVID-19 outbreaks in nursing homes is widely known, and such outbreaks are often a result of a lack of testing and close contact with healthcare providers. A plausible measure could be to conduct consistent facility-wide testing, which has been shown to prevent outbreaks in nursing homes [23]. In addition to analyzing COVID-19 incidence rate and case fatality rate, our study uncovered interesting behavior in the predictor variables analyzed. *Black* is not significant in the case fatality rate model and has a very low *R*2 upon removal, in contrast with its significance in the incidence rate model. This indicates that while counties with a high percentage of black residents have higher incidence rates, they do not necessarily have higher case fatality rates, despite these counties having higher rates of poverty and lesser access to healthcare. Counties with many Hispanic residents experience both higher incidence rates and higher case fatality rates, which can be partially attributed to high levels of obesity and diabetes coupled with low rates of health insurance coverage and higher representation in “superspreader” sites such as prisons or homeless shelters [24, 25]. Furthermore, Hispanic residents face language barriers and immigration status, which can complicate the process of obtaining medical aid [26, 27]. This highlights the concerning vulnerability of Hispanic populations along with the importance of adequate communication and provision of medical aid during this time. The negative coefficient of *Smoking* in the case fatality rate model was unexpected, as smoking is a CDC-defined risk factor for COVID-19 and prevalence of smoking in a county is positively correlated with elderly populations and with the incidence of respiratory disease, hypertension, obesity, and cardiovascular disease in Texas. Our study is not the first to report this relationship: Norden et al. [28] at Stanford University and the University of Washington found a similar negative relationship between smoking and COVID-19 case fatality rate across multiple countries, controlling for sex ratio, obesity, temperature, and elderly populations. In their study, several biological explanations for a protective effect of smoking on COVID-19 fatality were proposed. While the possibility of a true protective effect of smoking on COVID-19 can not be ruled out, it is likely that a confounding variable, a variable not accounted for in our model, the effects of county-level data aggregation, or an interaction between another predictor variable and smoking is responsible for the negative coefficient. Regardless, the magnitude of the negative effect of smoking on COVID-19 case fatality rate prompts further exploration to clarify the role of smoking in COVID-19 fatality, both biologically and statistically. The variables included in our full models are those that are the statistically strongest in predicting COVID-19 risk. In creating our models, we were surprised to find that health access (health insurance, primary care physicians), financial status (median income, poverty levels), mortality rate from respiratory disease, and population density were not among the best predictors, as these variables are thought to have significant effects on COVID-19 risk. However, several variables (*Elderly, Hispanic, Nursing Homes*) made consistent appearances, decreasing the likelihood that these variables were merely acting as proxies for more significant variables and indicating that there is some intrinsic aspect of populations of elderly/Hispanic residents, apart from correlations with other variables, that affects COVID-19 risk. This emphasizes the need to ensure that these populations are properly insulated against COVID-19 outbreaks and establishes them as important factors in population heterogeneity in relation to COVID-19 risk. Moreover, the residuals of our models contain important information about which counties are performing better or worse than expected. Comparing counties with negative residuals for common preventative measures could reveal which prevention strategies are most effective, which could then be implemented in counties with high residuals. There exist a number of limitations in our study and opportunities for further work. Several counties with a value of 0 for the response variable were removed from our models to allow for a log transformation on the response variable. Excluding counties with no cases or deaths causes our study to underestimate the impact of variables that may be particularly significant in those counties, potentially leading to the omission of important variables in determining incidence rate or case fatality rate from COVID-19 and lowering the accuracy of prediction in counties with no deaths and/or confirmed cases. As the pandemic progresses, this limitation may fade. Another limitation was the designation of urban/rural and variables for super spreading contexts as discrete variables. In reality, the urban/rural designation is not black or white. Many counties cannot be classified as definitively urban or rural, with some sections being urban and others rural. A simple discrete variable does not capture this variability, and a more proper designation may be large metropolitan/midsize metro/small metro/micropolitan/semirural/rural. Regarding super spreading contexts, a discrete variable does not adequately convey the extent to which the specific super spreading context impacts the county, and a variable that captures the proportion of people in the population who are employed or reside in a super spreading context may be a more robust predictor. Future research can extend the premise of our models to investigate more response variables, including *R**t*, testing rates, and possibly hospital capacity. By identifying at-risk counties and predicting the relative magnitude of county-level COVID-19 incidence rate or case fatality rate based on quantifiable population characteristics, resources can be more appropriately allocated to decrease the disproportionate impact of COVID-19 in Texas and a groundwork can be laid for future population studies to explore the relationships we discovered in depth. ## Data Availability The codes and data used to generate our results are available from the corresponding authors upon reasonable request. ## Funding This research received no external funding. ## Conflicts of Interest The authors declare no conflict of interest. ## Availability of data and material The codes and data used to generate our results are available from the corresponding authors (email: ambermath99{at}gmail.com, aw22{at}txstate.edu) upon reasonable request. ## Author contributions AW designed the study. AKL, SZ, CS, and JW were responsible for data curation and model creation. AKL conducted statistical analysis and produced tables and figures. All authors contributed towards interpretation of study findings. AKL wrote the first draft, and all authors critically reviewed and approved the final contents of the paper. ## Appendices View this table: [Table 1.](http://medrxiv.org/content/early/2021/01/08/2021.01.05.20248921/T1) Table 1. Data Descriptions and Sources View this table: [Table 2.](http://medrxiv.org/content/early/2021/01/08/2021.01.05.20248921/T2) Table 2. Texas County-Level Variable Comparisons by Quintiles of COVID-19 Incidence Rate through September 15, 2020a View this table: [Table 3.](http://medrxiv.org/content/early/2021/01/08/2021.01.05.20248921/T3) Table 3. COVID-19 Incidence Rate Models View this table: [Table 4.](http://medrxiv.org/content/early/2021/01/08/2021.01.05.20248921/T4) Table 4. Texas County-Level Variable Comparisons by Quintiles of COVID-19 Case-Fatality Rate through September 15, 2020a View this table: [Table 5.](http://medrxiv.org/content/early/2021/01/08/2021.01.05.20248921/T5) Table 5. COVID-19 Case Fatality Rate Models View this table: [Table 6.](http://medrxiv.org/content/early/2021/01/08/2021.01.05.20248921/T6) Table 6. Case Fatality Rate Model with Full Incidence Rate Model Predictors View this table: [Table 7.](http://medrxiv.org/content/early/2021/01/08/2021.01.05.20248921/T7) Table 7. Summary Statistics for Incidence Rate Models View this table: [Table 8.](http://medrxiv.org/content/early/2021/01/08/2021.01.05.20248921/T8) Table 8. Summary Statistics for Case Fatality Rate Models View this table: [Table 9.](http://medrxiv.org/content/early/2021/01/08/2021.01.05.20248921/T9) Table 9. Impact of Each Predictor Variable on the R2 Upon Removal ## Acknowledgements A big thanks to Texas State University and the Mathworks Honors Summer Math Camp for providing a platform for us to collaborate virtually to undertake this research. We’d also like to thank Dr. Lauren Ancel-Meyers and the UT COVID-19 Modeling Consortium for sharing their knowledge and advice on a daily basis, and for providing inspiration and data leads for this project. * Received January 5, 2021. * Revision received January 5, 2021. * Accepted January 8, 2021. * © 2021, Posted by Cold Spring Harbor Laboratory The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission. ## References 1. [1].Texas Department of State Health Services. DSHS COVID-19 Dashboard. Aug. 11, 2020. url: txdshs.maps.arcgis.com/apps/opsdashboard. 2. [2]. Swapnil Khose, Justin Xavier Moore, and Henry E Wang. “Epidemiology of the 2020 Pandemic of COVID-19 in the State of Texas: The First Month of Community Spread”. In: Journal of Community Health 45.4 (2020), pp. 696–701. 3. [3]. Jacob M Souch and Jeralynn S Cossman. “A Commentary on Rural-Urban Disparities in COVID-19 Testing Rates per 100,000 and Risk Factors”. In: The Journal of Rural Health (2020), pp. 1–3. 4. [4]. John Pender et al. Rural America at a Glance. Tech. rep. U.S. Department of Agriculture, 2019. 5. [5].The Cecil G. Sheps Center for Health Services Research. 128 rural hospital closures: January 2010 - present. Sept. 21, 2020. url: [http://www.shepscenter.unc.edu/programs-projects/rural-health/rural-hospital-closures/](http://www.shepscenter.unc.edu/programs-projects/rural-health/rural-hospital-closures/). 6. [6]. Arthur G Cosby et al. “Growth and persistence of place-based mortality in the United States: the rural mortality penalty”. In: American Journal of Public Health 109.1 (2019), pp. 155–162. 7. [7]. Brystana G Kaufman et al. “Half of Rural Residents at High Risk of Serious Illness Due to COVID-19, Creating Stress on Rural Hospitals”. In: The Journal of Rural Health (2020). 8. [8].Centers for Disease Control and Prevention. People who are at higher risk for severe illness. Sept. 11, 2020. url: [https://www.cdc.gov/coronavirus/2019-ncov/need-extra-precautions/people-at-higher-risk.html](https://www.cdc.gov/coronavirus/2019-ncov/need-extra-precautions/people-at-higher-risk.html). 9. [9]. Laura Richman et al. “Addressing health inequalities in diverse, rural communities: An unmet need”. In: SSM-Population Health 7 (2019), p. 100398. 10. [10].Nancy et al. “Preliminary estimates of the prevalence of selected underlying health conditions among patients with coronavirus disease 2019—United States, February 12–March 28, 2020”. In: Morbidity and Mortality Weekly Report 69.13 (2020), p. 382. 11. [11]. Carrie Henning-Smith. “The Unique Impact of COVID-19 on Older Adults in Rural Areas”. In: Journal of Aging & Social Policy (2020), pp. 1–7. 12. [12].United States Census Bureau. American Community Survey. Aug. 13, 2020. url: [https://www.census.gov/programs-surveys/acs](https://www.census.gov/programs-surveys/acs). 13. [13].County Health Rankings & Roadmaps. Rankings Data & Documentation. Aug. 12, 2020. url: [https://www.countyhealthrankings.org/explore-health-rankings/rankings-data-documentation](https://www.countyhealthrankings.org/explore-health-rankings/rankings-data-documentation). 14. [14].Global Health Data Exchange. US Data. Aug. 12, 2020. url: [http://ghdx.healthdata.org/us-data](http://ghdx.healthdata.org/us-data). 15. [15].New York Times. A Detailed Map of Who Is Wearing Masks in the U.S. July 17, 2020. url: [https://www.nytimes.com/interactive/2020/07/17/upshot/coronavirus-face-mask-map.html](https://www.nytimes.com/interactive/2020/07/17/upshot/coronavirus-face-mask-map.html). 16. [16].Niche Meat Processing. Texas Meat Processing Plants. Sept. 2014. url: [https://www.nichemeatprocessing.org/wp-content/uploads/2014/09/Texas-Meat-Processing-Plants.pdf](https://www.nichemeatprocessing.org/wp-content/uploads/2014/09/Texas-Meat-Processing-Plants.pdf). 17. [17].Texas Almanac. Texas Prisons. Aug. 31, 2014. url: [https://texasalmanac.com/topics/government/texas-prisons](https://texasalmanac.com/topics/government/texas-prisons). 18. [18].Texas Health and Human Services. Nursing Facilities. Oct. 29, 2020. url: [https://hhs.texas.gov/doing-business-hhs/provider-portals/long-term-care-providers/nursing-facilities-nf](https://hhs.texas.gov/doing-business-hhs/provider-portals/long-term-care-providers/nursing-facilities-nf). 19. [19].New York Times. An Extremely Detailed Map of the 2016 Election. July 25, 2018. url: [https://www.nytimes.com/interactive/2018/upshot/election-2016-voting-precinct-maps.html](https://www.nytimes.com/interactive/2018/upshot/election-2016-voting-precinct-maps.html). 20. [20].New York Times. Texas Senate Election Results: Beto O’Rourke vs. Ted Cruz. Jan. 28, 2019. url: [https://www.nytimes.com/elections/results/texas-senate](https://www.nytimes.com/elections/results/texas-senate). 21. [21].Texas Commission on the Arts. Rural Texas Counties. Aug. 13, 2020. url: [https://www.arts.texas.gov/initiatives/rural-initiatives/rural-texas-counties/](https://www.arts.texas.gov/initiatives/rural-initiatives/rural-texas-counties/). 22. [22]. Rashid Ahmed et al. “United States County-level COVID-19 Death Rates and Case Fatality Rates Vary by Region and Urban Status”. In: Healthcare. Vol. 8.3. Multidisciplinary Digital Publishing Institute. 2020, p. 330. 23. [23]. Carson T Telford et al. “Preventing COVID-19 Outbreaks in Long-Term Care Facilities Through Preemptive Testing of Residents and Staff Members—Fulton County, Georgia, March–May 2020”. In: Morbidity and Mortality Weekly Report 69.37 (2020), pp. 1296–1299. 24. [24]. Raul Macias Gil et al. “COVID-19 Pandemic: Disparate Health Impact on the Hispanic/Latinx Population in the United States”. In: The Journal of Infectious Diseases (2020). 25. [25]. Yiling J Cheng et al. “Prevalence of diabetes by race and ethnicity in the United States, 2011-2016”. In: Journal of the American Medical Association 322.24 (2019), pp. 2389–2398. 26. [26]. Benjamin D Sommers. “Targeting in Medicaid: The costs and enrollment effects of Medicaid’s citizenship documentation requirement”. In: Journal of Public Economics 94.1-2 (2010), pp. 174–182. 27. [27]. Office of Minority Health. Profile: Hispanic/Latino Americans. Aug. 22, 2019. url: [https://minorityhealth.hhs.gov/omh/browse.aspx?lvl=3](https://minorityhealth.hhs.gov/omh/browse.aspx?lvl=3). 28. [28]. Michael J Norden et al. “National smoking rates correlate inversely with COVID-19 mortality”. In: medRxiv (2020). [1]: /embed/graphic-1.gif [2]: /embed/graphic-2.gif [3]: /embed/inline-graphic-1.gif [4]: /embed/inline-graphic-2.gif