Abstract
Background The effects of socioeconomic factors, ethnicity, and other variables on the frequency of COVID-19 cases [morbidity] and COVID-19-induced deaths [mortality] at the sub-population level, rather than at the individual level, are only partially understood.
Objective To determine which county-level features best predict COVID-19 morbidity and mortality for a given county in the U.S.
Design A machine-learning model that predicts COVID-19 mortality and morbidity from county-level features, followed by a SHAP-value-based importance analysis of the predictive features.
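SHAP importance scores build on Shapley values, which split the difference between a model's prediction for one sample and its prediction for a baseline among the input features. The following is a minimal sketch of an exact Shapley computation, not the study's pipeline. The linear `toy_model`, its coefficients, the three feature names, and the `baseline` and `county` values are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Weight of a coalition of size k that does not contain f.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


# Hypothetical values, NOT taken from the study's data.
baseline = {"poverty_rate": 0.12, "obesity_rate": 0.30, "mask_usage": 0.60}
county = {"poverty_rate": 0.20, "obesity_rate": 0.35, "mask_usage": 0.40}


def toy_model(x):
    # Hypothetical linear risk score; the study used a Random Forest instead.
    return 2.0 * x["poverty_rate"] + 1.0 * x["obesity_rate"] - 0.5 * x["mask_usage"]


def value_fn(present):
    # Features in `present` take the county's value; the rest stay at baseline.
    x = {name: (county[name] if name in present else baseline[name])
         for name in baseline}
    return toy_model(x)


phi = shapley_values(value_fn, list(baseline))
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

For a linear model each contribution reduces to the coefficient times the feature's deviation from the baseline, and the contributions sum to the gap between the county's prediction and the baseline prediction. Enumerating every subset grows exponentially with the number of features, so with dozens of features a dedicated tree-based SHAP implementation would be the practical choice.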
Setting Publicly available data from various American government and news websites.
Participants 3,071 U.S. counties, from which 53 county-level features, as well as morbidity and mortality numbers, were collected.
Measurements For each county: Ethnicity, socioeconomic factors, educational attainment, mask usage, population density, age distribution, COVID-19 morbidity and mortality, air quality indicators, presidential election results, ICU beds.
Results A Random Forest classifier achieved an AUROC of 0.863 for morbidity prediction and an AUROC of 0.812 for mortality prediction. A SHAP-value-based analysis indicated that poverty rate, obesity rate, mean commute time to work, and the proportion of people who wear masks significantly affected morbidity rates, while ethnicity, median income, poverty rate, and education level heavily influenced mortality rates. The correlations between several of these factors and COVID-19 morbidity and mortality shifted between April 2020 and November 2020, probably because COVID-19 was initially associated with more urbanized areas and later with less urbanized ones.
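The AUROC reported above can be read as the probability that a classifier scores a randomly chosen positive sample higher than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition in plain Python. The labels and scores are hypothetical and are not the study's data, and the function name `auroc` is an illustrative choice.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = positive class) and classifier scores for six samples.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
print(round(auroc(labels, scores), 3))  # 8 of 9 pairs ranked correctly -> 0.889
```

On this scale 0.5 corresponds to random ranking and 1.0 to perfect separation. The quadratic pairwise loop is acceptable for a few thousand samples; a rank-based formulation would be the usual choice for larger data sets.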
Limitations Data are still coming in.
Conclusions Ethnicity, education, and economic disparity measures are major factors in predicting the COVID-19 mortality rate in a county. Factors with low between-county variance (e.g., age) are not meaningful predictors. The differing correlations can be explained by the spread of COVID-19 from metropolitan to less metropolitan areas.
Primary Funding Source None.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
None.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
None.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Paper in collection COVID-19 SARS-CoV-2 preprints from medRxiv and bioRxiv