Abstract
Importance The rapid spread of COVID-19 leaves governments and health service providers little time to plan and design effective response policies. It is therefore important to provide rapid, accurate predictions of how vulnerable geographic regions, such as counties, are to the spread of the disease.
Objective To develop county-level predictions of the near-future spread of COVID-19 occurrences using publicly available data.
Design Original investigation; decision analytical model study of county-level COVID-19 occurrences using data from March 14-31, 2020.
Setting Disease spread prediction for US counties.
Participants All US counties, with county-level data fused from multiple publicly available sources, including health statistics, demographics, and geographical features.
Exposure(s) (for observational studies) Daily county level reported COVID-19 occurrences from March 14-31, 2020.
Main Outcome(s) and Measure(s) We developed a 3-stage model. First, an XGBoost classifier estimates the probability of a COVID-19 occurrence in each unaffected county. Second, an XGBoost regression estimates the number of potential occurrences in a county. Third, these results are combined to compute the county-level risk, which is then used as an estimate of the county's vulnerability five days ahead.
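The combination step can be illustrated with a minimal sketch. The abstract does not state how the stage outputs are combined, so the rule below (risk as an expected count: probability of a first occurrence multiplied by the predicted number of occurrences) is an assumption made for illustration, not the authors' formula. The names `CountyPrediction` and `combine_risk`, the county identifiers, and the numeric values are hypothetical; in practice the two inputs would come from the fitted classifier and regressor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CountyPrediction:
    county_id: str                  # hypothetical identifier, e.g. a FIPS code
    has_cases: bool                 # True if the county already reports cases
    p_occurrence: Optional[float]   # stage 1 output; only needed if unaffected
    predicted_cases: float          # stage 2 output


def combine_risk(pred: CountyPrediction) -> float:
    """Stage 3 under an ASSUMED rule: risk = expected number of occurrences."""
    cases = max(pred.predicted_cases, 0.0)  # a count cannot be negative
    if pred.has_cases:
        # Already affected: the occurrence probability is effectively 1.
        return cases
    if pred.p_occurrence is None:
        raise ValueError("unaffected county needs a stage 1 probability")
    p = min(max(pred.p_occurrence, 0.0), 1.0)  # clamp to [0, 1]
    return p * cases


# Illustrative values only; not taken from the study.
counties = [
    CountyPrediction("A", has_cases=False, p_occurrence=0.25, predicted_cases=8.0),
    CountyPrediction("B", has_cases=True, p_occurrence=None, predicted_cases=12.0),
    CountyPrediction("C", has_cases=False, p_occurrence=0.5, predicted_cases=2.0),
]
ranked = sorted(counties, key=combine_risk, reverse=True)
ranking = [c.county_id for c in ranked]
```

Sorting by the combined score gives a simple ranking of counties by estimated vulnerability. Other combination rules are equally plausible given what the abstract states.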
Results Using data from March 14-31, 2020, the model achieved a sensitivity above 71.5% and a specificity above 94%.
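For reference, sensitivity and specificity follow the standard definitions: sensitivity is the share of truly affected counties that the classifier flags, and specificity is the share of truly unaffected counties that it leaves unflagged. The sketch below computes both from binary labels. The function name `sensitivity_specificity` and the example labels are hypothetical and are not the study's data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = affected)."""
    tp = fn = tn = fp = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == 1 and guess == 1:
            tp += 1   # affected county correctly flagged
        elif truth == 1:
            fn += 1   # affected county missed
        elif guess == 0:
            tn += 1   # unaffected county correctly left unflagged
        else:
            fp += 1   # unaffected county wrongly flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up labels for ten counties (1 = at least one case within five days).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

In this made-up example, three of the four affected counties are flagged and five of the six unaffected counties are left unflagged.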
Conclusions and Relevance We found that population, population density, percentage of people aged 70 or greater, and prevalence of comorbidities play an important role in predicting COVID-19 occurrences. We found that affected counties were positively associated with urban counties, and less vulnerable counties with rural counties. The developed model can be used to identify vulnerable counties and potential data discrepancies. Limited testing facilities and delayed results introduce significant variation in reported cases and produce a bias in the model.
Trial Registration Not Applicable
Question What are key factors that define the vulnerability of counties in the US to cases of the COVID-19 virus?
Findings In this epidemiological study based on publicly available data, we develop a model that predicts vulnerability to COVID-19 for each US county in terms of likelihood of going from no documented cases to at least one case within five days and in terms of number of occurrences of the virus.
Meaning Predicting county vulnerability to COVID-19 can help health organizations better plan for resource and workforce needs.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
There was no funding provided for any of the authors.
Author Declarations
All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript.
Yes
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
Our work uses open source programming and publicly available data. We will soon make the full dataset, sample modeling, and result outputs available, with instructions for use, at: https://github.com/mihirpsu/covid_19