ABSTRACT
Epidemiological models have provided valuable information about the outlook of the COVID-19 pandemic and the relative impact of different mitigation scenarios. However, more accurate near-term forecasts are often needed for planning and staffing. We present early results from a systematic analysis of short-term adjustment of epidemiological modeling of the COVID-19 pandemic in the US during March-April 2020. Our analysis examines the importance of various types of features for short-term adjustment of the predictions. In addition, we explore the potential of data augmentation to address the limited data available for an emerging pandemic. Following published literature, we employ data augmentation via clustering of regions and evaluate a number of clustering strategies to identify early patterns in the data.
In our early analysis, we used CovidActNow as the underlying epidemiological model and found that the most impactful features for the one-day prediction horizon are population density, workers in commuting flow, the number of deaths on the day prior to the prediction date, and autoregressive features of new COVID-19 cases from the three dates preceding the prediction date. Interestingly, we also found that counties clustered with New York County yielded the best-performing model, with a maximum of R² = 0.90 and a minimum of R² = 0.85 for the state-based and COVID-based clustering strategies, respectively.
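To make the autoregressive features and the R² metric concrete, the following is a minimal illustrative sketch, not the authors' pipeline. It assumes the three lagged values are the three consecutive days before the prediction date. The function names (make_lag_features, r_squared), the toy case counts, and the persistence baseline used as a stand-in predictor are all hypothetical and do not come from the study.

```python
from typing import List, Sequence, Tuple


def make_lag_features(
    new_cases: Sequence[float], n_lags: int = 3
) -> Tuple[List[List[float]], List[float]]:
    """Pair each day's new-case count with the counts from the n_lags days before it.

    Assumption: the lags are consecutive daily values (hypothetical choice).
    """
    features, targets = [], []
    for t in range(n_lags, len(new_cases)):
        features.append(list(new_cases[t - n_lags:t]))  # days t-3, t-2, t-1
        targets.append(new_cases[t])                    # day t (one-day horizon)
    return features, targets


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up daily new-case counts for one county (illustration only).
cases = [10, 12, 15, 20, 26, 33]
X, y = make_lag_features(cases, n_lags=3)

# Stand-in predictor (NOT the study's model): repeat the previous day's value.
y_pred = [row[-1] for row in X]
score = r_squared(y, y_pred)
print(f"lag features: {X}")
print(f"R^2 of persistence baseline: {score:.3f}")
```

In a setup like the one described, static features such as population density and commuting flow would be appended to each lag row before fitting a regression model. R² can be negative when a predictor does worse than always predicting the mean of the targets.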
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
No external funding was received.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The IRB exemption decision for this study was made by the Western Institutional Review Board, as follows: "We determined this study is exempt from IRB review because it does not meet the definition of human subject research as defined in 45 CFR 46.102. Specifically, this project involves analysis of data from publicly available datasets and deidentified private datasets. The research activities do not involve human subjects, because the activities do not involve interaction or intervention with the subjects. Additionally, the investigator will not be able to readily ascertain the identity of any of the human subjects whose data is used in this project."
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
robertsf{at}us.ibm.com, sayali.pethe{at}ibm.com, xuanliu{at}us.ibm.com, hu.huang{at}ibm.com, vishrawas.gopalakrishnan1{at}ibm.com, piyush.madan1{at}ibm.com, jyhu{at}us.ibm.com, prithwish.chakraborty{at}ibm.com, rsrin{at}us.ibm.com, ajayd{at}us.ibm.com, gretchen.jackson{at}ibm.com
Paper in collection COVID-19 SARS-CoV-2 preprints from medRxiv and bioRxiv