Abstract
As of August 27, 2020, the number of cumulative cases of COVID-19 in the US exceeded 5,863,363 and included 180,595 deaths, thus causing a serious public health crisis. Curbing the spread of Covid-19 is still urgently needed. Given the lack of potential vaccines and effective medications, non-pharmaceutical interventions are the major option to curtail the spread of COVID-19. An accurate estimate of the potential impact of different non-pharmaceutical measures on containing, and identify risk factors influencing the spread of COVID-19 is crucial for planning the most effective interventions to curb the spread of COVID-19 and to reduce the deaths. Additive model-based bivariate causal discovery for scalar factors and multivariate Granger causality tests for time series factors are applied to the surveillance data of lab-confirmed Covid-19 cases in the US, University of Maryland Data (UMD) data, and Google mobility data from March 5, 2020 to August 25, 2020 in order to evaluate the contributions of social-biological factors, economics, the Google mobility indexes, and the rate of the virus test to the number of the new cases and number of deaths from COVID-19. We found that active cases/1000 people, workplaces, tests done/1000 people, imported COVID-19 cases, unemployment rate and unemployment claims/1000 people, mobility trends for places of residence (residential), retail and test capacity were the most significant risk factor for the new cases of COVID-19 in 23, 7, 6, 5, 4, 2, 1 and 1 states, respectively, and that active cases/1000 people, workplaces, residential, unemployment rate, imported COVID cases, unemployment claims/1000 people, transit stations, mobility trends (transit), tests done/1000 people, grocery, testing capacity, retail, percentage of change in consumption, percentage of working from home were the most significant risk factor for the deaths of COVID-19 in 17, 10, 4, 4, 3, 2, 2, 2, 1, 1, 1, 1 states, respectively. We observed that no metrics showed significant evidence in mitigating the COVID-19 epidemic in FL and only a few metrics showed evidence in reducing the number of new cases of COVID-19 in AZ, NY and TX. Our results showed that the majority of non-pharmaceutical interventions had a large effect on slowing the transmission and reducing deaths, and that health interventions were still needed to contain COVID-19.
Introduction
As of August 27, 2020, the number of cumulative cases of COVID-19 in the US exceeded 5,863,363 and included 180,595 deaths, thus causing a devastating public health and economic crisis. Since the number of new cases in the US remains high (43,814 in the US on August 27, 2020), curbing the spread of COVID-19 is urgently needed (Callaway 2020). There is increasing recognition that many geographic, economic and environmental factors contribute to the outbreak of COVID-19. In the absence of vaccines and specifically effective medications, non-pharmaceutical public health interventions and personal hygiene practices are the only options to slow the spread of COVID-19 (Priyadarsini and Suresh, 2020; Irfan 2020). The effects of the different factors and intervention measures on the spread of COVID-19 vary. Identifying key factors that most contribute to the rapid spread of COVID-19, and accurately estimating the potential impact of different non-pharmaceutical measures for containing COVID-19 are crucial for planning the most effective interventions to curb the spread of Covid-19 (Farseev et al. 2020).
The widely used statistical methods for COVID-19 epidemiological factor analysis and evaluation of intervention measures include correlation analysis (Farseev et al. 2020; Tantrakarnapa et al. 2020; Priyadarsini and Suresh, 2020; Nakada and Urban 2020), regression (Chaudhry et al. 2020), the spatial autoregressive (SAR) model (Baum and Henry, 2020), logistic regression (Coccia M 2020) and a transmission dynamic model coupled with a linear model (Livadiotis 2020). The most examined scalar factors consists of underlying health conditions such as high blood pressure, diabetes, stroke, cardiac or kidney diseases, and aging individuals (Priyadarsini and Suresh, 2020; Zhou et al. 2020; Raghupathi 2019), atmospheric temperature (Tantrakarnapa et al. 2020), age, gender, ethnicity, and population density (Priyadarsini and Suresh, 2020; Anderson et al. 2020), airflow (Priyadarsini and Suresh, 2020), and socioeconomics such as median income (Coccia 2020; Saadat et al. 2020).
The most explored non-pharmaceutical public health interventions and digital technologies for curbing the spread of COVID-19 include social distancing, case isolation and quarantine as well as closuring borders, schools travel restrictions, use of face-masks, and testing (Flaxman et al. 2020; Viner et al. 2020; Quilty et al. 2020; Ngonghala et al. 2020) andpopulation surveillance, case identification, contact tracing, mobility data collection, and communication technology, which utilize billions of mobile phones and large online datasets to provide information for the evaluation of intervention strategies and to strengthen the curb of the spread of COVID-19 (Budd et al. 2020; Ngonghala et al. 2020; Badawy and Radovic 2020).
Although association analysis is of great importance for curbing the spread of COVID-19, association measures dependence between two variables or two sets of variables in the data, and use the dependence for prediction and evaluation of the effects of environmental, social-economic factors and public health interventions on the spread of COVID-19 (Altman and Krzywinski 2015; Sharkey and Wood 2020). It is well recognized that association analysis is not a direct method to discover the causal mechanism of complex diseases. Association analysis may detect superficial patterns between intervention measures and transmission variables of COVID-Its signals provide limited information on the causal mechanism of the transmission dynamics of COVID-19 (Steigera et al. 2020). Association analysis has been a major paradigm for statistical evaluation of the effects of influencing factors and health interventions on the spread of COVID-19 (Li et al. 2020). Understanding the transmission mechanism of COVID-19 based on association analysis remains elusive. The question to uncover the transmission mechanisms of COVID-19 is causal in nature.
Distinguishing causation from association is an age-old problem. Methods for causation analysis that is one of the most challenging problems in science and technology need to be developed as an alternative to association analysis (Zenil et al., 2019).A number of researchers have performed causal analysis of COVID-19 to evaluate the causal effects of mobility, awareness, and temperature (Steigera et al. 2020), social distancing (Sharkey and Wood 2020), mobility (Ramachandra and Sun 2020), herd immunity (Friston et al. 2020), and mask use (Chernozhukov et al. 2020). However, most causal analysis of COVID-19 have treated time series data as pseudo-cross sectional data. In some cases causal analysis of COVID-19 treated the data as time series; time series was assumed stationary. In practice, the number of new cases and the number of deaths from COVID-19 were nonstationary time series in most cases. The environmental, social-economic and geographic factors, and intervention measures include two types of data: scalar variables and time series (stationary or nonstationary) variables.
The purpose of this paper is to develop a general framework for the causal analysis of COVID-19 in the US. The number of new cases and deaths from COVID-19 are taken as response variables. The factors and intervention measures are taken as potential causal variables. If the factor and intervention variables are scalar variables, the additive noise models (AMMs) (Peters et al. 2014) are used to test for causation between the response variable and potential causal variable where the number of new cases or deaths should be averaged over time. Most intervention measures are time series data. An essential difference between time series and cross-sectional data is that the time series data have temporal order, but cross sectional data do not have any order. As a consequence, the causal inference methods for cross sectional data cannot be directly applied to time series data. Basic tools in statistical analysis are the raw of large numbers and the central limit theorem. Applications of these tools usually assume that all moment functions are constant. When the moment functions of the time series vary over time, the raw of large numbers and the central limit theorem cannot be applied. In order to use basic probabilistic and statistical theories, the nonstationary time series must be transformed to stationary time series (Johansen 1991).
A widely used concept of causality for time series data is Granger causality (Granger 1969; Eichler 2013). Underlying the Granger causality is the following two principles:
Effect does not precede the cause in time;
The effect series contains unique causal series information which is not present elsewhere.
The multivariate linear Granger causality test will be used to test causality between the number of new cases and deaths from COVID-19 and environmental, economic and intervention time series variables (Bai et al. 2010). The proposed ANMs and multivariate linear Granger causality analysis methods are applied to the surveillance data of lab-confirmed Covid-19 cases in the US, UMD data, and Google mobility data from March 5, 2020 to August 25, 2020 in order to evaluate the contributions of social-biological factors, economics, the Google mobility indexes, and the rate of virus testing to the number of the new cases and number of deaths from COVID-19.
Materials and Methods
Nonlinear additive noise models for bivariate causal discovery
The ANMs are used for identifying causal effect of a factor or an intervention measure on the number of new cases or an intervention measure (Peters et al. 2014; Jiao et al. 2018). Assume no confounding, no selection bias, and no feedback. Let be the average number of new cases or new deaths from COVID-19 and be a scalar factor or an intervention measure such as gender, population density, ethnic group, among others. Consider a bivariate additive noise model X → Y where Y is a nonlinear function of X and independent additive noise EY : where X and EY are independent. Then, the density pX,Y is said to be induced by the additive noise models (ANM) from X to Y (Mooij et al. 2016). In some cases, we may have the following alternative direction of the ANMs: Y → X : where Y and EX are independent. If the density PX,Y is induced by the ANM X → Y, but not by the ANM X → Y, then the ANM X → Y is identifiable.
Assume that n +m state data were sampled. Divide the dataset into a training data set by specifying D1 = {Yn, Xn},Yn = [y1, …,yn]T, Xn = [x1, …,xn]T for fitting the model and a test data set for testing the independence, where n is not necessarily equal to m.
Procedures for using the ANM to assess causal relationships between two variables are summarized below (Jiao et al. 2018).
Step 1. Regress Y on X using the training dataset D1 and non-parametric regression methods:
Step 2. Calculate the residual using the test dataset D2 and test whether the residual is independent of causal X to assess the ANM X → Y.
Step 3. Repeat the procedure to assess the ANM Y→ X.
Step 4. If the ANM in one direction is accepted and the ANM in the other is rejected, then the former is inferred as the causal direction.
There are many non-parametric methods that can be used to regress Y on X or regress X on Y. For example, we can use neural networks (Heydari et al. 2019), smoothing spline regression methods (Wang 2011), B-spline (Wang 2017) and local polynomial regression (LOESS, see Cleveland, 1979). In this paper, the smoothing spline regression method was used to fit the regression models.
Covariance can be used to measure association, but cannot be used to test independence between two variables with a non-Gaussian distribution. A covariance operator that is a generalization of the finite dimensional covariance matrix to infinite dimensional feature space can be used to test for independence between two variables with arbitrary distributions. Specifically, we will use the Hilbert-Schmidt norm of the cross-covariance operator or its approximation, the Hilbert-Schmidt independence criterion (HSIC) to measure the degree of dependence between the residuals and potential causal variable and test for their independence (Gretton et al. 2005; Mooij et al. 2016).
The covariance operator can be defined as where f, g any nonlinear functions and C(X, Y) is the covariance operator and <.> ℱ is an inner product in the ℱ Hilbert space. The Hilbert-Schmidt norm of the covariance operator can be used as criterion for assessing independence between two random variables and is called the Hilbert-Schmidt independence criterion (HSIC). The Hilbert-Schmidt norm of the centered covariance operator is defined as where ‖ · ‖ HS is the Hilbert-Schmidt norm.
We know that (Wang et al. 2018)
HSIC2(X,Y) =0 if and only if X and are Y independent.
HSIC2(X,Y) can be approximated by where n is a sample size, K and L are n × n dimensional kernel matrices and . We used the Gaussian kernel: to test independence between the potential cause X and residual EY; we calculated HSIC2(X, EY) as follows.
In summary, the general procedure for testing independence between the average number of new cases or new deaths and the scalar factor or intervention measure is given as follows (Mooij et al. 2016; Jiao et al. 2018):
Step 1: Divide a data set into a training data set D1 = {Yn,Xn}for fitting the model and a test data set for testing the independence.
Step 2: Use the training data set and smoothing spline regression nonparametric regression methods
Regress Y on X :Y= f Y(x)+EY,
Regress X on Y : X = f X (y) + EX.
Step 3: Use the test data set and estimated smoothing spline regression nonparametric regression that fits the test data set D2={Yn, Xn} to predict residuals:
,
.
Step 4: Calculate the dependence measures HSIC2 (E Y, X) and HSIC2 (E X,Y).
Step 5: Infer causal direction:
If HSIC 2 (EY, X) = HSIC 2 (EX, Y), then causal direction is undecided.
We do not have closed the analytical forms for the asymptotic null distribution of the HSIC and hence it is difficult to calculate the P-values of the independence tests. To solve this problem, the permutation/bootstrap approach can be used to calculate the P-values of the causal test statistics. The null hypothesis is
H0: no causations X → Y and Y → X (Both X and EY are dependent, and Y and EX are dependent).
Calculate the test statistic:
Assume that the total number of permutations is np. For each permutation, we fix Xi,i=1, …, n and randomly permutate Yi,i = 1 …, n Then, fit the ANMs and calculate the residuals and test statistic Tc. Repeat above procedures np times. The P-values are defined as the proportions of the statistic (computed on the permuted data) greater than or equal to (computed on the original test data D2).
Multivariate Linear Granger Causality Test
Before performing multivariate linear Grander causality test, we first need to transform nonstationary time series to stationary time series.
Consider an m-variable VAR with p lags:
Where Yt is a dimensional vector, the Ai (I = 1,.., p) are m × m coefficient matrices and m dimensional residual vector εt, is assumed to have mean zero (E[εt = 0, with no autocorrelation , but can be correlated across equations .
Vector error correction model (VECM) consists of first differences of cointegrated I (1) variables, their lags, and error correction terms: where matrixes П and Фi(i=1,…,p-1) are functions of matrices Ai(i=1,…,p).
When two non-stationary variables are cointegrated, the VAR model should be augmented with an error correction term for testing the Granger causality (Engle and Granger, 1987).
The VECM can be reduced to
Where
Consider two non-stationary time series, Xt=[X1t,…,X m,t]T and Yt=[Y1t, …,Y mt]T. Let m=m1 + m2.
Suppose that Xt and Yt are cointegrated with the residuals ECMt. The VECM model for testing the Granger causality is given by where Ax and Ayare m1 and m2 dimensional vectors of intercept terms, respectively, Axx(L), Axy(L), Ayx(L) and Ayy(L) are n1 × n1, n1 × n2, n2 × n1 and n2 × n2 dimensional matrices of lag polynomials, respectively, and αx and αy are n1 and n2 dimensional coefficient vectors for the error correction term ECMt-1, respectively. The lag length was selected using the two-stage procedure (Abdalla and Murinde, 1997).
There are four different cases of causal relationships between two vectors of time series Xt and Yt (Bai et al. 2010)
If Axy (L) is significantly different from zero, while Ayx (L) shows no significantly different from zero, then there exists a unidirectional Ganger causality from time series Yt to Xt;
If Ayx (L) is significantly different from zero, while Axy (L) shows no significantly difference from zero, then there exists a unidirectional Ganger causality from Xt to Yt;
If both coefficients Axy (L) and Ayx (L) are significantly different from zero, then there exists bidirectional Granger causality between Xt and Yt;
If both coefficients Axy (L) and Ayx (L) are not significantly different from zero, then Xt and Yt are not rejected to be independent.
The four statements imply that Ganger causal relationships between Xt and Yt depend on the coefficients Axy (L) and Ayx (L). Therefore, the null hypotheses for testing the Ganger causality between Xt and Yt are
,
, and
Both and and Axy (L) = 0.
Likelihood ratio tests for multivariate Granger causality are given by the following.
The likelihood ration statistics for testing the null hypothesis: is which is asymptotically distributed as a central under the null hypothesis .
The likelihood ration statistics for testing the null hypothesis: is which is asymptotically distributed as a central under the null hypothesis .
The likelihood ration statistics for testing the null hypothesis: and and Ayx (L)=0 is
which is asymptotically distributed as a central under the null hypothesis and .
Data Collection
Data on the number of new cases and new deaths of Covid-19 across the 50 states in the US were obtained from John Hopkins Coronavirus Resource Center (https://coronavirus.jhu.edu/MAP.HTML). Google mobility indexes were downloaded from Google COVID-19 Community Mobility Reports (https://www.google.com/covid19/mobility/). Comprehensive data and insights on COVID-19’s impact on mobility, economy, and society were downloaded from the University of Maryland COVID-19 Impact Analysis Platform (https://data.covid.umd.edu) (Maryland Transportation Institute, 2020; Zhang et al. 2020). All data were collected from March 5, 2020 to August 25, 2020.
Results
Test for scalar potential causes
The scalar variables tested for causation of the new cases and deaths from COVID-19 in the US included the number of contact tracing workers per 100,000 people, percent of population above 60 years of age, median income, population density, percentage of African Americans, percentage of Hispanic Americans, percentage of males, employment density, number of points of interests for crowd gathering per 1000 people, number of staffed hospital beds per 1000 people, and number of ICU beds per 1000 people. The number of new cases and deaths were averaged over time. Each state was a sample. Since the sample sizes were small, the P-value for declaring significance was 0.05 without Bonferroni correction for multiple comparison. The P-values for testing 11 scalar potential causes of the number of new cases and deaths from COVID-19 in the US were summarized in Table 1. We observed from Table 1 that population density (P-value < 0.0002) and percentage of males (P-value < 0.03) showed significant evidence of causing the spread of COVID-19. Percentage of Hispanic Americans (P-value < 0.0575) was close to significance. Percentage of African American (P-value < 0.024) and population density (P-value < 0.025) showed significant evidence of causing deaths due to COVID-19. P-values of employment density (P-value < 0.059) and percentage of Hispanic Americans (P-value < 0.064) were close to significance level 0.05 for causing death.
The second most significant demographic risk factor for the spread of COVID-19 was percentage of males. We found higher COVID-19 morbidity in males than females. However, we did not find higher COVID-19 mortality in males than females.
Population density was an important risk factor for both the spread and death from COVID-High density resulted in closer contact, stronger interaction among residents and lower social distancing, which facilitated the spread and increased the death rate from COVID-19 (Rocklöv and Sjödin 2020; Pequeno et al. 2020; Henderson 2020; Rajan et al. 2020). However, our results were contradictory with the conclusion of Hamidi et al. (2020). Some literature also confirmed that high proportion of African Americans caused a high rate of deaths (Rajan et al. 2020; Golestaneh et al. 2020; Mahajan and Larkins-Pettigrew 2020). Our results concluded that percentage of Hispanic Americans was a weak risk factor for both the spread and death fromCOVID-19, while the literature showed stronger evidence that Hispanic communities were highly vulnerable to COVID-19 (Calo et. Al. 2020).
It was reported that higher COVID-19 mortality in males than females can be due to the following factors (Bwire 2020). The first factor was higher expression of angiotensin-converting enzyme-2 (ACE 2; receptors for coronavirus) in males than females. The second factor was sex-based immunological differences due to sex hormone and the X chromosome.
Test for Granger Causality
Daily mobility and social distancing data from a COVID-19 impacted the analysis platform, including four categories: category A: mobility and social distancing, category B: COVID and health, category C: economic impact, and category D: vulnerable population. A total of 12 temporal metrics in four categories and 12 metrics from the COVID-19 impact analysis platform, six daily Google Community Mobility indexes and protest attendee data that captured real-time trends in movement patterns for each state in the US were included in the analysis to test for Granger causality between these risk factors, health intervention measures and the number of new cases and deaths from COVID-19 across 50 states in the US (Zhang et al. 2020; Google community mobility reports, 2020). The total number of variables to be tested was 19. The P-value for declaring significance after Bonferroni correction was 0.0025.
All 19 metrics except for protest attendee showed high significance in causing a reduction of the new cases of COVID-19 in 19 less affected states: VT, WY, ME, AK, NH, WV, ND, SD, NM, RI, DE, KY, KS, CT, CO, IA, WA, WI, and MS. Most of these states were less populated. However, although CA was most affected and the most populated state, all 19 metrics except protest attendance showed a strong significance in causing rapid spread of COVID-19 (Table 2 and Table S1).
All 19 metrics showed no significance in causing reduction of the new cases of COVID-19 in Florida. The majority of the 19 metrics did not demonstrate evidence that they can significantly mitigate the spread of COVID-19 in most the affected states such as TX, NY, GA, IL, AZ, NJ, NC, and TN. These 10 states were in the top largest states by population in the US. Public health intervention measures such as closing schools and businesses, avoiding public gatherings, restricting traffic, placing residents to stay-at-home and adherence to guidelines were less well implemented or difficult to implement homogeneously due to large populations and geographical areas (Althouse et al. 2020). These results also explained why the number of new cases of COVID-19 in these states was high and confirmed by several studies (Lai et al. 2020; Masrur et al. 2020; Goldschmidt-Clermont 2020).
Table 3 summarized the ranges of P-values and Table S2 summarized all P-values for testing 18 temporal potential causes of the number of new deaths from COVID-19 across 50 states in the US. All 19 metrics except for protest attendance showed high significant evidence for causing a reduction of new deaths across 50 states except for Michigan (MI) in the US. Our results suggested that a cascade of causes led to the COVID-19 tragedy in the US.
Table 4 listed the most significant risk factor for the new cases of COVID-19 in each of the 50 states in the US. Active Cases/1000 People, workplaces, number of tests completed/1000 people, imported COVD cases, unemployment rate and unemployment claims/1000 people, mobility trends for places of residence (residential), retail & recreation, mobility trends for places like restaurants, cafes, shopping centers, theme parks, museums, libraries, and movie theaters (retail) and test capacity were the most significant risk factors for the new cases of COVID-19 in 23, 7, 6, 5, 4, 2, 1 and 1 states of the US, respectively.
Table 5 summarized the most significant risk factor for the deaths from COVID-19 in each of the 50 states in the US. Active Cases/1000 people, workplaces, residential, unemployment rate, imported COVID cases, unemployment claims/1000 people, transit, test done/1000 people, grocery, testing capacity, retail, percentage of change in consumption, percentage of working from home were the most significant risk factor for the deaths of COVID-19 in 17, 10, 4, 4, 3, 2, 2, 2, 1, 1, 1, 1 states, respectively. We also observed that the number of protest attendees showed mild significant evidence to cause increasing the number of new cases of COVID-19 in KY (P-value < 0.00012), KS (P-value < 0.00026), NH (P-value < 0.00108), MA (P-value < 0.0016) and TN (P-value < 0.0024) or to cause more deaths from COVID-19 in OR (P-value < 5.11 E-05), TX (P-value < 0.00017), ME (P-value < 0.00028), KS (P-value < 0.00061), MI (P-value < 0.0015), OH (P-value < 0.0021) and NC (P-value < 0.0023).
To illustrate the causal relationships between the risk factors and the number of new cases and deaths from COVID-19, we plotted Figures 1 and 2. Figure 1 plotted the social distance index curves as a function of time from March 5, 2020 to August 25, 2020 in Florida (FL) and Rhode Island (RI). Figure 1 showed that the social distance index in FL was much higher than that in RI state, which resulted in the larger number of new cases of COVID-19 in FL than that in RI. Figure 2 showed the number of imported COVID-19 cases as a function of time from March 5, 2020 to August 25, 2020 in Maryland (MD) and Wyoming (WY). We observed a huge difference in the number of imported COVID-19 cases between MD and RI. The very low number of imported cases of COVID-19 in WY resulted in the very low number of deaths from COVID-19 in WY, while the high number of imported cases in MD state led to the increased deaths from COVID-19 in MD. These results were consistent with the finding in the literature. It was reported that strong interventions would substantially decrease the number of deaths (Davies et al. 2020; Gagnon et al. 2020).
Discussion
Causal inference for COVID-19 is essential for selecting and implementing public intervention measures and understanding the role of the demographics in curbing the spread and reducing the deaths from COVID-19. In this paper, we systematically addressed the issues in identifying causal risk factors and evaluating the causal effects of risk factors and intervention measures on the spread and deaths from COVID-19 in the US. Risk factors and intervention measures included scalar variables and temporal variables. The ANMs were used to test for causal relationships between scalar risk factors and the average number of new cases or deaths from COVID-19 in the US. Transmission of COVID-19 is a dynamic system. Many risk factors and intervention measures are temporal variables. The Granger Causality Test was used to reveal the causal relationships between the temporal risk factors and intervention measures, and the number of new cases or new deaths from COVID-19 across the 50 states in the US.
The demographic risk factors were the major part of the scalar risk factors in the causal analysis of COVID-19. We found that population density was the most significant causal factor of both new cases and death from COVID-19. Population density measured the average number of people per kilometer living in a built-up area. Densely populated states generated conditions where COVID-19 can spread quickly and undetected in the densely populated areas and created high levels of vulnerability. The second significant demographic factor was percentage of males. Our data suggested that men were more vulnerable to Covid-19 than women. However, our analysis did not conclude that more men than women were dying from COVID-19.
We also discovered that more Black Americans were dying from COVID-19. The reasons for this were complex. Black Americans had higher rates of chronic disease conditions, including diabetes, heart disease, and lung disease, were poor and more easily exposed to the COVID-19, and lived in the cramped housing. Inequities in the social determinants of health affected mortality and morbidity of COVID-19 for Hispanic Americans with much milder significance.
We studied the causal effect of major public health interventions across the 50 states in the US. In the absence of centralized intervention measures and implementation of a timeline and presence of the complex dynamics of human mobility and the variable intensity of local outbreaks of COVID-19, evaluating the causal effect of public health intervention measures on COVID-19 transmission and deaths in the USA posed a great challenge. We used 6 Google mobility indexes and 12 daily metrics to measure the effects of COVID-19 spread and public health interventions on mobility and social distancing, derived from mobile device location data and COVID-19 case data, provided by the University of Maryland COVID-19 Impact Analysis Platform. These real time metrics capture the dynamics of social distancing. Granger causality tests were used to identify the causality relationships between time series metrics and time varying in the number of new cases or deaths from COVID-19. Although the risk factors differed by location, Active Cases/1000 people were a significant risk factor for both number of new cases and deaths from COVID-19 in most states. The most popular intervention measure in the US was workplaces (mobility trends for places of work). Workplaces were the significant cause of the number new cases of COVID-19 in 44 states and significant cause of death in 49 states. Therefore, workplaces should be considered as a very important risk mitigation measure to reduce the number of new cases and deaths from COVID-19. Tests done/1000 people was the second population intervention in the US. It was the significant cause of the new cases of COVID-19 in 46 states and significant cause of death in 47 states. Virus test results in quick case identification and isolation to contain COVID-19, and rapid treatment to reduce the number of deaths. Imported COVID cases were also a top significant risk factor for speeding the spread and increasing the deaths from COVID-19. Our results showed that the imported COVID case metric was the significant causal factor for the new cases in 46 states and the significant causal factor for the deaths in 47 states.
Our results showed that the high numbers of cases and deaths from COVID-19 were due to lacking strong interventions and high population density. We observed that no metrics showed significant evidence in mitigating the COVID-19 epidemic in FL and only a few metrics showed evidence in reducing the number of new cases of COVID-19 in AZ, NY and TX. Our results showed strong interventions were needed to contain COVID-19.
Although we tried to systematically and comprehensively analyze the data, this study has multiple limitations. First, we only analyzed the causal relationship between mobility patterns and the number of new cases or deaths and ignored the role of other potential mitigating factors (e.g, wearing face masks) that could also have contributed to the reduction of new cases or deaths from COVID-19. When data are available, more metrics should be included in the analysis.
Second, we have not addressed the confounding bias issue. When confounding is unknown, adjusting for confounding methods cannot be applied to eliminate confounding bias from the causal analysis. Unadjusted confounding bias will distort the inferred (true) causal relationship between the number of new cases or deaths from COVID-19, and metrics for social distancing when these two variables share common causes. This will have substantive implications for developing interventions to mitigate the spread of COVID-19 and reduce the deaths from COVID-19. However, removing confounding from causal analysis for COVID-19 is complicated and will be investigated in the future.
In summary, our analysis has provided information for both individuals and governments to plan future interventions on containing COVID-19 and reduction of deaths from COVID-19.
Data Availability
All data are publicly available.
https://coronavirus.jhu.edu/MAP.HTML
Acknowledgement
HW Deng was partially supported by NIH grants U19AG05537301 and R01AR069055. Momiao Xiong was partially supported by NIH grants U19AG05537301. The authors thank Sara Barton for editing the manuscript.