BayesSMILES: Bayesian Segmentation Modeling for Longitudinal Epidemiological Studies ==================================================================================== * Shuang Jiang * Quan Zhou * Xiaowei Zhan * Qiwei Li ## Abstract Coronavirus disease 2019 (COVID-19) is a pandemic. To characterize the disease transmissibility, we propose a Bayesian change point detection model using daily actively infectious cases. Our model is built upon a Bayesian Poisson segmented regression model that can 1) capture the epidemiological dynamics under the changing conditions caused by external or internal factors; 2) provide uncertainty estimates of both the number and locations of change points; 3) adjust any explanatory time-varying covariates. Our model can be used to evaluate public health interventions, identify latent events associated with spreading rates, and yield better short-term forecasts. Keywords * Bayesian hierarchical modeling * Multiple change-point detection * Poisson segmented regression * Stochastic SIR model ## 1 Introduction A newly identified coronavirus, SARS-CoV-2, is a lethal virus for humans. It has caused the worldwide pandemic – COVID-19. As reported by the Johns Hopkins University Center for Systems Science and Engineering (JHU-CSSE), the COVID-19 pandemic has spread to 188 countries and territories with more than 14 million confirmed cases by the end of July 2020. The extremely rapid spreading of the disease and the increasing burden on healthcare systems becomes a major problem for public health. In a response to “flatten the curve” (Akiyama et al., 2020), both federal and local governments in the United States (U.S.) have enforced a wide range of public health measures, such as border shutdowns, travel restrictions, and quarantine. As a consequence, the importance of understanding the COVID-19 dynamics is steadily increasing in the contemporary world. In epidemiology, the basic reproduction number, ℛ, is commonly used to evaluate the transmissibility of an infectious disease like COVID-19. ℛ is defined as the expected number of secondary cases produced by a typical case in a closed population. During the outbreak of an epidemic, the basic reproduction number can be affected by intervention strategies. For example, social measures that decrease the contact rate between individuals would control the basic reproduction number. Isolating or treating the infected cases could lower the ℛ value as well. Another concept in the epidemic theory is the effective reproduction number ℛ*t*, which describes the number of people who can be infected by an individual at any specific time *t* in a population. ℛ*t* is time-specific since it accounts for the varying proportions of the population that are immune to the disease over time. There are many recent studies implementing the SIR model (Kermack and McKendrick, 1927) or its modified version to analyze the COVID-19 transmissibility in terms of ℛ or ℛ*t* (see e.g. Chen et al., 2020; Alvarez et al., 2020; Kantner and Koprucki, 2020; Gostic et al., 2020; Cooper et al., 2020). Furthermore, several studies have incorporated the information on social measures to understand the COVID-19 dynamics over the world. For instance, Dehning et al. (2020) combined the SIR model with Bayesian inference to study the time-dependent spreading rate of COVID-19 in Germany. Song et al. (2020) extended the SIR model by considering the quarantine protocols with a focus on understanding the time-course dynamics of COVID-19 in Hubei, China. Giordano et al. (2020) enriched the SIR model with additional compartments to account for under-diagnosis, which could explain the gap between the actual infection dynamics and perception of COVID-19 outbreak in Italy. Because of heterogeneity in susceptibility and social dynamics, the COVID-19 transmissibility differs among locations and change over time. U.S. local governments have started different interventions since mid-March to combat the widespread of COVID-19. Therefore, the basic reproduction numbers should spatiotemporally vary. The basic reproduction number of an epidemic event is changing due to the effects of societal and political actions. The effective social measures such as business closures and stay-at-home orders could help lower the transmission rate and thus induce changes in ℛ. By studying the variations in ℛ over time, we can evaluate the dynamic transmissibility of infectious diseases like COVID-19. For instance, during the outbreak of Severe acute respiratory syndrome (SARS) in China around 2003, Riley et al. (2003); Lloyd-Smith et al. (2003) reported an ℛ ≈ 3.0 for the onset stage of SARS in Hong Kong. Later in Chowell et al. (2004), the reported ℛ for Hong Kong dropped to about 1.1 due to stringent control measures. Decreasing in ℛ captured the evolution of SARS transmission dynamics under the efficient diagnosis coupled with patient isolation measures. A recent study in Germany (Dehning et al., 2020) estimated the variations in COVID-19 transmission rates for the four phases defined by the time of three major official government interventions. These studies motivated us to detect the important transitioning times that occurred during the COVID-19 outbreak in the U.S. The change points that caused the transitions partitioned the COVID-19 data into segments characterized by unique temporal patterns. A stochastic version of the discrete-time SIR model was then applied to estimate the basic reproduction numbers across segments. For a given population, the variation in the reproduction numbers across different stages reflects the efficacy of the state governmental interventions. We propose BayesSMILES: Bayesian Segmentation ModelIng for Longitudinal Epidemiological Studies, to study the dynamics of COVID-19 transmissibility and to evaluate the effectiveness of mitigation interventions. The BayesSMILES adopts a Bayesian Poisson segmented regression model to detect multiple change points based on the daily infectious COVID-19 cases. This novel model can 1) capture the epidemiological dynamics under the changing conditions caused by external or internal factors; 2) quantify the uncertainty in both the number and time of change points; 3) adjust any relevant explanatory time-varying covariates that may affect the infectious case numbers. In addition, the BayesSMILES incorporated the change point information to quantify the COVID-19 transmissibility by estimating the distribution of the basic reproduction numbers among change-points segmented temporal spans. On the simulated data, we demonstrate that our approach can improve the accuracy of the change point detection as compared with a common multiple change-point search method. Applying the proposed methods into U.S. states, we find that the detected change points correlate well with the times of publicly announced interventions. We also demonstrate that the SIR model integrated with change point information provided a better short-term forecast. In all, the BayesSMILES enables us to understand the disease dynamics of COVID-19, and provides useful insights for the anticipation and control of current and future pandemics. The rest of the paper is organized as follows. We review the traditional susceptible-infectious-recovered (SIR) model in Section 2. In Section 3, we describe the framework of the proposed BayesSMILES. The Markov chain Monte Carlo algorithm and posterior inference procedures are described in Section 4. We provide a comprehensive simulations study to illustrate the performance of the proposed method against a competing approach in Section 5. In Section 6, we analyze the COVID-19 data for U.S. states using the proposed BayesSMILES. Finally, we conclude with remarks in Section 7. ## 2 Review of the SIR Model The susceptible-infected-removed (SIR) model was developed to simplify the mathematical modeling of human-to-human infectious diseases by Kermack and McKendrick (1927). It is a fundamental compartmental model in epidemiology. At any given time, each individual in a closed population with size *N* is assigned to three distinctive compartments with labels: susceptible (*S*), infectious (*I*), or removed (*R*, being either recovery or dead). The standard SIR model in continuous time that models the flow of people from *S* to *I* and then from *I* to *R* is described by the following set of nonlinear ordinary differential equations (ODEs): ![Formula][1] for *t* > 0, subjecting to *S*(*t*) + *I*(*t*) + *R*(*t*) = *N*. Here *β* > 0 is the diseases transmission rate and *γ* > 0 is the removal (recovery or death) rate. Conceptually, susceptible individuals become infectious (*S* → *I*) and then are ultimately removed from the possibility of spreading the disease (*I* → *R*) due to death, recovery with immunity against reinfection, or isolation from the rest of the population (quarantine). The rationale behind the first equation is (1) is that the number of new infections during an infinitesimal amount of time, −*dS*(*t*)*/dt*, is equal to the number of susceptible people, *S*(*t*), times the product of the contact rate, *I*(*t*)*/N*, and the disease transmission rate *β*. The third equation in (1) reflects that the infectious individuals leave the infectious population per unit time, *dI*(*t*)*/dt*, as a rate of *γI*(*t*). The second equation in (1) follows immediately from the first and third ones as a result of *dS*(*t*)*/dt* + *dI*(*t*)*/dt* + *dR*(*t*)*/dt* = 0. Assuming that only a small fraction of the population is infected or removed in the onset phase of an epidemic, we have *S*(*t*)*/N* ≈ 1 and thus the second equation reduces to *dI*(*t*)*/dt* = (*β* − *γ*)*I*(*t*), revealing that the infectious population is growing if and only if *β* > *γ*. As the expected lifetime of an infected case is given by *γ*−1, the ratio *β/γ* is the average number of new infectious cases directly produced by an infected case in a completely susceptible population. Since it is a good indicator of the transmissibility of an infectious disease, the epidemiologists name it the *basic reproduction number* ℛ = *β/γ* in the context of a standard SIR model, or the *effective reproduction number* ℛ*t* = *β**t**/γ**t* in the context of a time-variant SIR model, where *β* and *γ* are replaced by *β*(*t*) and *γ*(*t*) in (1). ## 3 The Proposed BayesSMILES Method ### 3.1 Data notations During a pandemic such as COVID-19, the most accessible and complete data are the daily reported numbers on confirmed cases and deaths. Suppose *N* is the total population size in a given region. Let ***C*** = (*C*1, …, *C**T*) and ***D*** = (*D*1, …, *D**T*) be the sequences of cumulative confirmed case and death numbers observed at *T* successive equally spaced points in time (e.g. day), where *C**t* and *D**t* ∈ ℕ for *t* = 1, …, *T*. For a region of which recovery cases are closely monitored day by day, we use ***E*** = (*E*1, …, *E**T*) to denote the sequence of cumulative recovery case numbers. Thus, due to the compositional nature of the basic SIR model, the three trajectories can be constructed as ***S*** = (*S*1, …, *S**T*) with *S**t* = *N* − *C**t*, ***R*** = (*R*1, …, *R**T*) with *R**t* = *D**t* + *E**t*, and ***I*** = (*I*1, …, *I**T*) with *I**t* = *N* − *S**t* − *R**t* = *C**t* − *D**t* − *E**t*. For a region of which recovery cases do not exist or under-reported, we consider both ***R*** and ***I*** as missing data and reconstruct these two sequences according to the last equation of (1) with a pre-specified constant removal rate *γ*. Specifically we set *I*1 = *C*1 and *R*1 = 0, and generate *R**t* = *R**t*−1 + *rI**t*−1 and *I**t* = *I**t*−1 +(*C**t* − *C**t*−1) − (*R**t* − *R**t*−1) from *t* = 2 to *T* sequentially, where. : [0, *∞*) → N denotes the ceiling function. For the choice of *γ*, we suggest estimating its value from publicly available reports in some region where confirmed, death, and recovery cases are all well-documented, or from prior epidemic studies due to the same under-reporting issue in actual data. Lastly, let *Y* be the initial value and ![Graphic][2] be the lag one difference of a sequence ***Y***, where ![Graphic][3] and each following entry ![Graphic][4], i.e. the difference between two adjacent observations. Note that ***Y*** could be any of those time-series data introduced above, i.e. ***C, D, E, S, I***, and ***R***. ### 3.2 Modeling epidemic dynamics via a modified stochastic SIR model An SIR model has three trajectories, one for each compartment. The compositional nature of the three trajectories, i.e. *S*(*t*) + *I*(*t*) + *R*(*t*) = *N*, implies that we only need two of them to solve the ODEs as shown in (1). As mentioned previously, assuming *S*(*t*) ≈ *N* for all *t* results in *dI*(*t*)*/dt* = (*β* − *γ*)*I*(*t*) and further leads to an exact solution: *I*(*t*) = *I*(0) exp((*β* − *γ*)*t*). For modeling daily reported infectious data ***I***, we utilize its discrete-time version, ![Formula][5] with a time-varying rate *β**t* to account for the transmissibility changes of the disease. For simplicity’s sake, we assume a constant removal rate *γ*. Based on (2), we introduce a probabilistic model, which approximately mimics the dynamics of the deterministic standard SIR model as shown in (1) during the onset phase of a pandemic. Specifically, we suppose the infectious population size at time *t* is sampled from a Poisson model, ![Formula][6] where *α**t* = *I* exp ((*β**t* − *γ*)*t*)*/N* is a redefined time-varying transmissibility parameter that depends on the initial infectious population size *I*, the transmission rate *β**t*, the removal rate *γ*, and any latent factors (e.g. public health intervention, social behavior, virus mutation, healthcare quality, etc.) that may affect the disease transmissibility. This model automatically accounts for measurement errors and uncertainties associated with the counts. Note that (3) can be generalized to a negative binomial (NB) model, i.e. *I**t*|*α**t* *∼* NB(*Nα**t*, *ϕ**I*) if needed, where *ϕ**I* is a dispersion parameter to account for over-dispersion that might be observed in ***I***. Here we use NB(*µ, ϕ*), *µ, ϕ* > 0 to denote an NB distribution with expectation *µ* and variance *µ* + *µ*2*/ϕ*. ### 3.3 Detecting multiple change-point via a Poisson segmented regression model Our multiple change-point detection builds upon the above modified stochastic time-variant SIR model (3) with stationary transmissibility *α**t* in a certain segment. Particularly, we assume that *β**t* only changes at certain time points. Identifying those change points is of significant importance, as it not only enables us to characterize the temporal dynamics of the pandemic but also helps policymakers evaluate the effectiveness of the past and ongoing mitigation and intervention strategies. In this paper, the change points are defined as those discrete time points that significantly alter the transmission rate *β**t* between two adjacent segments, while assuming the removal rate *γ* is constant across all time points. We introduce a latent binary vector ***γ*** = (*γ*1, …, *γ**T*), *γ**t* ∈ {0, 1}, with *γ**t* = 1 if time point *t* is a change point and *γ**j* = 0 otherwise. We restrict *γ*1 = 1, although the first time point is not a change point. Those points with *γ**j* = 0 can be partitioned into segments bounded by two adjacent change points. Thus, we use another vector ***z*** = (*z*1, …, *z**T*), *z**i* ∈ {1, …, *K*} to reparameterize ***γ***, where *z**t* = *k* if time point *t* is in segment *k*, that is, between the *k* and (*k* + 1)-th change points. The total number of change points excluding the first time point is *K* − 1. Mathematically, ***z*** is the cumulative sum of ***γ***, i.e. ![Graphic][7], while ***γ*** is the lag one difference of ***z***, i.e. *γ**t* = *z**t* − *z**t*−1 with *γ*1 = 1. Figure 1 shows the representation of ***γ*** and ***z*** for a simulated time-series dataset (*T* = 10) with two change points. ![Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/10/08/2020.10.06.20208132/F1.medium.gif) [Figure 1:](http://medrxiv.org/content/early/2020/10/08/2020.10.06.20208132/F1) Figure 1: An example of time-series data (*T* = 10) with two change points (*K* = 3) and its associated parameterizations in terms of ***γ*** and ***z***, respectively. Red triangles are change points, while black circles are not. Note that the first time point is not a change point, although it is defined as a “change point” for convenience’s sake. To infer ***γ*** or ***z*** given the sequence of infectious population size ***I***, we adopt a Poisson segmented regression framework, which can be written as, ![Formula][8] where ***x****t* is a *p*-dimensional column vector of covariates that includes a scalar of one for the intercept, time *t*, and *p* − 2 explanatory variables observed at time *t* if applicable. Those explanatory variables could contain the number of tests, weather information, or other temporal information that are important to adjust for during a longitudinal epidemiological study. ***b****k* is a *p*-dimensional column vector of segment-specified coefficients that includes an intercept representing the proportion of infectious people at logarithmic scale, i.e. *b**k*1 = log(*I**/N*), in segment *k*, and a slope accounting for the segment-varying transmission rate, i.e. *b**k*2 = *β**k* − *γ*. For simplicity’s sake, we assume that *ϵ**t* is an independent and identically distributed random variable from a zero-mean normal distribution with segment-specific variance ![Graphic][9]. To ensure the identifiability of all model parameters, we try a set of considerably small values for ![Graphic][10] and employ a robust cross validation method called Pareto-smoothed importance sampling leave-one-out (PSIS-LOO) cross validation to determine the best choice (Vehtari et al., 2017). Let ***α****k* be the sequence of all *α**t*’s in segment *k*, i.e. ![Graphic][11], where we denote ![Graphic][12] as the time point corresponding to the *k*-th change point and ![Graphic][13] as the number of time points in segment *k*. We can rewrite the second stage in (4) as ![Graphic][14], where ![Graphic][15] is an identity matrix and ***X****k* can be explicitly written as ![Formula][16] We assume a zero-mean multivariate normal distribution for ***b****k*, that is, ***b****k* *∼* MN(*****p*, ***H***), where *****p* is an *p*-by-1 all zero column vector and ![Graphic][17] is set to be a diagonal variance-covariance matrix. For a non-informative setting, we recommend choosing a large value for each ![Graphic][18]. Through this prior specification, we are able to marginalize out the nuisance parameter ***b****k* and obtain ![Graphic][19]. Consequently, we can write the collapsed model of (4) as ![Formula][20] To complete the model specification, we impose an independent Bernoulli prior on ***γ*** as ![Graphic][21], where *ω* is interpreted as the probability of a time point being a change point *a priori*. We further relax this assumption by allowing *ω ∼* Be(*a**ω*, *b**ω*) to achieve a beta-Bernoulli prior. In practice, we suggest a constraint of *a**ω* + *b**ω* = 2 for a vague hyperprior of *ω* (Tadesse et al., 2005). In addition to that, we make the prior probability of ***γ*** equal to zero if two adjacent time points are jointly selected as change points. In other words, a segment should consist of at least two time points. ### 3.4 Estimating basic reproduction numbers via a stochastic SIR model Given the multiple change points ***γ***, we estimate the basic reproduction number *R* for each segment via a stochastic version of the standard SIR model as shown in (1), which only needs the cumulative confirmed case numbers ***C***. This is because recovery data only exist in few states in the U.S., which makes both model inference and predictions infeasible. This model considers both of the removed and actively infectious cases as missing data and mimic their relationship in spirit to some compartmental models in epidemiology. Specifically, we assume the number of new removed cases at time *t*, i.e. ![Graphic][22], is sampled from a Poisson distribution with mean *γI**t*−1, that is, ![Graphic][23], where *γ* should be pre-specified. Based on this simplification, we rewrite the discrete version of the first equation in (1) as, ![Formula][24] resulting in ![Formula][25] We further assume the new case number observed at time *t*, i.e. ![Graphic][26], is sampled from an NB model, ![Formula][27] as it automatically accounts for measurement errors and uncertainties associated with the counts. Following most epidemiological models, we assume this stochastic process is a Markov process, where the present state (at time *t*) depends only upon its previous state (at time *t* − 1). For the prior distribution of the dispersion parameter *ϕ*, we choose a gamma distribution, *ϕ ∼* Ga(*a**ϕ*, *b**ϕ*). We recommend small values, such as *a**ϕ* = *b**ϕ* = 0.001, for a non-informative setting. This model, on average, mimics the epidemic dynamics and is more flexible than those deterministic epidemiological models. We assume *β* comes from a gamma distribution with hyperparameters that makes both mean and variance of the transformed variable ℛ = *β/γ* equal to 1. ## 4 Model Fitting In this section, we describe the Markov chain Monte Carlo (MCMC) algorithms for posterior inference of the proposed BayesSMILES method, including the inferential strategy to identify multiple change points and to estimate the basic reproduction numbers, respectively. ### 4.1 MCMC algorithms for detecting multiple change-point Our primary interest lies in identifying the change-point locations via the vector ***γ*** based on the actively infectious cases ***I***. According to Section 3.3, the full data likelihood and the priors of the change-point detection model are written as, ![Formula][28] where we use the superscript (*γ*) to denote the transmissibility parameters ***α****k*, the design matrix ***X****k*, and the number of time points *n**k* in segment *k* under a specific configuration of ***γ***. At each MCMC iteration, we perform the following steps. #### Update change-point indicator *γ* We update the binary latent vector ***γ*** via an *add-delete-swap* algorithm. We randomly select an entry in ***γ***, say *γ**t*, and change its value to ![Graphic][29] to form a new ***γ******. This is an *add* step if ![Graphic][30] and a *delete* step otherwise. The *swap* step is performed every ten iterations, where we randomly select a change point, say *γ**t* = 1, and swap the values between the *t* and (*t* ± 1)-th entries in ***γ*** to form a new ***γ******. We accept the proposed ***γ****** with the probability min(1, *m*MH), where the Hasting ratio is ![Formula][31] Here we use *J* (· ← ·) to denote the proposal probability distribution for the selected move. Note that the last proposal density ratio equals one. This step simultaneously updates the segmentation vector ***z***, as it can be constructed from ***γ***. #### Update transmissibility parameters *α* For each segment partitioned by ***γ***, we update *α**t* within the same segment, say segment *k*, sequentially by using a random walk Metropolis-Hastings (RWMH) algorithm. We first propose a new ![Graphic][32], of which logarithmic value is generated from ![Graphic][33]. Let ![Graphic][34]. Then we accept the proposed value ![Graphic][35] with probability min(1, *m*MH), where the Hastings ratio is ![Formula][36] Note that the proposal density ratio cancels out for this RWMH update. The computation of the multivariate normal (MN) probability density involves matrix inversion, which can be time-consuming, particularly when ![Graphic][37] is large. To significantly improve the computational efficiency, we follow Zhou and Guan (2018) to approximate the exact inversion under an appropriate choice of ***H*** that satisfies the asymptotic condition. As mentioned previously, ***H*** is a *p*-by-*p* diagonal matrix, where the first entry ![Graphic][38] corresponds to the variance of the normal prior on *b**k*1. Under the asymptotic condition of ![Graphic][39], the inversion of an ![Graphic][40]-by-![Graphic][41] matrix reduces to an inversion of a *p*-by-*p* matrix (See more details in Appendix). In practice, we set ![Graphic][42] and ![Graphic][43] to ensure this asymptotic condition. ### 4.2 MCMC algorithms for estimating basic reproduction numbers Once the change points are determined, we aim to estimate the basic reproduction numbers ℛ′s across different segments and quantify their uncertainties based on the cumulative confirmed cases ***C*** only. According to Section 3.4, the full data likelihood and the priors of the stochastic SIR model are written as, ![Formula][44] where ***β*** = (*β*1, …, *β**K*) and ***ϕ*** = (*ϕ*1, …, *ϕ**K*), i.e. the collections of transmission and dispersion rates of all segments. For the hyperparameters, we set *a**β* = 1 and *b**β* = 1*/γ* so that both of the expectation and variance of the basic reproduction number ℛ = *β**k**/γ* are equal to one. We specify *a**ϕ* = *b**ϕ* = 0.001 to obtain a weakly informative gamma prior for all dispersion parameters. With a pre-defined removal rate *γ*, we propose the following updates in each MCMC iterations. #### Generate *R* based on *C* We assume *I*1 = *C*1 and *R*1 = 0, i.e. all the confirmed cases are capable to pass the disease to all susceptible individuals in a closed population at time point *t* = 1. Then we sample *R*2 *∼* Poi(*γI*1), where *γ* is a pre-specified tuning parameter. Due to the compositional nature of the SIR model, we can compute ![Graphic][45], where ![Graphic][46] and ![Graphic][47] are the new confirmed cases and new removed cases at time point *t* = 2, respectively. Next, we repeat this process of sampling *R**t* ∼ Poi(*γI**t*−1) and computing ![Graphic][48], to generate the sequence ***R*** used in an iteration. #### Update dispersion parameters *ϕ* For each segment, we update *ϕ**k* by using a RWMH algorithm. We first propose a new ![Graphic][49], of which logarithmic value is generated from ![Graphic][50]. Let ![Graphic][51], where only the *k*-th entry is replaced. Then we accept the proposed value ![Graphic][52] with probability min(1, *m*MH), where the Hastings ratio is ![Formula][53] Note that the proposal density ratio cancels out for this RWMH update. #### Update transmission rates *β* For each segment, we update *β**k* by using a RWMH algorithm. We first propose a new ![Graphic][54], of which logarithmic value is generated from ![Graphic][55]. Let ![Graphic][56], where only the *k*-th entry is replaced. Then we accept the proposed value ![Graphic][57] with probability min(1;mMH), where the Hastings ratio is ![Formula][58] Note that the proposal density ratio cancels out for this RWMH update. ### 4.3 Posterior inference We explore posterior inference on the parameters of interest by postprocessing the MCMC samples after burn-in. We start by obtaining a point estimate of the change-point indicator ***γ*** by analyzing its MCMC samples {***γ***(*b*), …, ***γ***(*B*)}, where *b* indexes the MCMC iteration after burn-in. One way is to choose the ***γ*** corresponding to the *maximum-a-posteriori* (MAP), ![Formula][59] The corresponding ![Graphic][60] can be obtained by taking the cumulative sum of ![Graphic][61]. An alternative estimate relies on the computation of posterior pairwise probability matrix (PPM), that is, the probabilities that time point *t* and *t*′ are assigned into the same segment, ![Graphic][62]. This estimate utilizes the information from all MCMC samples and is thus more robust. After obtaining this *T* -by-*T* co-clustering matrix denoted by ***P***, a point estimate of ***z*** can be approximated by minimizing the sum of squared deviations of its association matrix from the PPM, that is, ![Formula][63] The corresponding ![Graphic][64] can be obtained by taking the difference between consecutive entries in ![Graphic][65] and set the first entry to one. To construct a “credible interval” for a change point, we utilize its local dependency structure from all MCMC samples of ***γ*** that belong to its neighbors. Due to the nature of the MCMC algorithm described in Section 4.1, if a time point *t* is selected as a change point, i.e. *γ**t* = 1, then its nearby time points must not be a change point. Thus the correlation between the MCMC sample vectors ![Graphic][66] and ![Graphic][67] tends to be negative when *u* is small. Thus, we define the credible interval of a change point as the two ends of all its nearby consecutive time points, of which MCMC samples of ***γ*** are significantly negatively correlated with that of the change point. This could be done via a one-sided Pearson correlation test with a pre-specified significant level, e.g. 0.05. Although quantifying uncertainties of change points is not formal, it performs very well in the simulation study and yields reasonable results in the real data analysis. We constructed the confidence interval for a selected change point *t* by evaluating its MCMC samples ![Graphic][68]. In particular, according to the *swap* step that updates ***γ*** in the MCMC algorithm, the change point was typically switching between the selected time point *t* and its close neighborhood in the MCMC iterations. This is due to the fact that the change in patterns between segments may not be abrupt, and therefore the time points closed to time *t* could also be accepted as a change point in an iteration. Hence, for each change point with *γ**t* = 1, we calculated the correlation between ![Graphic][69] and ![Graphic][70] Here, we set different values for Δ*t* (e.g., Δ*t* = −3, −2, …, 1, 2) such that the collection of *t*+Δ*t* included a reasonable neighborhood for time *t*. Then, we calculated the correlation between ![Graphic][71] and ![Graphic][72]. Given the fact that only one change point would appear in the neighborhood region, vector ![Graphic][73] tended to have a negative correlation with ![Graphic][74]. Once the change points are determined, an approximate Bayesian estimator of the transmission rate *β**k* for each segment *k* can be simply obtained by averaging over all of its MCMC samples, ![Graphic][75]. In addition, a quantile estimation or credible interval can be obtained. Lastly, we summarize the basic reproduction number in each segment *k* as ![Graphic][76]. ### 4.4 Prediction Conditional on the multiple change-point locations, we can predict the cumulative or new confirmed cases at any future time *T**f* by Monte Carlo simulation based on the information in the last segment *K* only. Specifically, from time *T* + 1 to *T**f*, we sequentially generate ![Formula][77] Then, both short and long-term forecasts can be made by summarizing the (*T**f* − *T*)-by-*B* matrix of MCMC samples. For instance, the predictive number of cumulative and new confirmed cases at time *T* + 1, in average, are ![Graphic][78] and ![Graphic][79], respectively. ## 5 Simulation We used simulated data to evaluate the performance of our BayesSMILES method in terms of both multiple change-point detection and basic reproduction number estimation. It is shown that the proposed Bayesian framework outperforms alternative change-point detection method. ### 5.1 The generative model The three trajectories ***S, I***, and ***R*** with length *T* = 120 were generated in the following ways. We first divided the *T* = 120 time points into *K* = 4 segments with the same length, that is, the true change points were *t* = 31, *t* = 61, and *t* = 91. In other words, the first 30 entries of ***z*** equal to 1, the following 30 entries equal to 2, the next 30 entries equal to 3, and the last 30 entries equal to 4. To mimic the disease transmissibility dynamics across different segments, we chose segment-varying transmission rates *β**k* while fixing the removal rate *γ* = 0.03. We considered four scenarios of the set (*β*1, *β*2, *β*3, *β*4), corresponding to 1) **ℛ** = (3.0, 1.2, 2.0, 0.8); 2) **ℛ** = (3.0, 2.3, 1.5, 0.8); 3) **ℛ** = (3.0, 1.8, 0.8, 1.6); 4) **ℛ** = (3.0, 2.0, 1.1, 0.5). Then based on the stochastic version of the standard SIR model, we sampled *S**t* and *R**t* from negative binomial (NB) distributions, and obtained *I**t*, sequentially from *t* = 1 to *T* through ![Formula][80] where *N* = 1, 000, 000, the initial *I* = 100 and *R* = 0, and the dispersion parameters *ϕ**S* = *ϕ**R* = 10. Note that the generative scheme was with an NB error structure, which was different from our model assumption based on a Poisson error structure. We repeated the above steps to generate 50 independent datasets for each setting of **ℛ**. Figure 2 displays the temporal patterns of the simulated infectious counts ***I*** for the four scenarios. ![Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/10/08/2020.10.06.20208132/F2.medium.gif) [Figure 2:](http://medrxiv.org/content/early/2020/10/08/2020.10.06.20208132/F2) Figure 2: Simulation study: The simulated actively infectious data ***I*** under the four scenarios. Each curve represents a replicated sequence of ***I*** under a scenario. The red dashed lines mark the true change-point locations. The blue curve under Scenario 4 was randomly chosen for evaluating the model fitting, of which results are shown in Figure 3. ### 5.2 Evaluation criteria To evaluate the multiple change-point detection, we may rely on either the binary change point indicator vector ***γ*** or the time point allocation vector ***z***. For the choice of ***γ***, a change point is considered to be correctly identified if its location is within a local window of the true position (Killick and Eckley, 2014). The selection of the window size is *ad hoc* and may bias the evaluation. In addition to that, since change points and non-change points are usually of very different sizes, most of the binary classification metrics are not suitable for model comparison here. Thus, we chose those metrics that quantify the agreement between the true and estimated allocation vectors, i.e. ***z*** and ![Graphic][81]. The two classic performance metrics for the analysis of clustering results are the adjusted Rand index (ARI) and mutual information (MI), proposed by Hubert and Arabie (1985) and Steuer et al. (2002), respectively. ARI is the corrected-for-chance version of the Rand index (Rand, 1971), as a similarity measure between two sample allocation vectors. Let ![Graphic][82]; and ![Graphic][83] be the number of pairs of time points that are a) in the same segment in both of the true and estimated partitions; b) in different segments in the true partition but in the same segment of the estimated one; c) in the same segment of the true partition but in different segments in the estimated one; and d) in different segments in both of the true and estimated partitions. Then, the ARI can be computed as ![Formula][84] The ARI usually yields values between 0 and 1, although it can yield negative values (Santos and Embrechts, 2009). The large the index, the more similar between ***z*** and ![Graphic][85], thus the more accurately the method detects the actual times at which change points occurred. An alternative metric choice is MI, which measures the information about one variable that is shared by the other (Steuer et al., 2002). Let ![Graphic][86] be the number of time points shared between the *k*-th segment in the true ***z*** and the *k**t*-the segment in the estimated one ![Graphic][87]. Then, the MI can be computed as ![Formula][88] where ![Graphic][89] is the number of segments and ![Graphic][90]’s are the segment lengths for segment ![Graphic][91] in ![Graphic][92]. It yields non-negative values. The larger the MI, the more accurate the partition result. To quantify how well a method estimates the dynamic transmissibility across different segments, we used the root mean square error (RMSE) that measures the deviation between the true and estimated values of **ℛ** over all *T* time points: ![Formula][93] A smaller value of RMSE indicates a more accurate estimation of ℛ’s. ### 5.3 Results As for the MCMC setting of change point detection, we set 40, 000 MCMC iterations and discarded the first half as burn-in. We adopted the weakly informative setting by setting *a**ω* = 0.1 and *b**ω* = 1.9 in the Beta-Bernoulli prior for the change-point indicator vector ***γ***. We set ![Graphic][94] with ![Graphic][95] and ![Graphic][96] as the covariance matrix in the prior distribution of ***b****k*’s. Finally, we let ![Graphic][97] take ten equally spaced values ranging from 0.0001 to 0.01 at the logarithmic scale (base 10) in the PSIS-LOO cross validation. In fitting the stochastic SIR model, we set 100, 000 MCMC iterations with the first half as burn-in. As suggested in Waqas et al. (2020), the value of removal rate *γ* could be estimated by ![Graphic][98] for each simulated dataset. Then, we set *a**β* = 1 and *b**β* = 1*/γ* so that both of the expectation and variance of the basic reproduction number ℛ = *β**k**/γ* are equal to 1. We first checked the performance of the BayesSMILES on a single simulated dataset, which was randomly selected from the 50 replicates in Scenario 4 (marked as the blue line in Figure 2). Figure 3(a) demonstrates the change-point detection result based on the Poisson segmented regression model. The red dashed and the blue solid lines represent the true and the estimated change point locations, respectively, while the gray ribbons represent the 95% confidence intervals for those identified change points. As we can see, the BayesSMILES successfully detected the three true change points in general, as each of the 95% confidence intervals covered the truth. The resulted values of ARI and MI were 0.93 and 1.28, respectively. Later on, the stochastic SIR model introduced in Section 3.4 was then fitted to quantify the disease transmissibility in each segment bounded by the identified change points. Figure 3(b) shows the posterior distributions of ![Graphic][99]’s for ![Graphic][100] from their MCMC samples. The red dashed and blue solid line pinpoints the true and posterior mean of ![Graphic][101]’s, while the two black solid lines mark the boundary of their 95% credible intervals. Clearly, those true values were within their corresponding 95% credible intervals. The final RMSE for ℛ estimation was 0.38 for this single simulated dataset. ![Figure 3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/10/08/2020.10.06.20208132/F3.medium.gif) [Figure 3:](http://medrxiv.org/content/early/2020/10/08/2020.10.06.20208132/F3) Figure 3: Simulation study: The model fitting results based on a randomly selected simulated dataset (see the blue curve under Scenario 4 in Figure 2). (a) The locations of change points (blue solid lines) estimated from the posterior pairwise probability matrix (PPM) and their credible intervals (gray ribbons). The red dashed lines mark the true change-point locations; (b) The posterior distributions of ℛ*k*’s for *k* = 1, 2, 3, 4 estimated from the segmented time-series data, given the three identified change points as shown in (a). The red dashed and blue solid lines are the true and estimated values of ℛ*k*’s, respectively. The two black solid lines are the lower and upper bounds of the 95% credible intervals. To our best knowledge, there is no method like the BayesSMILES that can detect latent change points while characterizing the transmission dynamics through an SIR model. Thus, in setting up a comparison study, we therefore considered a two-stage approach that first identifies multiple change-point of time-series data based on a likelihood based framework, and then estimates the basic reproduction numbers between each pair of nearby change points, following the stochastic SIR model introduced in Section 3.4. The alternative change point model assumes time points within one segment follow a normal distribution with distinct mean and/or variance from its nearby segments (Hinkley, 1970; Jen and Gupta, 1987), and it uses likelihood ratio test (LRT) to detect multiple change points. An algorithm named binary segmentation (Edwards and Cavalli-Sforza, 1965; Sen and Srivastava, 1975) is commonly used to compute the test statistics for the LRT with high efficiency (Killick et al., 2012). In our case, to detect multiple change points using this alternative approach named likelihood ratio test with binary segmentation (LRT-BinSeg), we input the logarithmic scale of ***I*** into the function cpt.meanvar in the related R package changepoint (Killick and Eckley, 2014) for each of the simulated datasets. We specified to use the binary segmentation algorithm with up to 5 of change points to search for. Note that this restriction was not applicable to the alternative algorithms provided in the changepoint package. In practice, we found that the alternative algorithms tended to over-select the number change points. Figure 4(a) and (b) exhibit the change-point detection performances for the four scenarios of ***ℛ***. Our BayesSMILES performed much better than the likelihood ratio test with the binary segmentation algorithm (LRT-BinSeg) with respect to multiple change-point detection under both performance metrics, ARI and MI. For instance, the ARI by the BayesSMILES increased 39.29% to 122.16% over the LRT-BinSeg among the four scenarios, while the growth in MI could be up to 60.54%. Figure 4(c) compares the ability to capture the transmission dynamics in terms of RMSE, which depends on the change-point detection accuracy. As expected, Our BayesSMILES yielded smaller RMSE values across all scenarios since its identified change point locations were more accurate. In all, the simulation study demonstrated that the strengths of BayesSMILES. ![Figure 4:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/10/08/2020.10.06.20208132/F4.medium.gif) [Figure 4:](http://medrxiv.org/content/early/2020/10/08/2020.10.06.20208132/F4) Figure 4: Simulation study: The violin plots of (a) adjusted Rand index, (b) mutual information, and (c) ***R***0 root mean square error from 50 replicated datasets generated under the four scenarios. Red and blue violins correspond to the results obtained by the BayesSMILES and LRT-BinSeg. ### 6 Analysis of COVID-19 Data In this section, we applied the BayesSMILES to the COVID-19 daily report data provided by JHU-CSSE (COVID-19 Data Repository). Several recent COVID-19 studies also based their analyses on this resource (Dong et al., 2020; Zhou and Ji, 2020; Toda, 2020). We first performed a preprocessing step to ensure the quality of the infectious data ***I*** for the model fitting. Due to the fact that the recovery cases did not exist for some U.S. states, we treated ***I*** and ***R*** as missing data and reconstructed the two sequences according to the process described in Section 3.1. The observed cumulative confirmed case numbers ***C*** were collected for each U.S. state starting from an early stage of the pandemic outbreak. In particular, we chose the starting time for each state as when there were least ten confirmed COVID-19 cases for that state. We also set the removal rate *γ* = 0.1 as suggested by Pedersen and Meneghini (2020) and Weitz et al. (2020). Due to the fact that different states could have different starting times, we further trimmed the sequences ***I*** and ***R*** for each state based on the latest starting time available. Finally, we set March 22, 2020, as the new starting time (*t* = 1) for all 50 states, and let July 19, 2020, be the last observed time point (*t* = 120). We used the same hyperparameter and algorithm settings as described in Section 5.3. ### 6.1 Detecting change points for U.S. states We limit our analysis to four U.S. states with the highest cumulative confirmed cases as of July 19, 2020, to keep the paper in a reasonable length. They are New York, Texas, California, and Florida. The results for the 46 remaining states are available in [https://shuangj00.github.io/BayesSMILES/](https://shuangj00.github.io/BayesSMILES/) (see details in Section 8). Figure 5 displays the detected change points, as well as the estimated basic reproduction number ℛ’s cross segments, for the four states. The associated confidence interval to each identified change point is represented by a gray ribbon. In general, those change points detected by the BayesSMILES indeed captured the important COVID-19 events that might affect the transmission rates. For instance, some change points reflected the positive effects of the preventative strategies such as lockdown, while others explained the “bounce back” in confirmed cases after the reopening. Table 2 lists the change point locations and their potentially related events for the four states. View this table: [Table 1:](http://medrxiv.org/content/early/2020/10/08/2020.10.06.20208132/T1) Table 1: Case study: The mean absolute percentage errors (MAPEs) of the 7-day forecast of daily confirmed COVID-19 case numbers by the BayesSMILES and FullDataSIR for the states of New York, Texas, Florida, and California. View this table: [Table 2:](http://medrxiv.org/content/early/2020/10/08/2020.10.06.20208132/T2) Table 2: Case study: The list of the identified change points by the BayesSMILES and the related supporting evidences for the states of New York, Texas, Florida, and California. ![Figure 5:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/10/08/2020.10.06.20208132/F5.medium.gif) [Figure 5:](http://medrxiv.org/content/early/2020/10/08/2020.10.06.20208132/F5) Figure 5: Case study: The change point detection and basic reproduction number estimation for the states of New York, Texas, Florida, and California. The black circles are the actively infectious case numbers divided by the total population (in thousands) at logarithmic scale, i.e. *log*(*It/N*). The black solid lines pinpoint the change point locations, with the associated gray ribbons indicating the credible intervals. The red dashed lines describe the variation in the basic reproduction numbers **ℛ** across segments. The posterior means and the 95% credible intervals for ℛ*k*’s are given by red numbers. In New York, the first change point was estimated to be March 28. We estimated the posterior mean of the basic reproduction number decreased from 2.24 (between March and to March 27) to 1.63 (between March 28 and April 8). Notably, March 28 was the date when the Centers for Disease Control and Prevention (CDC) issued a 14-day domestic travel advisory for non-essential persons, which presumably alleviated the situation for the populated states such as New York. The second change point appeared around April 9, and the *R* of the third segment dropped to 0.98 with a 95% credible interval of [0.76, 1.25]. This matched the exact day when the New York state posted its first drop of the ICU admissions since the COVID-19 outbreak began. The third change point was around April 27. Though there was no direct intervention issued in late April, we noticed that the mayor of New York City announced that all the major events had been canceled starting from April 20. This action could bring a positive effect in controlling the outbreak, and our estimation from the SIR model suggested a further decrease in the basic reproduction number down to 0.66 with a 95% credible interval of [0.54, 0.81]. We observed another change point around June 18, which was close to the Phase II reopening of New York state on June 22. During the Phase II reopening, restaurants were allowed for outdoor dining, stores opened for in-person retail, and more services resumed operational under strict limitations. Thus, we saw a little “bounced back” in *ℛ* from 0.66 to 0.82. The last change point was on June 29. As expected, the basic reproduction number increased to 1.04 with a 95% credible interval of [0.84, 1.29] in the last segment. Although there was no public announcement around June 29 with a confidence interval from June 28 to July 4, we suspect that the increased social interaction during the Independence Day long weekend (between July 3 and July 5) could be responsible for the increase of the transmission dynamics. In Texas, there were five change points detected. The first change point was estimated to be March 28, the same day as the first one for New York state. Due to the similar reason, the policy of mandatory 14-day quarantines for travelers entering Texas could bring a decrease in terms of the basic reproduction number (decreased from 2.97 to 2.07). The second change point was around April 9 with a further drop of ℛ to 1.14 with a 95% credible interval [0.96, 1.35]. We found that the Texas Governor had extended the state’s disaster declaration for an additional 30 days on April 12. The extension aimed at protecting the health and safety of Texans by ensuring adequate capabilities to support communities. Organizations such as the State Operations Center and the Strategic National Stockpile would continuously supply the state government with resources needed to protect residents. May 25 was detected as the third change point, and it was the first time that ℛ increased after the two drops. The estimated basic reproduction number was 1.29 with a 95% credible interval [1.02, 1.62]. This increase appeared around May 25 could be due to the Governor’s updated executive order issued on May 26 that allowed additional services and activities to open for phase II reopening. The next change point was around June 16, and ℛ further increased to 1.72 with a 95% credible interval [1.40, 2.11]. According to the prediction reported by UT Austin’s COVID-19 Modeling Consortium at the end of May, there might be a significant increase in the number of cases and hospitalizations beginning mid-June (News from *kxan*). Here, the change point location, as well as the increased basic reproduction number, were consistent with the results of this report. The last change point was around June 28 with an estimated decrease in ℛ to 1.42 with a 95% credible interval [1.17, 1.72]. Notably, the Texas Governor issued multiple executive orders around late June to early July to mitigate the disease spreading. For instance, the executive order on June 26 reemphasized the limited occupancy for all business establishments in Texas. According to an executive order on July 2, all Texans were required to wear a face-covering in public spaces in counties with 20 or more positive COVID-19 cases. On the same day, the Governor announced an update regarding the executive order on June 26 with additional measures to slow down the spreading of COVID-19. In Florida, the first estimated change point was April 3. It was two days after the statewide stay-at-home order for Florida. We estimated that the basic reproduction number decreased from 2.70 to 1.28 after the change point. The second change point appeared around the middle of April. Starting from April 13, some counties such as Osceola county started the requirement of wearing a face covering while in public. It could explain the reason why we observed a slight decrease in ℛ, from 1.28 to 0.92 with a 95% credible interval [0.73, 1.15]. The next change point was located around May 13, and ℛ in this new stage went above 1 again, with a posterior mean of 1.19 and a 95% credible interval [0.96, 1.50]. We noticed that Florida entered the phase I reopening on May 18, which could lead to the “bounced back” situation. The fourth change point was around June 7, two days after the phase II reopening in Florida. Changes in the phase II reopening included Universal Orlando opened the parks to the general public for the first time in months, and we observed that ℛ increased again to 1.81 with a 95% credible interval [1.50, 2.19]. In the last segment (after June 27), our result revealed a slight drop in the basic reproduction number from 1.81 to 1.45. This change was potentially related to the consequence of requiring facial coverings in the four most populated cities in Florida: Tampa, Orlando, Miami, and Jacksonville. The face mask mandates went into effect for the four cities starting from June 19, 20, 25, and 29, respectively. Therefore, the drop in the transmissibility at the end of June may be explained by the effectiveness of wearing face masks as a non-pharmaceutical practice. In California, we detected two change points. California was the first state to announce lockdown in the COVID-19 pandemic and its stay-at-home order became effective on March 19. Our results in change point detection could miss these early actions since the data we analyzed started from March 22. The first selected change point was on April 5, with the value of the basic reproduction number decreased dramatically when transitioning to the second segment (from 2.20 to 1.20). The second change point was on June 17, and we saw that ℛ increased to 1.35 in the last segment with a 95% credible interval [1.12, 1.62]. According to California Governor, higher-risk businesses and venues (e.g. movie theaters, bars, gyms) were allowed to reopen with restrictions on June 12. Hence, increasing in the basic reproduction number could be the consequence of reopening. This observation was also observed in New York and Texas. ### 6.2 Clustering U.S. states based on their change-point locations We applied BayesSMILES on all 50 U.S. states. Based on the results, we seek to derive an overall picture of the COVID-19 dynamics across states. We summarized the temporally detected change points of the 50 states into common patterns, and then we labeled each state by matching its specific change point pattern to the common patterns. In particular, for each state, we calculated the marginal posterior probability of inclusion (PPI) for all time points, where the PPI for a time point *t* was calculated based on the *B* of MCMC samples after burn-in: ![Graphic][102]. Then we obtained the vector ![Graphic][103]. Each entry in ***p***PPI is a value between 0 and, 1, representing the proportion of time *t* selected as a change point among all iterations. Next, we computed the overall pattern by averaging over the vector ***p***PPI across 50 states. We noticed that some time points were rarely or never selected as change points. That naturally hinted us to group the time points. To illustrate this, we trimmed the top 20% values of ![Graphic][104] for each time *t* (Figure 6). The trimming step provided a clear pattern and highlighted the groups of dates that were commonly identified as change points. We observed three time spans as shown in Figure 6: March 27 - April 11, May 1 - May 10, and May 22 - July 3. For each state, we defined its cluster label based on the corresponding change point detection results. If this state had at least one change point (including the confidence interval) between March 27 - April 11, the first element in its cluster label is “Change”. Otherwise, the first element in the group label was set to “Stable”. We repeated the same process to determine the second and third elements of the class label for each state. In the end, each state was assigned to a cluster label from “Change-Change-Change”, “Change-Stable-Change”, or “Stable-Change-Change”. ![Figure 6:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/10/08/2020.10.06.20208132/F6.medium.gif) [Figure 6:](http://medrxiv.org/content/early/2020/10/08/2020.10.06.20208132/F6) Figure 6: Case study: The averaged marginal posterior probability of inclusion (PPI) for each time point to be selected as a change point over all 50 U.S. states, after trimming the top 20% PPI values. The black dashed lines partition the whole time range into three segments: March 27 - April 11, May 1 - May 10, and May 22 - July 3. The map in Figure 7 colors each of the 50 states based on its cluster label, where green, yellow, and pink correspond to temporal patterns “Change-Stable-Change”, “Change-Change-Change”, and “Stable-Change-Change”, respectively. Interestingly, three out of the four states we have analyzed, New York, Texas, and California, belonged to the same category, i.e. “Change-Stable-Change”. Several other states that are also in this category include Georgia, Arizona, North Carolina, and Louisiana. All of these states were in the top ten states with the most COVID-19 confirmed cases. We noticed that the phase I statewide reopening for all these states occurred in mid-May (May 15 for Georgia, May 13 for Arizona, May 8 for North Carolina, May 15 for Louisiana). Therefore, our model did not report any change points for these states between May 1 and May 10. The rest states in the 10 states with the most COVID-19 confirmed cases, including Florida, Illinois, and New Jersey, were labeled as “Change-Change-Change”, and all of them had a change point between May 1 and May 10. As discussed in Section 6.1, Florida had a change point around May 13 with a confidence interval [May 8, May 18]. According to the Executive Order 2020-32 issued by the Illinois governors, the state entered the phase II reopening starting on May 1 with a modified stay-at-home order. For New Jersey, the statewide state-at-home order was not lifted until June 9. However, our model suggested a change point around the end of April with a confidence interval [April 25, May 3] with a drop in *R* from 1.11 to 0.68 (details available at [https://shuangj00.github.io/BayesSMILES/](https://shuangj00.github.io/BayesSMILES/)). We noticed that on May 3 the New Jersey Governor announced a multi-state agreement to develop a regional supply chain for personal protective equipment, other medical equipment and testing. This joint-state protective measure allowed for efficient delivery and reliability of medical equipment for states, and therefore best utilized the life-saving resources in the face of the COVID-19 outbreak. ![Figure 7:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/10/08/2020.10.06.20208132/F7.medium.gif) [Figure 7:](http://medrxiv.org/content/early/2020/10/08/2020.10.06.20208132/F7) Figure 7: Case study: The temporal patterns of the COVID-19 transmission dynamics based on change points across the 50 U.S. states. Green, yellow, and pink correspond to “Change-Stable-Change”, “Change-Change-Change”, and “Stable-Change-Change” patterns, respectively. ### 6.3 Predicting new confirmed cases for U.S. states Short-term forecasting of the new daily confirmed COVID-19 cases ![Graphic][105] at a future time *T**f* is important for designing public policies. Here, we used the estimation results from BayesSMILES to predict the new confirmed cases for the four states that we analyzed: New York, Texas, Florida, and California. As suggested in Section 6.1, the basic reproduction number ℛ for one state could go above or below the threshold value of 1 at different disease stages. Hence, for short-term forecasting, the predictions given by the BayesSMILES were based on the last segment’s information, which ensured that the predictions fully reflected the most recent disease characteristics. In contrast, we evaluated an alternative approach that ignored the underlying disease dynamics and used all the data to fit a single SIR model for the prediction. We named this method FullDataSIR. Here, we illustrated the advantages of incorporating the change point information into prediction. In particular, we compared the prediction accuracy of the two methods: the BayesSMILES and FullDataSIR. Figure 8 showed the true value of the new daily confirmed COVID-19 cases and the two types of predictions for the four states. For each state, the 7-day forecast was from July 20 to 26. First of all, we observed that the predicted mean by the FullDataSIR model tended to be larger than that from the BayesSMILES for the same day. This was because the data in the early stage (in our case, from late March to early April) corresponded to a much larger basic reproduction number, as compared to the data from the late stage, and the BayesSMILES would ignore the early stage observations in making the prediction. Second, the closer the predicted value to the true value on each day, the better the prediction performance. We thus quantified the prediction accuracy using the mean absolute percentage error (MAPE). The MAPE for the 7-day forecast was defined as ![Formula][106] where ![Graphic][107] was the observed new confirmed cases at future time *T**f*, and ![Graphic][108] was the corresponding prediction. The smaller the MAPE value, the more accurate the prediction. ![Figure 8:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/10/08/2020.10.06.20208132/F8.medium.gif) [Figure 8:](http://medrxiv.org/content/early/2020/10/08/2020.10.06.20208132/F8) Figure 8: Case study: The 7-day forecast (between July 20 and 26) of the daily confirmed COVID-19 case numbers for the states of New York, Texas, Florida, and California. The red and blue circles and bars are the predictive means and 95% intervals by the BayesSMILES and FullDataSIR, respectively. The black thick lines indicate the observed truth. We calculated the MAPE for the 7-day forecast of the four states analyzed in Section 6.1, and Table 1 summarized the results. For the first three states, the MAPEs from the BayesS-MILES were much smaller than the MAPEs from the FullDataSIR model, suggesting a better performance of BayesSMILES. For California, the conclusion was reversed since the FullDataSIR model showed a slightly lower MAPE. However, if we look at Figure 8(d), the real data (black horizontal lines) and the predictions from the BayesSMILES (red circles) both showed a down-going trend. In this case, the consistency in trend could be more important to evaluate two types of predictions, as compared to the slight difference in MAPEs. Lastly, for most of the days shown in Figure 8, the 95% credible interval by BayesSMILES tended to be much narrower than the interval given by the FullDataSIR model under the same day, especially for Texas, Florida, and California. Combined with a smaller value of MAPE, the BayesSMILES was able to provide a more accurate and precise prediction as compared to the alternative approach. In all, incorporating the change point location information helped improve the prediction accuracy for the 7-day forecast of new confirmed COVID-19 case numbers. ## 7 Conclusion In this paper, we proposed the BayesSMILES, a Bayesian segmentation model for studying the longitudinal epidemiological data, to analyze the transmission dynamics of an infectious disease such as COVID-19. Our approach includes a Bayesian Poisson segmented regression model to detect multiple change-point from the sequence of actively daily infectious cases. Those identified change points correspond to latent events that significantly alter disease spreading rates, while the resulting segments are characterized by unique epidemiological patterns. We further describe the disease transmissibility for each of the segments by using a stochastic time-invariant SIR model, assuming the transmission rate remains the same till the next change points. Our model outputs a series of the basic reproduction numbers ℛ’s over stages to track the changes in spreading rates during a pandemic. We applied the BayesSMILES method to analyze the COVID-19 daily report data of 50 U.S. states. Our results showed that the COVID-19 outbreak declined substantially after implementing stringent interventions for several states, including New York, Texas, and Florida. Meanwhile, our identified change points matched well with the timelines of publicly announced intervention strategies. In addition to that, the change in the basic reproduction numbers between two adjacent segments might be used to quantify the effectiveness of an intervention, which could help us understand the impact of different control measures. Several downstream analyses based on the BayesSMILES results were conducted. In particular, we clustered the temporal patterns of the 50 U.S. states based on their change-point locations, leading to an interesting spatial pattern related to the COVID-19 dynamics. Lastly, we demonstrated that our method could also improve the short-term forecasting of the new daily confirmed cases. In the future, we plan to extend the Poisson error structure in the change point detection model to a negative binomial distribution for modeling the over-dispersed count data. Furthermore, the current BayesSMILES framework can be generalized to characterize temporal patterns in other epidemiological data. To do so, the segmented regression model should not be restricted to countable outcomes. ## 8 Software We provide software in the form of R/C++ codes on GitHub [https://github.com/shuangj00/BayesSMILES](https://github.com/shuangj00/BayesSMILES). It includes the tutorial of implementing the BayesSMILES, using U.S. state-level COVID-19 data as an example. Besides, we design a website [https://shuangj00.github.io/BayesSMILES/](https://shuangj00.github.io/BayesSMILES/) to summarize the inference results for the 50 U.S. states, as a supplement to Section 6. The website shows that 1) the detected change points for each U.S. state; 2) the COVID-19 transmission dynamics based on the segment-varying basic reproduction numbers *R*’s, including their poster means and 95% confidence intervals. ## Data Availability The COVID-19 dataset was downloaded from the COVID-19 Data Repository hosted by Johns Hopkins University Center for Systems Science and Engineering [https://github.com/CSSEGISandData/COVID-19](https://github.com/CSSEGISandData/COVID-19) ## 9 Appendix: Approximate the multivariate normal density function This section provides the details of the multivariate normal density approximation used to improve the computational efficiency in Section 4.1. We consider a general setting as follows. Let *y* be an *n* × 1 vector, *W* be an *n* × *q* matrix, *X* be an *n* × *p* matrix, and *A* = [*W X*] (which is an *n* × (*p* + *q*) matrix). Let Σ be a (*q* + *p*) × (*q* + *p*) diagonal matrix where the first *q* diagonal elements are *h* and the last *p* diagonal elements are *h*1. Assume *h*, *h*1, *σ*2 > 0. By Woodbury identity and Sylvester’s determinant identity, ![Formula][109] ![Formula][110] where | · | denotes the matrix determinant and *I* is a diagonal matrix of proper dimension. Define ![Formula][111] where ![Graphic][112] (respectively ![Graphic][113]) is the residual after regressing out *W* from *X* (respectively *y*). Zhou and Guan (2018) showed that the expressions in (10) and (11) can be further simplified when *h* → *∞*. The conclusions are summarized below with the proof available in the supplement of Zhou and Guan (2018). Lemma 1. *Let y*, ![Graphic][114] *be as defined above. Then*, ![Formula][115] *and* ![Formula][116] The conclusions in the lemma above can be used to improve the computational efficiency for approximating the multivariate normal probability density function (p.d.f.) in our model. Within each segment *k* we assumed log ***α***. = *X**k**β**k* + *ϵ**k*. Under the prior specification discussed in Section 3.3, we have ![Graphic][117], and the corresponding p.d.f. for log ***α***. is: ![Formula][118] where *n**k* is the segment length. Next, we simplify the calculation of ![Graphic][119] and ![Graphic][120] by using Lemma 1. Consider ![Graphic][121] and *W* as a column vector of 1’s with length *n**k*. The vector *y* in our case matches log ***α***., and ![Graphic][122]. Lemma 1 states that the inverse and determinant calculation of an *n**k* × *n**k* matrix can be reduced to that of a *p* × *p* matrix, and in our case *p* = 1 (since the regression model only includes “time” as a covariate except the intercept term). Therefore, the computational benefit could be significant when *n**k* was large. We first derive the formula of *P* as follows, ![Formula][123] Next, according to Equation (12), ![Graphic][124] can be approximated by ![Graphic][125] when *h* → +∞. In particular, we can derive Therefore, ![Formula][126] Therefore, ![Formula][127] To calculate |*h**WW* *T* + *σ*2*I*|, note that *WW* *T* is an *n**k* *× n**k* matrix, and we can derive that ![Formula][128] According to Sylvester’s determinant lemma, we have ![Formula][129] Therefore, the formula in Equation (16) equals the following, ![Formula][130] Combining the results in Equations (15) and (17), we can approximate the matrix determinant in (14) as follows when *h* → *∞*, ![Formula][131] Next, it is straightforward to derive the formula in the exponent part in Equation (14) using the result in Equation (12). In particular, we can derive: ![Formula][132] Then we approximate the exponent part in Equation (14) as follows under the condition of *h* → *∞*, ![Formula][133] We further introduce the following notation, ![Formula][134] By combining the approximations in (18) and (19), we obtain that ![Formula][135] In practice, the condition of *h* → *∞* is satisfied when *h* ≫ *h*1. We set *h* = 10, 000 and *h*1 = 10 and the approximation accuracy is desirable. * Received October 6, 2020. * Revision received October 6, 2020. * Accepted October 8, 2020. * © 2020, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/) ## References 1. Akiyama MJ, Spaulding AC, Rich JD (2020). Flattening the curve for incarcerated populations—COVID-19 in jails and prisons. New England Journal of Medicine, 382(22): 2075–2077. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMp2005687&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F08%2F2020.10.06.20208132.atom) 2. Alvarez FE, Argente D, Lippi F (2020). A simple planning problem for COVID-19 lockdown. Technical report, National Bureau of Economic Research. 3. Billy Gates AGYC Matthew Prendergast (2020). Austin health leaders concerned about possible 2nd COVID-19 wave in June. kxan. URL: [https://www.kxan.com/news/austin-public-health-to-give-covid-19-update-at-9-a-m/](https://www.kxan.com/news/austin-public-health-to-give-covid-19-update-at-9-a-m/), accessed 2020-05-27. 4. Chen YC, Lu PE, Chang CS (2020). A Time-dependent SIR model for COVID-19. arXiv preprint arXiv:2003.00122. 5. Chowell G, Castillo-Chavez C, Fenimore PW, Kribs-Zaleta CM, Arriola L, Hyman JM (2004). Model parameters and outbreak control for SARS. Emerging Infectious Diseases, 10(7): 1258. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15324546&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F08%2F2020.10.06.20208132.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000222456800011&link_type=ISI) 6. Ciara Rouege (2020). Abbott extends Texas disaster declaration | Here’s what that means. KHOU. URL [https://www.khou.com/article/news/health/coronavirus/gov-abbott-extends-disaster-declaration-another-30-days/285-9e494ee3-5ea5-48d9-bf35-b8c4b88f5126](https://www.khou.com/article/news/health/coronavirus/gov-abbott-extends-disaster-declaration-another-30-days/285-9e494ee3-5ea5-48d9-bf35-b8c4b88f5126), accessed 2020-04-12. 7. Cooper I, Mondal A, Antonopoulos CG (2020). A SIR model assumption for the spread of COVID-19 in different communities. Chaos, Solitons & Fractals, 139: 110057. 8. Dehning J, Zierenberg J, Spitzner FP, Wibral M, Neto JP, Wilczek M, et al. (2020). Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions. Science. 9. Dobrzyn E (2020). Gov. Ron DeSantis says most of Florida will enter phase 2 of reopening Friday. Click Orlando. URL: [https://www.clickorlando.com/news/local/2020/06/03/gov-ron-desantis-says-most-of-florida-will-enter-phase-2-of-reopening-friday/](https://www.clickorlando.com/news/local/2020/06/03/gov-ron-desantis-says-most-of-florida-will-enter-phase-2-of-reopening-friday/), accessed 2020-06-03. 10. Dong E, Du H, Gardner L (2020). An interactive web-based dashboard to track COVID-19 in real time. The Lancet Infectious Diseases, 20(5): 533–534. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S1473-3099(20)30120-1&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F08%2F2020.10.06.20208132.atom) 11. Edwards AW, Cavalli-Sforza LL (1965). A method for cluster analysis. Biometrics, 362–375. 12. Frago C (2020). Tampa’s mask order went into effect Friday. Here’s what you need to know. Tampa Bay News. URL: [https://www.tampabay.com/news/tampa/2020/06/19/tampas-mask-order-goes-into-effect-today-heres-what-you-need-to-know/](https://www.tampabay.com/news/tampa/2020/06/19/tampas-mask-order-goes-into-effect-today-heres-what-you-need-to-know/), accessed 2020-06-19. 13. Giordano G, Blanchini F, Bruno R, Colaneri P, Di Filippo A, Di Matteo A, et al. (2020). Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy. Nature Medicine, 1–6. 14. Gostic KM, McGough L, Baskerville E, Abbott S, Joshi K, Tedijanto C, et al. (2020). Practical considerations for measuring the effective reproductive number,Rt. medRxiv. 15. Hinkley DV (1970). Inference about the change-point in a sequence of random variables. 16. Hubert L, Arabie P (1985). Comparing partitions. Journal of Classification, 2(1): 193–218. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/BF01908075&link_type=DOI) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1985AVF9100003&link_type=ISI) 17. Jen T, Gupta AK (1987). On testing homogeneity of variances for Gaussian models. Journal of Statistical Computation and Simulation, 27(2): 155–173. 18. Jesse Pound (2020). New York posts negative net change in ICU admissions for first time since coronavirus outbreak. CNBC. URL: [https://www.cnbc.com/2020/04/10/ny-has-first-negative-net-change-in-icu-admissions-since-coronavirus-outbreak](https://www.cnbc.com/2020/04/10/ny-has-first-negative-net-change-in-icu-admissions-since-coronavirus-outbreak). html, accessed 2020-04-10. 19. Kantner M, Koprucki T (2020). Beyond just “flattening the curve”: Optimal control of epidemics with purely non-pharmaceutical interventions. Journal of Mathematics in Industry, 10(1): 1–23. 20. Kermack WO, McKendrick AG (1927). A contribution to the mathematical theory of epidemics. Proceedings of the royal society of london. Series A, Containing papers of a mathematical and physical character, 115(772): 700–721. 21. Killick R, Eckley I (2014). changepoint: An R package for changepoint analysis. Journal of Statistical Software, 58(3): 1–19. 22. Killick R, Fearnhead P, Eckley IA (2012). Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association, 107(500): 1590–1598. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1080/01621459.2012.737745&link_type=DOI) 23. Klas ME, Contorno S (2020). Florida Gov. Ron DeSantis issues statewide stay-at-home order. Tampa Bay News. URL: [https://www.tampabay.com/news/health/2020/04/01/florida-gov-ron-desantis-issues](https://www.tampabay.com/news/health/2020/04/01/florida-gov-ron-desantis-issues) statewide-stay-at-home-order/, accessed 2020-04-01. 24. Lloyd-Smith JO, Galvani AP, Getz WM (2003). Curtailing transmission of severe acute respiratory syndrome within a community and its hospital. Proceedings of the Royal Society of London. Series B: Biological Sciences, 270(1528): 1979–1989. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1098/rspb.2003.2481&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=14561285&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F08%2F2020.10.06.20208132.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000185739600001&link_type=ISI) 25. Metevia T (2020). Osceola County now requires face coverings in public. Click Orlando. URL: [https://www.clickorlando.com/news/local/2020/04/10/watch-live-osceola-county-officials-provide-update-on-covid-19-pandemic/](https://www.clickorlando.com/news/local/2020/04/10/watch-live-osceola-county-officials-provide-update-on-covid-19-pandemic/), accessed 2020-04-13. 26. NBC New York (2020a). CDC Issues 14-Day Travel Advisory for New York, New Jersey, Connecticut. NBC New York. URL: [https://www.nbcnewyork.com/news/local/even-with-relief-bill-passed-no-rest-for-ny-as-cuomo-says-peak-of-crisis-still-yet-to-come/2348306/](https://www.nbcnewyork.com/news/local/even-with-relief-bill-passed-no-rest-for-ny-as-cuomo-says-peak-of-crisis-still-yet-to-come/2348306/),accessed 2020-03-28. 27. NBC New York (2020b). NYC Moves to Phase II in Biggest Re-opening Step Yet; State Daily Virus Deaths Plunge to New Low. NBC New York. URL: [https://www.nbcnewyork.com/news/local/nyc-reopening-hits-highest-gear-yet-as-shops-outdoor-dining-and-playgrounds-return-monday/2477627/](https://www.nbcnewyork.com/news/local/nyc-reopening-hits-highest-gear-yet-as-shops-outdoor-dining-and-playgrounds-return-monday/2477627/), accessed 2020-06-22. 28. Office of the Texas Governor (2020a). A proclamation amending Executive Order GA-28 by Office of the Txas Governor. URL: [https://open.texas.gov/uploads/files/organization/opentexas/DISASTER-proclamation-amending-GA-28-mass-gatherings-IMAGE-07-02-2020.pdf](https://open.texas.gov/uploads/files/organization/opentexas/DISASTER-proclamation-amending-GA-28-mass-gatherings-IMAGE-07-02-2020.pdf), accessed 2020-07-02. 29. Office of the Texas Governor (2020b). Executive Order GA-11 by Office of the Texas Governor. URL:[https://www.nbcnewyork.com/news/local/nyc-reopening-hits-highest-gear-yet-as-shops-outdoor-dining-and-playgrounds-return-monday/2477627/](https://www.nbcnewyork.com/news/local/nyc-reopening-hits-highest-gear-yet-as-shops-outdoor-dining-and-playgrounds-return-monday/2477627/), accessed 2020-03-26. 30. Office of the Texas Governor (2020c). Executive Order GA-24 by Office of the Texas Governor. URL:[https://gov.texas.gov/uploads/files/press/DISASTER\_Adding\_Covered\_Services\_to\_GA-23\_No\_2\_COVID-19.pdf](https://gov.texas.gov/uploads/files/press/DISASTER\_Adding\_Covered\_Services\_to\_GA-23_No_2_COVID-19.pdf), accessed 2020-05-26. 31. Office of the Texas Governor (2020d). Executive Order GA-28 by Office of the Texas Governor. URL:[https://gov.texas.gov/uploads/files/press/EO-GA-28\_targeted\_response\_to\_reopening\_COVID-19.pdf](https://gov.texas.gov/uploads/files/press/EO-GA-28\_targeted\_response_to_reopening_COVID-19.pdf), accessed 2020-06-26. 32. Office of the Texas Governor (2020e). Executive Order GA-29 by Office of the Texas Governor. URL:[https://open.texas.gov/uploads/files/organization/opentexas/EO-GA-29-use-of-face-coverings-during-COVID-19-IMAGE-07-02-2020.pdf](https://open.texas.gov/uploads/files/organization/opentexas/EO-GA-29-use-of-face-coverings-during-COVID-19-IMAGE-07-02-2020.pdf), accessed 2020-07-02. 33. Pedersen MG, Meneghini M (2020). Quantifying undetected COVID-19 cases and effects of containment measures in italy. ResearchGate Preprint (online 21 March 2020) DOI, 10. 34. Piggott J, McLean J (2020). Jacksonville changes course, issues face mask man-date. News4JAX. URL: [https://www.news4jax.com/news/local/2020/06/29/jacksonville-issues-face-mask-mandate/](https://www.news4jax.com/news/local/2020/06/29/jacksonville-issues-face-mask-mandate/), accessed 2020-06-29. 35. Rand WM (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336): 846–850. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/2284239&link_type=DOI) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1971L498800028&link_type=ISI) 36. Riley S, Fraser C, Donnelly CA, Ghani AC, Abu-Raddad LJ, Hedley AJ, et al. (2003). Transmission dynamics of the etiological agent of SARS in Hong Kong: impact of public health interventions. Science, 300(5627): 1961–1966. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEzOiIzMDAvNTYyNy8xOTYxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMTAvMDgvMjAyMC4xMC4wNi4yMDIwODEzMi5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 37. Sandhya Kambhampati MM, Krishnakumar P (2020). Which California counties are reopening? Los Angeles Times. URL: [https://www.latimes.com/projects/california-coronavirus-cases-tracking-outbreak/reopening-across-counties/](https://www.latimes.com/projects/california-coronavirus-cases-tracking-outbreak/reopening-across-counties/). 38. Santos JM, Embrechts M (2009). On the use of the adjusted rand index as a metric for evaluating supervised classification. In: International Conference on Artificial Neural Networks, 175–184. Springer. 39. Sen A, Srivastava MS (1975). On tests for detecting change in mean. The Annals of Statistics, 98–108. 40. Silcox F, Turco R (2020). Florida Businesses Ready for Full Phase One Reopening. Bay News. URL: [https://www.baynews9.com/fl/tampa/coronavirus/2020/05/18/ongoing-coronavirus-coverage](https://www.baynews9.com/fl/tampa/coronavirus/2020/05/18/ongoing-coronavirus-coverage), accessed 2020-05-18. 41. Song PX, Wang L, Zhou Y, He J, Zhu B, Wang F, et al. (2020). An epidemiological forecast model and software assessing interventions on COVID-19 epidemic in China. medRxiv. 42. Speck E, Sandoval E (2020). Orange County residents will be required to wear face masks under new executive order. Click Orlando. URL: [https://www.clickorlando.com/news/local/2020/06/18/orange-county-residents-will-be-required-to-wear-face-masks-under-new-executive-order/](https://www.clickorlando.com/news/local/2020/06/18/orange-county-residents-will-be-required-to-wear-face-masks-under-new-executive-order/), accessed 2020-06-18. 43. Steuer R, Kurths J, Daub CO, Weise J, Selbig J (2002). The mutual information: detecting and evaluating dependencies between variables. Bioinformatics, 18(suppl_2): S231–S240. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/18.suppl_2.S231&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12386007&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F10%2F08%2F2020.10.06.20208132.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000178836800032&link_type=ISI) 44. Tadesse MG, Sha N, Vannucci M (2005). Bayesian variable selection in clustering high-dimensional data. Journal of the American Statistical Association, 100(470): 602–617. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1198/016214504000001565&link_type=DOI) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000233311300022&link_type=ISI) 45. The City Manager of the City of Miami (2020). City Mandates Facial Coverings in Public; Civil Penalties Approved. URL: [https://www.miamigov.com/files/sharedassets/public/news/2020/0625-emergency-order-20-16.pdf](https://www.miamigov.com/files/sharedassets/public/news/2020/0625-emergency-order-20-16.pdf), accessed 2020-06-25. 46. Toda AA (2020). Susceptible-infected-recovered (SIR) Dynamics of COVID-19 and Economic Impact. arXiv preprint arXiv:2003.11221. 47. Vehtari A, Gelman A, Gabry J (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5): 1413–1432. 48. Waqas M, Farooq M, Ahmad R, Ahmad A (2020). Analysis and Prediction of COVID-19 Pandemic in Pakistan using Time-dependent SIR Model. arXiv preprint arXiv:2005.02353. 49. Weitz JS, Beckett SJ, Coenen AR, Demory D, Dominguez-Mirazo M, Dushoff J, et al. (2020). 50. Modeling shield immunity to reduce COVID-19 epidemic spread. Nature Medicine, 1–6. 51. Zhou Q, Guan Y (2018). On the null distribution of bayes factors in linear regression. Journal of the American Statistical Association, 113(523): 1362–1371. 52. Zhou T, Ji Y (2020). Semiparametric Bayesian Inference for the Transmission Dynamics of COVID-19 with a State-Space Model. arXiv preprint arXiv:2006.05581. [1]: /embed/graphic-1.gif [2]: /embed/inline-graphic-1.gif [3]: /embed/inline-graphic-2.gif [4]: /embed/inline-graphic-3.gif [5]: /embed/graphic-2.gif [6]: /embed/graphic-3.gif [7]: /embed/inline-graphic-4.gif [8]: /embed/graphic-5.gif [9]: /embed/inline-graphic-5.gif [10]: /embed/inline-graphic-6.gif [11]: /embed/inline-graphic-7.gif [12]: /embed/inline-graphic-8.gif [13]: /embed/inline-graphic-9.gif [14]: /embed/inline-graphic-10.gif [15]: /embed/inline-graphic-11.gif [16]: /embed/graphic-6.gif [17]: /embed/inline-graphic-12.gif [18]: /embed/inline-graphic-13.gif [19]: /embed/inline-graphic-14.gif [20]: /embed/graphic-7.gif [21]: /embed/inline-graphic-15.gif [22]: /embed/inline-graphic-16.gif [23]: /embed/inline-graphic-17.gif [24]: /embed/graphic-8.gif [25]: /embed/graphic-9.gif [26]: /embed/inline-graphic-18.gif [27]: /embed/graphic-10.gif [28]: /embed/graphic-11.gif [29]: /embed/inline-graphic-19.gif [30]: /embed/inline-graphic-20.gif [31]: /embed/graphic-12.gif [32]: /embed/inline-graphic-21.gif [33]: /embed/inline-graphic-22.gif [34]: /embed/inline-graphic-23.gif [35]: /embed/inline-graphic-24.gif [36]: /embed/graphic-13.gif [37]: /embed/inline-graphic-25.gif [38]: /embed/inline-graphic-26.gif [39]: /embed/inline-graphic-27.gif [40]: /embed/inline-graphic-28.gif [41]: /embed/inline-graphic-29.gif [42]: /embed/inline-graphic-30.gif [43]: /embed/inline-graphic-31.gif [44]: /embed/graphic-14.gif [45]: /embed/inline-graphic-32.gif [46]: /embed/inline-graphic-33.gif [47]: /embed/inline-graphic-34.gif [48]: /embed/inline-graphic-35.gif [49]: /embed/inline-graphic-36.gif [50]: /embed/inline-graphic-37.gif [51]: /embed/inline-graphic-38.gif [52]: /embed/inline-graphic-39.gif [53]: /embed/graphic-15.gif [54]: /embed/inline-graphic-40.gif [55]: /embed/inline-graphic-41.gif [56]: /embed/inline-graphic-42.gif [57]: /embed/inline-graphic-43.gif [58]: /embed/graphic-16.gif [59]: /embed/graphic-17.gif [60]: /embed/inline-graphic-44.gif [61]: /embed/inline-graphic-45.gif [62]: /embed/inline-graphic-46.gif [63]: /embed/graphic-18.gif [64]: /embed/inline-graphic-47.gif [65]: /embed/inline-graphic-48.gif [66]: /embed/inline-graphic-49.gif [67]: /embed/inline-graphic-50.gif [68]: /embed/inline-graphic-51.gif [69]: /embed/inline-graphic-52.gif [70]: /embed/inline-graphic-53.gif [71]: /embed/inline-graphic-54.gif [72]: /embed/inline-graphic-55.gif [73]: /embed/inline-graphic-56.gif [74]: /embed/inline-graphic-57.gif [75]: /embed/inline-graphic-58.gif [76]: /embed/inline-graphic-59.gif [77]: /embed/graphic-19.gif [78]: /embed/inline-graphic-60.gif [79]: /embed/inline-graphic-61.gif [80]: /embed/graphic-20.gif [81]: /embed/inline-graphic-62.gif [82]: /embed/inline-graphic-63.gif [83]: /embed/inline-graphic-64.gif [84]: /embed/graphic-22.gif [85]: /embed/inline-graphic-65.gif [86]: /embed/inline-graphic-66.gif [87]: /embed/inline-graphic-67.gif [88]: /embed/graphic-23.gif [89]: /embed/inline-graphic-68.gif [90]: /embed/inline-graphic-69.gif [91]: /embed/inline-graphic-70.gif [92]: /embed/inline-graphic-71.gif [93]: /embed/graphic-24.gif [94]: /embed/inline-graphic-72.gif [95]: /embed/inline-graphic-73.gif [96]: /embed/inline-graphic-74.gif [97]: /embed/inline-graphic-75.gif [98]: /embed/inline-graphic-76.gif [99]: /embed/inline-graphic-77.gif [100]: /embed/inline-graphic-78.gif [101]: /embed/inline-graphic-79.gif [102]: /embed/inline-graphic-80.gif [103]: /embed/inline-graphic-81.gif [104]: /embed/inline-graphic-82.gif [105]: /embed/inline-graphic-83.gif [106]: /embed/graphic-32.gif [107]: /embed/inline-graphic-84.gif [108]: /embed/inline-graphic-85.gif [109]: /embed/graphic-34.gif [110]: /embed/graphic-35.gif [111]: /embed/graphic-36.gif [112]: /embed/inline-graphic-86.gif [113]: /embed/inline-graphic-87.gif [114]: /embed/inline-graphic-88.gif [115]: /embed/graphic-37.gif [116]: /embed/graphic-38.gif [117]: /embed/inline-graphic-89.gif [118]: /embed/graphic-39.gif [119]: /embed/inline-graphic-90.gif [120]: /embed/inline-graphic-91.gif [121]: /embed/inline-graphic-92.gif [122]: /embed/inline-graphic-93.gif [123]: /embed/graphic-40.gif [124]: /embed/inline-graphic-94.gif [125]: /embed/inline-graphic-95.gif [126]: /embed/graphic-41.gif [127]: /embed/graphic-42.gif [128]: /embed/graphic-43.gif [129]: /embed/graphic-44.gif [130]: /embed/graphic-45.gif [131]: /embed/graphic-46.gif [132]: /embed/graphic-47.gif [133]: /embed/graphic-48.gif [134]: /embed/graphic-49.gif [135]: /embed/graphic-50.gif