Characterizing the Dynamic of COVID-19 with a New Epidemic Model: Susceptible-Exposed-Symptomatic-Asymptomatic-Active-Removed
=============================================================================================================================

* Grace Y Yi
* Pingbo Hu
* Wenqing He

## Abstract

The coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has spread stealthily and presented a tremendous threat to the public. It is important to investigate the transmission dynamic of COVID-19 to help understand the impact of the disease on public health and economy. While a number of epidemic models have been available to study infectious diseases, they are in-adequate to describe the dynamic of COVID-19. In this paper, we develop a new epidemic model which utilizes a set of ordinary differential equations with unknown parameters to delineate the transmission process of COVID-19. Different from the traditional epidemic models, this model accounts for asymptomatic infections as well the lag between symptoms onset and the confirmation date of infection. We describe an estimation procedure for the unknown parameters in the proposed model by adapting the *iterated filter-ensemble adjustment Kalman filter* (IF-EAKF) algorithm to the reported number of confirmed cases. To assess the performance of our proposed model, we examine COVID-19 data in Quebec for the period of April 2, 2020 to May 10, 2020 and carry out sensitivity studies under a variety of assumptions. To reflect the transmission potential of an infected case, we derive the *basic reproduction number* from the proposed model. The estimated basic reproduction number suggests that the pandemic situation in Quebec for the period of April 2, 2020 to May 10, 2020 is not under control.

Key Words
*   Basic reproduction number
*   COVID-19
*   epidemic model
*   transmission
*   IF-EAKF algorithm

## 1 Introduction

The coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has spread stealthily and has present a tremendous threat to the public health. On March 11, 2020, the World Health Organization (WHO) declared COVID-19 to be a global pandemic. The first COVID-19 case in Canada was identified in Toronto on January 25, 2020 (The Canadian Press, 2020). In the early stage, imported cases from other countries were the main source of the COVID-19 outbreak in Canada. On March 16, 2020, the closure of Canadian borders was announced to people who are not Canadian citizens or permanent residents (Vogel, 2020). Various measures and interventions have been taken by the federal government and provincial governments to mitigate the virus spread.

While those steps are important to help contain the virus transmissions, it is imperative to investigate the transmission dynamic from the quantitative perspectives. In the literature, a variety of epidemic models have been developed to study infectious diseases, including the *Susceptible-Infectious-Recovered* (SIR) model (e.g., Kermack and McKendrick, 1927), the *Susceptible-Infectious-Susceptible* (SIS) model (e.g., Duan et al., 2015), the *Susceptible-Exposed-Infectious-Recovered* (SEIR) model (e.g., Duan et al., 2015), the Reed-Frost model (e.g., Abbey, 1952), and their variants (e.g., Ng and Orav, 1990; Ng, Turinici, and Danchin, 2003). Applications of those epidemic models have been extensive. To name a few, Osthus et al. (2017) used the SIR model to forecast seasonal influenza; Shah and Gupta (2013) applied the SEIR model to examine transmission processes of vector borne diseases; Ng and Orav (1990) proposed a generalized Reed-Frost model to predict human immunodeficiency virus (HIV) incidence in San Francisco’s homosexual population; and Ng et al. (2003) developed a modified SEIR model, called the *Susceptible-Exposed-Infectious-Recovered-Protection* (SEIRP) model, to study the outbreak of *Severe Acute Respiratory Syndrome* (SARS) in China.

While many epidemic models are available, they are inadequate to facilitate the unique epidemiological characteristics of COVID-19. In this paper, we propose a new epidemic model, called the *Susceptible-Exposed-Asymptomatic-Symptomatic-Active-Removed* (SEASAR) model, to delineate the COVID-19 transmission dynamic. This model generalizes the SIR and SEIR models by accounting for asymptomatic infections and the lag between symptoms onset and the diagnosis time, yet it does not require more assumptions required by the SIR and SEIR models. Consistent with many epidemic models, we make two rountine assumptions: (1) the population is homogeneous and well-mixed; and (2) there are no inbound and outbound travels. Similar to the SIR and SEIR models, the SEASAR model is a deterministic model which utilizes differential equations to describe the transmission dynamic of COVID-19. Furthermore, we derive the basic reproduction number from the proposed model to provide a scalar measure of the pandemic.

To implement the proposed model, we develop an estimation procedure for the model parameters by adapting the iterated filter-ensemble adjustment Kalman filter (IF-EAKF) algorithm (e.g., Li et al., 2020), where resampling from Bayesian posterior distributions is basically invoked. We evaluate the performance of our transmission model and the estimation algorithm by analyzing the COVID-19 data in Quebec for the period of April 2, 2020 to May 10, 2020, where we focus on comparing differences between the predicted cumulative numbers of cases and the reported cumulative numbers of cases. We also conduct sensitivity analyses to assess how the estimation of the model parameters and the predicted results may change as the model assumptions are altered.

Analyzing the COVID-19 data in Quebec is driven by the following considerations. Quebec is the worst-hit province in Canada, and it is thereby interesting to examine the virus transmissions in this province. More importantly, it is imperative to ensure the required conditions to be met as much as possible when applying the model to analyze data. As pointed out earlier, the validity of the proposed SEASAR model hinges on the assumption of no inbound and outbound travels, which is typically untrue in practice, but modeling data in a certain period is likely to make this assumption approximately true. For the period of April 2, 2020 to May 10, 2020, Quebec government set up checkpoints to block all non-essential travels into the province and people in Quebec were advised to stay home, thus, inbound and outbound travels in Quebec for this time window are perceived to be the least.

The remainder of this article is organized as follows. We develop the SEASAR model and elaborate on its rationale in Section 2. In Section 3, we present the initialization setup of the SEASAR model and describe the implementation procedure by adapting the IF-EAKF algorithm. In Section 4, we utilize the proposed SEASAR model to analyze the Quebec COVID-19 data for the period of April 2, 2020 to May 10, 2020. The article is concluded with a discussion presented in Section 5.

## 2 Model Framework

With a meta analysis, He et al. (2020) estimated that about 46% individuals with COVID-19 are asymptomatic, and they have the infectious ability to transmit the disease (e.g., Hao et al., 2020; Li et al., 2020). Due to the limited testing resources and the flu-like manifestation of COVID-19 as well as the incubation period, there is a time lag between symptoms onset and being confirmed for infected individuals (Kramer et al., 2020). To facilitate these features of COVID-19, we develop a new model, called the *Susceptible-Exposed-Asymptomatic-Symptomatic-Active-Removed* (SEASAR) model, under the assumptions of a well-mixed homogeneous population and of no inbound and outbound travels (i.e., the population size remains to be unchanged over time).

### 2.1 Illustration of the Proposed Model

To illustrate the ideas, we first consider a *static* framework by focusing at a given time point, say, on a given day. We divide the target population into six subpopulations with specific features, where *S* represents the subpopulation of *susceptible* cases (i.e., those at risk of becoming infected with the novel coronavirus), *E* is the subpopulation of *exposed* cases (i.e., those who are infected but do not have the infectious ability yet and are still in the latent period (e.g., Peng et al., 2020), *I**a* stands for the subpopulation of *asymptomatic* infections (i.e., those with the infectious ability but showing no symptoms), *I**s* represents the subpopulation of *symptomatic* infections (i.e., those who show symptoms and have the infectious ability but are not confirmed yet), *A* is the subpopulation of *active* cases (i.e., those confirmed cases who do not recover or die), and *R* denotes the subpopulation of *removed* cases (i.e., those confirmed cases who recover or die from COVID-19).

Next, we introduce the parameters to facilitate the dynamic changes among the subpopulations. Let *Z* denote the *average latent period*, defined as the average time from being infected to having the infectious ability. Various studies have been conducted to estimate the value of *Z* (e.g., Bai et al., 2020; Guan et al., 2020; He, Yi, and Zhu, 2020), so here we take *Z* as being available. Let *θ* denote the *symptomatic transmission rate*, defined as the average number of individuals infected by a symptomatic case per unit time (e.g., a day). Let the *asymptomatic transmission rate* be denoted as *µθ*, which is defined as the average number of individuals infected by an asymptomatic case per unit time. As asymptomatic infections are regarded as less infectious than symptomatic cases (e.g., Li et al., 2020), *µ* is a constant between 0 and 1. Let *α* denote the fraction of *symptomatic* infections relative to all infections, let *β* denote the average rate for *asymptomatic* infections to develop symptoms per unit time, and let *γ* denote the average recovery rate of *asymptomatic* infections per unit time. Let *F* stand for the average time from symptoms onset to the time of being confirmed, let *B* denote the average time from being confirmed to being recovered, and let *J* represent the average time from being confirmed to die. Figure 1 is a flow chart showing the relationship among the six subpopulations. A black solid arrow between two subpopulations indicates the members of one subpopulation can move into the other subpopulation; a red dashed arrow between two subpopulations indicates that members in one subpopulation can be infected by members in the other subpopulation. Once COVID-19 cases are confirmed, they will be quarantined and lose the infectious ability, so the group *S* cannot be infected by the group *A*. Due to the limited test capacity, asymptomatic individuals are not being tested for COVID-19 (especially in the early stage of the outbreak). Thus, there is no transition from *I**a* to *A*. The parameters on black solid lines determine the number of people who change from one state to another state per unit time; and the parameters on red dashed lines determine the number of people infected by asymptomatic or symptomatic cases per unit time.

![Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/12/11/2020.12.08.20246264/F1.medium.gif)

[Figure 1:](http://medrxiv.org/content/early/2020/12/11/2020.12.08.20246264/F1)

Figure 1: 
Illustration of the SEASAR model. The population is divided into six compartments: S (susceptible), E (exposed), Ia (asymptomatic), Is (symptomatic), A (active), and R (removed).

### 2.2 Dynamic Model

Figure 1 shows a static chart for the transmissions among the six subpopulations at a given time point. However, for any time period, the transmissions are not static but dynamic. To characterize the *dynamic* evolution of the subpopulations over time, we modify the six subpopulations by showing their dependence on time *t*, yielding the state components {*S*(*t*), *E*(*t*), *I**a*(*t*), *I**s*(*t*), *A*(*t*), *R*(*t*)}. For ease of notation, we also use the same symbols to represent the *sizes* of those state components. Let *ϕ*(*t*) = (*S*(*t*), *E*(*t*), *I**a*(*t*), *I**s*(*t*), *A*(*t*), *R*(*t*))T denote the vector of the six subpopulation sizes at time *t*.

Given the setup, we now present the proposed SEASAR model, given by the following set of ordinary differential equations: ![Formula][1]</img>  ![Formula][2]</img>  ![Formula][3]</img>  ![Formula][4]</img>  ![Formula][5]</img>  ![Formula][6]</img>  where *N* is the population size, which is time-invariant due to the assumptions of no outbound and inbound travels. Because of this assumption, *S*(*t*)+*E*(*t*)+*I**a*(*t*)+*I**s*(*t*)+*A*(*t*)+*R*(*t*) = *N* for any time point *t*, and hence, any equation above is determined by other five equations.

For ease of referral of equations (1)-(6) in the later development, we express those equations in a compact form: ![Formula][7]</img>  where *g*(·, ·) represents the vector function determined by the right hand side of (1)-(6). Figure 2 is a flowchart of the transmission dynamic for the six subpopulations together with the associated values.

![Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/12/11/2020.12.08.20246264/F2.medium.gif)

[Figure 2:](http://medrxiv.org/content/early/2020/12/11/2020.12.08.20246264/F2)

Figure 2: 
Dynamic of the transmission among the six subpopulations at any time t

The right hand side of each of equations (1)-(6) shows how the change rate of each subpopulation is related to the transmission rates as well as the associated parameters defined earlier. To see why, we first examine the right hand side of (1) by considering a small time interval [*t, t* + Δ*t*) of length Δ*t*. Over the time interval [*t, t* + Δ*t*), because *θI**s*(*t*)Δ*t* represents the average number of people infected by symptomatic infections and ![Graphic][8]</img> is the proportion of susceptible cases, so ![Graphic][9]</img> equals the average number of *susceptible* cases who are infected by *symptomatic* infections; similarly, ![Graphic][10]</img> equals the average number of *susceptible* cases who are infected by *asymptomatic* infections. Thus, ![Graphic][11]</img> records the average number of *susceptible* cases moving to the *exposed* state over the time interval [*t, t* + Δ*t*), i.e., the reduced number of *susceptible* cases over the time interval [*t, t* + Δ*t*) is ![Graphic][12]</img>. Then dividing Δ*t* on the both sides and letting Δ*t* → 0 yields (1).

Equations (2)-(6) can be illustrated in an analogous way. Regarding (2), due to the changing from susceptible cases to being in the *exposed* state, the number of *exposed* cases is increased by ![Graphic][13]</img> over the interval [*t, t* + Δ*t*); yet due to the latent period, the number of *exposed* cases is reduced by ![Graphic][14]</img> over the interval [*t, t* + Δ*t*), yielding that the difference in the number of *exposed* cases over the time interval [*t, t* + Δ*t*) is ![Graphic][15]</img>. Dividing Δ*t* on the both sides and letting Δ*t* → 0 yields (2).

Regarding (3), when *exposed* cases pass the latent period, they become either *asymptomatic* or *symptomatic* infections. As the fraction of *asymptomatic* infections is 1 − *α*, the number of asymptomatic infections is thus increased by ![Graphic][16]</img> over the interval [*t, t* + Δ*t*). On the other hand, *βI**a*(*t*)Δ*t* is the average number of infections who initially are *asymptomatic* but later show symptoms over the interval [*t, t* + Δ*t*) (i.e., people in this group turn from the *asymptomatic* state to the *symptomatic* state), and *γI**a*(*t*)Δ*t* is the average number of *asymptomatic* infections who recover from COVID-19 (i.e., people in this group move from the *asymptomatic* state to the *removed* state). Thus, the number of asymptomatic infections is reduced by *βI**a*(*t*)Δ*t* + *γI**a*(*t*)Δ*t* over the interval [*t, t* + Δ*t*). Therefore, the difference in the number of asymptomatic infections is ![Graphic][17]</img>, yielding (3) by dividing Δ*t* on the both sides and letting Δ*t* → 0.

Regarding (4), due to the change of individuals from the *exposed* and *asymptomatic* states to the *symptomatic* state, the number of symptomatic infections is increased by ![Graphic][18]</img> over the interval [*t, t* + Δ*t*). On the other hand, ![Graphic][19]</img> is the average number of *symptomatic* infections who are confirmed over the interval [*t, t* + Δ*t*) (i.e., people in this group turn from the *symptomatic* state to the *active* state); in other words, the reduced number of *symptomatic* infections over the interval [*t, t* + Δ*t*) is ![Graphic][20]</img>. Thus, the difference in the number of *symptomatic* infections is ![Graphic][21]</img>, leading to (4) if dividing Δ*t* on the both sides and letting Δ*t* → 0.

Regarding (5), due to the changing from the *symptomatic* state to the *active* state, the number of active cases is increased by ![Graphic][22]</img> over the interval [*t, t* + Δ*t*). Since ![Graphic][23]</img> is the average number of *active* cases who recover from COVID-19 and ![Graphic][24]</img> is the average number of *active* cases who die from COVID-19 over the interval [*t, t* + Δ*t*) (i.e., people in the two groups turn from the *active* state to the *removed* state), the number of active cases is reduced by ![Graphic][25]</img> over the interval [*t, t* + Δ*t*). Therefore, the difference in the number of active cases is ![Graphic][26]</img>, yielding (5) if we divide Δ*t* on the both sides and let Δ*t* → 0.

Regarding (6), due to the changing from the *active* and *asymptomatic* state to the *removed* state, the number of removed cases is increased by ![Graphic][27]</img> over the interval [*t, t* + Δ*t*), i.e., ![Graphic][28]</img>. Thus, dividing Δ*t* on the both sides and letting Δ*t* → 0 yields (6). Alternatively, (6) equals the negative sum of (1) to (5).

### 2.3 Basic Reproduction Number

Let *η* = (*θ, µ, α, β, γ, F, B, J*)T denote the vector of parameters of prime interest. Knowing the value of *η* allows us to describe the six subpopulations sizes using the models (1)-(6). Further, it enables us to describe the severity of the pandemic using some simple measure such as the *basic reproduction number*, denoted *R*, which is defined as the expected number of cases infected by one case in a population consisting of individuals susceptible to infection.

A large value of *R* indicates a more severe pandemic. Usually, comparing *R* to 1 describes the spread of the disease. “*R* > 1” suggests that the infection is spreading in the population, and “*R* < 1” indicates the dying down situation. In Appendix A, we show that the mathematical expression of *R* derived from the SEASAR model is ![Formula][29]</img>  

## 3 Estimation Procedure

In this section, we develop an estimation procedure for *η* by adapting the iterated filter-ensemble adjustment Kalman filter (IF-EAKF) algorithm (e.g., Li et al., 2020) in combination with the 4th order Runge-Kutta (RK4) method (e.g., Süli and Mayers, 2003, p.328).

### 3.1 Initial Sizes of Subpopulations

Let *τ* denote the initial time point from which we start examining the data. By time *τ*, let *R**c*, *C* and *D* denote the reported cumulative number of recoveries, confirmed cases and deaths from COVID-19, respectively, which are available; and let *R**a* denote the cumulative number of recovered asymptomatic cases, which is unavailable. Let *Q*(*t*) denote the number of reported cases with symptoms onset on day *t*.

To facilitate the relationship between the observed and unobserved values, let ![Graphic][30]</img> denote the ratio of the unobserved *R**a* to the observed *R**c*, and let ![Graphic][31]</img> represent the ratio of unobserved size *I**a*(*τ*) to the reported size *I**s*(*τ*). Motived by Hao et al. (2020), we express the size *E*(*τ*) in terms of its relative value to the total number of reported cases with symptoms onset over the time window of length ![Graphic][32]</img>, where the function [*x*] represents the biggest integer that is less than or equal to *x*, and *r*3 is a positive value.

While the introduction of the ratios *r*1, *r*2 and *r*3 does not give us a way to determine the un-observed values *R**a*, *I**a*(*τ*) and *E*(*τ*) using the observed data, these ratios offer us convenient measures to describe the pandemic situation in relative scales of the observed values at time *τ*. For instance, at the early stage of the pandemic, the testing kits are limited, so the number of recoveries from *confirmed* cases is likely to be a lot smaller than that from *asymptomatic* individuals, suggesting a large value of *r*1. If *r*2 is bigger than 1, then there are more *asymptomatic* infections than *symptomatic* infections.

Table 1 summarizes the initial sizes of the six subpopulations which are to be used in Section 3.3, where the values of *r*1, *r*2 and *r*3 are specified. Write *ϕ*(*τ*) = (*S*(*τ*), *E*(*τ*), *I**a*(*τ*), *I**s*(*τ*), *A*(*τ*), *R*(*τ*))T.

View this table:
[Table 1:](http://medrxiv.org/content/early/2020/12/11/2020.12.08.20246264/T1)

Table 1: 
Initial state size for the SEASAR model

### 3.2 Initialization of Model Parameters

Estimation of *η*, to be described in Section 3.3, is conducted by an iterative approximation procedure using the posterior distributions of *η*. Here we assume the prior information for the parameters to be non-informative except for constraining the parameters to certain ranges to reflect our a priori knowledge about them.

To be specific, the transmission rate *θ* of symptomatic infections is considered to be 0 ≤ *θ* ≤ 7 to cover the reported values in the literature, including Hao et al. (2020) and Li et al. (2020). The multiplicative factor *µ* is assumed to satisfy 0.1 ≤ *µ* ≤ 1, the fraction *α* of symptomatic infections relative to all infections is restricted to be 0.1 ≤ *α* ≤ 1, the average rate *β* of asymptomatic infections who develop symptoms per unit time is constrained to be 0.0002 ≤ *β* ≤ 0.8, and the average recovery rate *γ* of asymptomatic infections per unit time is set as 0.1 ≤ *γ* ≤ 1. The average time *F* from symptoms onset to being confirmed is considered to be between 1 and 10 days based on the study of Kramer et al. (2020). Based on the WHO report (WHO, 2020), the average time *B* from being confirmed to being recovered is considered to be in the range of 7 to 42 days, and the average time *J* from being confirmed to die is taken to change from 14 to 56 days.

### 3.3 IF-EAKF algorithm

With the setup in Sections 3.1 and 3.2, we describe an estimation procedure by adapting the iterated filtering (IF) algorithm. The IF approach basically produces the maximum likelihood estimates of model parameters and has been successfully applied to study many infectious diseases (Li et al., 2020), including cholera (e.g., King et al., 2008) and measles (e.g., He et al., 2010). An efficient IF approach roots in using the *ensemble adjustment Kalman filter* (EAKF) which can generate satisfactory results with only hundreds of samples (Li et al., 2020).

Since the number of confirmed cases is reported on a daily basis, we take the time unit as a day. Consider a population of interest. In contrast to the number *Q*(*t*) of reported cases with symptoms onset on day *t*, defined in Section 3.1, we let *Y* (*t*) denote the number of confirmed cases to be reported on day *t*, which is regarded as a random variable, and let *y**t* denote its realization. Let ![Graphic][33]</img> denote the number of confirmed cases on day *t* that is generated from the SEASAR model. We assume that *Y* (*t*) and *Ŷ*(*t*) are connected through the model ![Formula][34]</img>  where the *ϵ**t* are independent of each other and of the *Ŷ*(*t*), *ϕ*(*t*) and *η*; and *ϵ**t* follows a Gaussian distribution with mean 0 and time-dependent variance ![Graphic][35]</img>.

Being called the *observational error variance* (OEV) by Li et al. (2020), ![Graphic][36]</img> is often estimated heuristically via an assumed function form of the observations. For example, in the analysis in Section 4.2, ![Graphic][37]</img> is assumed to be ![Formula][38]</img>  

Similar forms of OEV were used in the studies of other infectious diseases, including influenza (e.g., Pei et al., 2018), Ebola (e.g., Shaman et al., 2014), West Nile virus (e.g., DeFelice et al., 2017), and respiratory syncytial virus (e.g., Reis and Shaman, 2016).

The model parameter *η* for (1)-(6), the subpopulation sizes *ϕ*(*t*) and *Ŷ*(*t*) at time *t* for *t* ≥ *τ* are unknown. One may consider the joint posterior distribution of *η*, log *Ŷ*(*t*) and log *ϕ*(*t*), and then use the mean of the posterior distribution of *η* to estimate *η*, where log *ϕ*(*t*) := (log *S*(*t*), log *E*(*t*), log *I**a*(*t*), log *I**s*(*t*), log *A*(*t*), log *R*(*t*))T. To reduce computation costs, we adapt the discussion of (Anderson, 2001, p.2888) and describe a simple iterative procedure by pairing two members in {*η*, log *Ŷ*(*t*), log *ϕ*(*t*)}, where we typically pair log *Ŷ*(*t*) with each component in {*η*, log *ϕ*(*t*)}. Let *τ* + *T* − 1 denote the time point by which we stop iterations. To see the main ideas, here we elaborate on the detail of the first iteration for the IF-EAKF algorithm which includes the following six stages, with similar details for other iterations omitted.

*   **Stage 1:** At time *t* = *τ*, generate prior values for *η* and *Ŷ*(*t*):
    
    *   − **Step 1:** Let *π**η* denote a prior distribution for parameter *η*, which is taken as the uniform distribution over the ranges specified in Section 3.2, with the assumption that the parameter components in *η* are independent of each other.
    
    *   − **Step 2:** Specify a positive integer, say *n*. Then generate *n* values from *π**η*, denoted ![Graphic][39]</img>, where for each ![Graphic][40]</img>.
    
    *   − **Step 3:** Using the initial size *I**s*(*τ*) of symptomatic infections, we generate *n* prior values for *Ŷ*(*τ*), denoted ![Graphic][41]</img>, by setting ![Graphic][42]</img> for *i* = 1, …, *n*. Then calculate the sample mean and variance: ![Formula][43]</img>  together with the pairwise sample covariance ![Formula][44]</img>  where *X* is a symbol for *θ, µ, α, β, γ, F*, *B*, and *J*.

*   **Stage 2:** At time *t* = *τ*, generate posterior values for *η* and *Ŷ*(*t*):
    
    We employ the following steps to generate *n* posterior values for *Ŷ*(*τ*) and *η* from their posterior distribution, derived in Appendix B.
    
    *   − **Step 1:** Generate *n* posterior values for *Ŷ*(*τ*), denoted ![Graphic][45]</img> For each ![Graphic][46]</img> is determined by ![Formula][47]</img>  where ![Graphic][48]</img> is given by (8) with *t* = *τ*, and ![Graphic][49]</img> is the number of confirmed cases reported on day *τ*.
    
    *   − **Step 2:** Generate *n* posterior values for *η*, denoted ![Graphic][50]</img>, where for each ![Graphic][51]</img> Specifically, each component of ![Graphic][52]</img> is given by ![Formula][53]</img>  where *X* is a symbol for *θ, µ, α, β, γ, F*, *B*, and *J*.

*   **Stage 3:** At time *t* = *τ* + 1, generate prior values for *η, ϕ*(*t*), and *Ŷ*(*t*):
    
    *   − **Step 1:** Generate *n* prior values for *η*, denoted ![Graphic][54]</img>, by setting ![Graphic][55]</img> for *i* = 1, …, *n*, where we denote ![Graphic][56]</img>, for each *i*.
    
    *   − **Step 2:** Using the RK4 method, we generate *n* prior values for *ϕ*(*τ* + 1), denoted ![Graphic][57]</img>, where for each ![Graphic][58]</img>. Specifically, let ![Graphic][59]</img>, and ![Graphic][60]</img>, Then we set ![Formula][61]</img>  
    
    *   − **Step 3:** Generate *n* prior values for *Ŷ*(*τ* +1), denoted ![Graphic][62]</img>, by setting ![Graphic][63]</img> for *i* = 1, …, *n*. Then we calculate ![Formula][64]</img>  together with the pairwise sample covariance ![Formula][65]</img>  where *X* is a symbol for *S, E, I**a*, *I**s*, *A*, and *R*.

*   **Stage 4:** At time *t* = *τ* + 1, generate posterior values for *η, ϕ*(*t*), and *Ŷ*(*t*). This stage is similar to Stage 2.
    
    *   − **Step 1:** Similar to Stage 2, we generate *n* posterior values for *Ŷ*(*τ* + 1) and *η*, denoted ![Graphic][66]</img> and ![Graphic][67]</img>, respectively.
    
    *   − **Step 2:** Generate *n* posterior values for *ϕ*(*τ* + 1), denoted ![Graphic][68]</img>, where for each ![Graphic][69]</img>. Specifically, each component of ![Graphic][70]</img> is given by ![Formula][71]</img>  where *X* is a symbol for *S, E, I**a*, *I**s*, *A*, and *R*.

*   **Stage 5:** At time *t* = *τ* + 2, generate prior values for *η, ϕ*(*t*), and *Ŷ*(*t*):
    
    *   − **Step 1:** Generate *n* prior values for *η*, denoted ![Graphic][72]</img>, by setting ![Graphic][73]</img> for *i* = 1, …, *n*.
    
    *   − **Step 2:** Generate *n* prior values for *ϕ*(*τ* + 2), denoted ![Graphic][74]</img>, using the RK4 method, where similar to (11), ![Formula][75]</img>  
    
    *   − **Step 3:** Similar to Step 3 in Stage 3, we generate *n* prior values for *Ŷ*(*τ* + 2).

*   **Stage 6:** Similar to Stage 4, we generate *n* posterior values of *η, ϕ*(*τ* + 2) and *Ŷ*(*τ* + 2).

We repeat Stages 5-6 for *t* = *τ* + 3, …, *τ* + *T* − 1, and obtain a sequence of posterior values for *η*, denoted ![Graphic][76]</img>. Then we calculate ![Formula][77]</img>  and use *x̄*1 for the next iteration.

The proceding descriptions show the steps for the first iteration; steps for subsequent iterations, similar to the first iteration, are outlined in Algorithm 1, where Σ is the covariance matrix of the prior distribution *π**η* which is shrunk by a discount factor *a* ∈ (0, 1), usually pre-specified to reduce the variance of the posterior distribution of the parameters as the iteration progresses. Using a discount factor is a standard procedure to ensure that ![Graphic][78]</img> in Algorithm 1, the estimate at the *L*th iteration, is nearly identical to the maximum likelihood estimate of *η*. If *a* is too small, the algorithm may “quench” too fast and fail to find the maximum likelihood estimate; if it is too close to 1, the algorithm may not converge in a reasonable time (see Li et al., 2020, Supplementary Materials, p.8). Practically, *a* can range between 0.9 to 0.99. The number *L* of iterations is set in an ad hoc way, often determined by inspecting the evolution of the posterior parameter distributions (see Li et al., 2020, Supplementary Materials, p.8). The scale of *n* is in hundreds, though in principle, the larger the better.

## 4 Data Analysis

### 4.1 Quebec COVID-19 Data

On April 1, 2020, Quebec provincial government in Canada set up checkpoints to block all non-essential travels into the province and advised people in Quebec to stay home. After May 10, 2020, Quebec and other provinces in Canada gradually reopened the economy, yielding inbound and outbound travels in Quebec. Therefore, it is reasonable to perceive inbound and outbound travels in the period of April 2, 2020 to May 10, 2020 in Quebec to be the least, and the assumption of no inbound and outbound travels required by the proposed SEASAR model is relatively feasible for the Quebec COVID-19 data in this period.

Driven by this, we apply the proposed model to analyze the daily reported number of COVID-19 cases in Quebec, Canada, in this period, available at [https://www.canada.ca/en/public-health/services/diseases/2019-novel-coronavirus-infection.html](https://www.canada.ca/en/public-health/services/diseases/2019-novel-coronavirus-infection.html). We take the initial time point *τ* as April 2, 2020 and split the study period into two parts: the period of April 1 to April 30, 2020, and the period of May 1 to May 10, 2020. The data for the first period are used to estimate the model parameters by applying Algorithm 1, and the data in the second period are used to assess the prediction performance of our proposed model.

Algorithm 1:
### IF-EAKF

![Figure3](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/12/11/2020.12.08.20246264/F3.medium.gif)

[Figure3](http://medrxiv.org/content/early/2020/12/11/2020.12.08.20246264/F3)

### 4.2 Estimation of Model Parameters

We run Algorithm 1 to the proposed SEASAR model, where we set *n* = 300, *a* = 0.9, *L* = 50, and the average latent period *Z* is taken as 5.2 days, an estimate reported by Bai et al. (2020) and Hao et al. (2020). For the initial sizes of the six subpopulations discussed in Section 3.1, we take *r*1 = 2, *r*2 = 1, and *r*3 = 2. As described in Section 3.1, the initial sizes of subpopulations depend on the daily number of COVID-19 patients with symptoms onset before May 10, 2020, which is not available in Quebec. However, the daily number of COVID-19 patients with symptoms onset in Canada is collected by the Public Health Agency of Canada ([https://health-infobase.canada.ca/covid-19/epidemiological-summary-covid-19-cases.html](https://health-infobase.canada.ca/covid-19/epidemiological-summary-covid-19-cases.html)). Using this information, we approximate the daily number of COVID-19 patients with symptoms onset before May 10, 2020 in Quebec. Specifically, let *n**t,C* denote the number of COVID-19 patients with symptoms onset on day *t* in Canada, and let *m**t,C* and *m**t,Q* denote the number of confirmed COVID-19 cases on day *t* in Canada and Quebec, respectively. We then take ![Graphic][79]</img> as an estimated number of COVID-19 patients with symptoms onset on day *t* in Quebec.

To account for stochastic effects, we run the IF-EAKF algorithm 1000 times and display in Figure 4 the histograms for those estimates. A 95% confidence interval (CI) for each parameter is determined by using the 2.5% and 97.5% percentiles of the 1000 estimates as its lower and upper bounds, respectively. Table 2 reports the average estimates and the associated 95% CIs for the parameters, together with the average of those 1000 estimated *basic reproduction numbers* and its associated 95% CI. The estimate of *R* suggests that the pandemic situation in Quebec for the period of April 2, 2020 to April 30, 2020 is not under control and the virus spread continues.

![Figure 3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/12/11/2020.12.08.20246264/F4.medium.gif)

[Figure 3:](http://medrxiv.org/content/early/2020/12/11/2020.12.08.20246264/F4)

Figure 3: 
The estimated curves of S(t), E(t), Ia(t), Is(t), A(t), and R(t).

![Figure 4:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/12/11/2020.12.08.20246264/F5.medium.gif)

[Figure 4:](http://medrxiv.org/content/early/2020/12/11/2020.12.08.20246264/F5)

Figure 4: 
The distribution of 1000 estimates for each parameter. The range of the x-axis for each subfigure is set as the initial range of the corresponding parameter.

View this table:
[Table 2:](http://medrxiv.org/content/early/2020/12/11/2020.12.08.20246264/T2)

Table 2: 
The average estimates and 95% CIs for the parameters

### 4.3 Dynamic Projection of Subpopulation Sizes

To project possible trajectories of the pandemic in Quebec, Canada, we are interested in visualizing the estimated sizes for the six subpopulations in the next 1000 days. To this end, let ![Graphic][80]</img> denote the vector of the parameter estimates reported in Table 2. Let ![Graphic][81]</img>, and ![Graphic][82]</img>. Then similar to (11), we apply the RK4 method to estimate *ϕ*(*t*) recursively for *t* = *τ* + 1, *τ* + 2, …, *τ* + 1000: ![Formula][83]</img>  where the initial value *ϕ*(*τ*) is given in Section 3.1.

Figure 3 presents smooth lines connecting the estimates of an element of ![Graphic][84]</img> for the time points from *t* = *τ* to *t* = *τ*+1000, where *S*(*τ*) is roughly equal to ![Graphic][85]</img>, with *Ê*(*τ*), *Îa*(*τ*), *Îs*(*τ*), *Â*(*τ*) and ![Graphic][86]</img> respectively denoting estimates for the corresponding subpopulation size and *N* being the population size of Quebec. Figure 3 shows the patterns that may be useful for us to project the pandemic evolution in Quebec in the next 1000 days if no interference measures are taken. The susceptible subpopulation size *S*(*t*) decreases monotonically, suggesting that as time goes by, more people would move to the subpopulation of exposed cases. *E*(*t*), *I**a*(*t*), *I**s*(*t*) and *A*(*t*) have similar shape with a single peak before 250 days. The size *R*(*t*) of removed cases is an increasing function of time *t*, and it becomes fairly flat after 600 days.

### 4.4 Prediction of Confirmed Cases

With the estimated parameters for the SEASAR model using the data from April 2, 2020 to April 30, 2020, we now predict the daily number of cases for the period of May 1, 2020 to May 10, 2020, and also compare them to the actually reported daily cases in this period. For each *i*, the *i*th estimate and prediction are generated as follows:

*   **Step 1:** Let ![Graphic][87]</img> denote the *i*th estimate of parameters, then ![Graphic][88]</img> is the *i*th estimate of the number of cases on April 2, 2020.

*   **Step 2:** Let ![Graphic][89]</img> and ![Graphic][90]</img>, then similar to (11), the *i*th estimate of the sizes of the six subpopulations at the beginning of April 3, 2020, denoted ![Graphic][91]</img>, is given by ![Formula][92]</img>  and thus, ![Graphic][93]</img> is taken as the *i*th estimate of the number of cases on April 3, 2020.

Repeating Step 2 for *τ* + 2, …, *τ* + 28, we obtain the *i*th estimate of the daily number of cases from April 2, 2020 to April 30, 2020. Further repeating Step 2 for *τ* + 29, …, *τ* + 38, we obtain the *i*th prediction of the daily number of cases from May 1, 2020 to May 10, 2020.

To evaluate the differences between the predicted and reported daily number of cases from May 1 to May 10, we calculate the *mean absolute error* (MAE), ![Graphic][94]</img>, and the *relative mean absolute error* (RMAE), ![Graphic][95]</img>, for day *t*, where for *i* = 1, …, 1000, ![Graphic][96]</img> stands for the predicted number of cases on day *t*, and *y**t* is the reported number of cases on day *t*, defined in Section 3.3. The results are reported in Table 3. Furthermore, Figure 5 displays the mean estimates of the cumulative number of cases for the period of April 2, 2020 to April 30, 2020 (in green) and the mean predicted cumulative number of cases for the period of May 1, 2020 to May 10, 2020 (in blue), in comparison to the actually reported cumulative number of cases from April 2, 2020 to May 10, 2020 (in red). The fair agreement of the fitted or predicted values to the reported values suggest the good performance of the proposed SEASAR model.

View this table:
[Table 3:](http://medrxiv.org/content/early/2020/12/11/2020.12.08.20246264/T3)

Table 3: 
Differences between the predicted and reported numbers

![Figure 5:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/12/11/2020.12.08.20246264/F6.medium.gif)

[Figure 5:](http://medrxiv.org/content/early/2020/12/11/2020.12.08.20246264/F6)

Figure 5: 
The model fitting to the cumulative number of cases for each day (in green) in the period of April 2, 2020 to April 30, 2020, and the model prediction to the cumulative number of cases for each day (in blue) in the period of May 1, 2020 to May 10, 2020, as opposed to the reported cumulative number of cases for each day (in red) in the period of April 2, 2020 to May 10, 2020.

### 4.5 Sensitivity Analysis

To further assess the performance of the proposed SEASAR model, we conduct ten sensitivity analyses to evaluate the sensitivity of the results to the specification of the initial values. We take the same setup for Section 4.2 except for changing one value as one of the following settings: (**S1**): The observational error variance (OEV) is taken as: ![Graphic][97]</img>; (**S2**): The latent period is set as *Z* = 4.0 days (Guan et al., 2020); (**S3**): The latent period is taken as *Z* = 5.08 days (He et al., 2020); (**S4**): The latent period is set as *Z* = 6.4 days (Backer et al., 2020); (**S5**): Set *r*3 = 1; (**S6**): Set *r*3 = 3; (**S7**): Set *r*2 = 2; (**S8**): Set *r*2 = 0.5; (**S9**): Set *r*1 = 0.1; (**S10**): Set *r*1 = 3.

The results of sensitivity analyses are reported in Figure 6. While the disparity of the fitted values or predicted values from the reported values varies from setting to setting, overall, those differences seem to be fairly small, suggesting reasonable robustness of the results to the specification of associated values in fairly realistic ranges. Further, we report in Table 4 the estimate and associated 95% CIs of the basic reproduction number *R* obtained from those ten sensitivity analyses. All the estimates are greater than 1, and consequently, the proposed model suggests the ongoing virus spread under a variety of settings we consider.

View this table:
[Table 4:](http://medrxiv.org/content/early/2020/12/11/2020.12.08.20246264/T4)

Table 4: 
*R* and the corresponding 95% CIs for 10 sensitivity analyses

![Figure 6:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/12/11/2020.12.08.20246264/F7.medium.gif)

[Figure 6:](http://medrxiv.org/content/early/2020/12/11/2020.12.08.20246264/F7)

Figure 6: 
The model fitting to the cumulative number of cases for each day (in green) in the period of April 2, 2020 to April 30, 2020, and the model prediction to the cumulative number of cases for each day (in blue) in the period of May 1, 2020 to May 10, 2020, as opposed to the reported cumulative number of cases for each day (in red) in the period of April 2, 2020 to May 10, 2020.

The results of sensitivity analyses are reported in Figure 6. While the disparity of the fitted values or predicted values from the reported values varies from setting to setting, overall, those differences seem to be fairly small, suggesting reasonable robustness of the results to the specification of associated values in fairly realistic ranges. Further, we report in Table 4 the estimate and associated 95% CIs of the basic reproduction number *R* obtained from those ten sensitivity analyses. All the estimates are greater than 1, and consequently, the proposed model suggests the ongoing virus spread under a variety of settings we consider.

The results of sensitivity analyses are reported in Figure 6. While the disparity of the fitted values or predicted values from the reported values varies from setting to setting, overall, those differences seem to be fairly small, suggesting reasonable robustness of the results to the specification of associated values in fairly realistic ranges. Further, we report in Table 4 the estimate and associated 95% CIs of the basic reproduction number *R* obtained from those ten sensitivity analyses. All the estimates are greater than 1, and consequently, the proposed model suggests the ongoing virus spread under a variety of settings we consider.

## 5 Discussion

In this paper, we propose a new epidemic model, called the SEASAR model, to describe the transmission process of COVID-19. Consistent with many available epidemic models, our proposed model require two standard conditions: (1) the population is homogeneous and well-mixed; and (2) the population size remains invariant over time. To facilitate different states of the individuals in the population, we divide the population into six subpopulations (or compartments), respectively, called *susceptible, exposed, asymptomatic, symptomatic, active*, and *removed*. Such a classification allows us to accommodate the unique manifestations of COVID-19 which include asymptomatic infections and varying lag times between symptoms onset and diagnosis. The dynamic changes from compartment to compartment is delineated by ordinary differntial equations together with unknown parameters or transition rates.

To utilize the proposed model, we examine the COVID-19 data in Quebec, Canada, from April 2, 2020 to May 10, 2020. Our analysis is conducted under varying assumptions to reflect different possibilities due to the lack of the precise information. The sensitivity analyses reveal that the pandemic situation in Quebec, Canada, is not under control for the study period.

The proposed SEASAR model extends several existing epidemic models such as SIR and SEIR models. It would be interesting to develop more refined models along the same lines of the current development. For example, one may relax constant model parameters to be time-varying to gain a greater flexibility. Time-varying model parameters may be described by a parametric form or a weakly parametric form (e.g., piecewise constants over different time intervals). While the same principle can be applied to develop a procedure for estimating model parameters, technical details would be more notationally involved, and computation would be more costly.

## Data Availability

The data set used is available publicly.

## Acknowledgements

This research is partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) as well as the Rapid Response Program COVID-19 of the Canadian Statistical Sciences Institute (CANSSI). Yi is Canada Research Chair in Data Science (Tier 1). Her research was undertaken, in part, thanks to funding from the Canada Research Chairs Program.

## Appendix A: Expression of the Basic Reproduction Number

Following the principles of Diekmann et al. (2010), we derive the *basic reproduction number R*. The basic idea is to extract the information on the production of new infections from the model, and then calculate the expected number of new infections generated from a case in the population under the constraint *S*(*t*) = *N*.

Since a confirmed case must be quarantined, any individuals in the compartment of being active cannot be infectious. In addition, any individuals in the compartments of being susceptible or removed do not have infectious abilities. Therefore, only (2), (3) and (4) in the SEASAR model involves the information on new infections generated by infected cases who are not confirmed yet, thus not being quarantined. With the assumption that all individuals in the populations are susceptible (i.e., setting *S*(*t*) = *N*), those three equations become ![Formula][98]</img>  ![Formula][99]</img>  ![Formula][100]</img>  which can be equivalently written in a compact form: ![Formula][101]</img>  where *ϕ**sub*(*t*) = (*E*(*t*), *I**a*(*t*), *I**s*(*t*))T and ![Graphic][102]</img>.

Noting that only equation (12) includes the terms, *θI**s*(*t*)+*µθI**a*(*t*), of the production of new infections, we re-write the matrix *G* as *G* = *G*1 + *G*2, where ![Graphic][103]</img> reflects the coefficients in the expression *θI**s*(*t*)+*µθI**a*(*t*), and ![Graphic][104]</img> Although the definition of *R* is conceptually intuitive, its determination is mathematically complicated (Delamater et al., 2019). Diekmann et al. (1990) took *R* as the dominant eigenvalue of the next-generation matrix, which is derived as ![Graphic][105]</img> for the equations in (15) by adapting the arguments of Diekmann et al. (2010). Specifically, noting that ![Graphic][106]</img>, we obtain that ![Graphic][107]</img>, whose dominant eigenvalue is ![Graphic][108]</img>. Therefore, ![Graphic][109]</img>.

## Appendix B: Derivations of the Expressions in Section 3.3

In this appendix, we present the derivations of the expressions used in Section 3.3, and describe the implementation of the IF-EAKF algorithm using the terminology of the “joint state-observation vector” (see Anderson, 2001, p.2886) which basically includes time-dependent variables.

For ease of exposition, we let log *ϕ*(*t*) denote the vector (log *S*(*t*), log *E*(*t*), log *I**a*(*t*), log *I**s*(*t*), log *A*(*t*), log *R*(*t*))T. Let the joint state-observation vector at time *t* be *O*(*t*) = (*η*T, (log *ϕ*(*t*))T, log *Ŷ*(*t*))T. Because *ϕ*(*t*) is regarded as deterministic at *τ*, the beginning of the study, so when *t* = *τ*, *O*(*t*) degenerates as *O*(*τ*) = (*η*T, log *Ŷ*(*τ*))T. Here applying the logarithm to *ϕ*(*t*) and *Ŷ*(*t*) aligns with the transformation involved with model (7).

The EAKF algorithm basically aims to work out the posterior distribution of the joint state-observation vector *O*(*t*) for *t* ≥ *τ*. To reduce computation costs, we adopt the discussion of (Anderson, 2001, p.2888) and consider a simple implementation procedure by examining *O*(*t*) via its paired subvectors separately, where we pair log *Ŷ*(*t*) with each of other elements in *O*(*t*) and calculate the posterior distributions for (*θ*, log *Ŷ*(*t*))T, (*µ*, log *Ŷ*(*t*))T, (*α*, log *Ŷ*(*t*))T, (*β*, log *Ŷ*(*t*))T, (*γ*, log *Ŷ*(*t*))T, (*F*, log *Ŷ*(*t*))T, (*B*, log *Ŷ*(*t*))T, (*J*, log *Ŷ*(*t*))T, (log *S*(*t*), log *Ŷ*(*t*))T, (log *E*(*t*), log *Ŷ*(*t*))T,(log *I**a*(*t*), log *Ŷ*(*t*))T, (log *I**s*(*t*), log *Ŷ*(*t*))T, (log *A*(*t*), log *Ŷ*(*t*))T, and (log *R*(*t*), log *Ŷ*(*t*))T separately, which employs the same procedure in principle.

To show the ideas, we just describe the way of calculating the posterior distribution of (*θ*, log *Ŷ*(*t*))T for *t* ≥ *τ*. Write *Z**t* = (*θ*, log *Ŷ*(*t*))T, and let *H* = (0, 1) so that *HZ**t* = log *Ŷ*(*t*). Rather than attempt to find the analytic expression of the posterior distribution, the EAKF algorithm generates the posterior values of *Z**t* by using the prior values of *Z**t* and the observation *y**t*. To this end, let ![Graphic][110]</img> and ![Graphic][111]</img> denote sequences of prior values of *θ* and *Ŷ*(*t*) at time *t*, respectively.

First, we calculate their sample means, sample covariance, and sample variances, which are, respectively, given by ![Formula][112]</img>  ![Formula][113]</img>  ![Formula][114]</img>  ![Formula][115]</img>  ![Formula][116]</img>  

Then we set the mean of prior values of *Z**t* as ![Formula][117]</img>  and the covariance matrix of prior values of *Z**t* as ![Formula][118]</img>  

The EAKF algorithm assumes that the prior distribution of *Z**t* can be represented or reasonably approximated by a Gaussian distribution with the mean ![Graphic][119]</img> and the covariance matrix ![Graphic][120]</img>. Therefore, the density function of the prior distribution of *Z**t* is expressed as ![Formula][121]</img>  

In Appendix C, we show that the posterior distribution of *Z**t* is Gaussian with the mean ![Formula][122]</img>  and covariance matrix ![Formula][123]</img>  

To describe the generation of the posterior values of *θ* and *Ŷ*(*t*) at time *t*, for *i* = 1, …, *n*, we let ![Graphic][124]</img> denote posterior values of *θ* and log *Ŷ*(*t*) to be generated, and let ![Graphic][125]</img> denote prior values of *θ* and log *Ŷ*(*t*). Then the EAKF algorithm (Anderson, 2001, p.2887) generates *n* posterior values of *Z**t* as follows: ![Formula][126]</img>  where ![Graphic][127]</img> and ![Graphic][128]</img> are given by (21) and (23), respectively, and *D* can be any matrix such that the covariance matrix of ![Graphic][129]</img> computed from (25) equals that computed by (24).

The existence of such a matrix *D* is justified in Appendix A of Anderson (2001). For instance, *D* can be taken as ![Formula][130]</img>  as shown in Appendix D.

Substituting matrix *D* in (25) with (26), we obtain the following explicit expressions for posterior values of *θ* and log *Ŷ*(*t*) at time *t* for *i* = 1, …, *n*: ![Formula][131]</img>  ![Formula][132]</img>  

Setting *t* = *τ*, (27) and (28) yield (9) and (10), respectively.

## Appendix C: Derivations of the Posterior Distribution of *Z**t*

For ease of notation, we write *U* = log *Y* (*t*) and *Z* = *Z**t* for any given *t* ≥ *τ*. Let *H*1 = (1, 0), by the fact that *Z* = (*θ*, log *Ŷ*(*t*))T and *H* = (0, 1), we have *θ* = *H*1*Z* and log *Ŷ*(*t*) = *HZ*.

Now we show that *U* is conditionally independent of *H*1*Z*, given *HZ*. Indeed, for any *u, z*1, *z*2 ∈ ℝ, the conditional cumulative distribution function of *U* and *H*1*Z*, given *HZ* = *z*2, is ![Formula][133]</img>  where the second equality is due to (7), the fourth equality comes from the assumption that *ϵ**t* is independent of *H*1*Z*, and the second last step is because that *P* (*ϵ**t* ≤ *u* − *z*2|*HZ* = *z*2) = *P* (*ϵ**t* ≤ *u* − *HZ*|*HZ* = *z*2) = *P* (*HZ* + *ϵ**t* ≤ *u*|*HZ* = *z*2) = *P* (*U* ≤ *u*|*HZ* = *z*2) by (7). Therefore, we conclude that *U* is independent of *H*1*Z*, given *HZ*.

Next, we find the conditional distribution of *U* given *Z, f**U*|*Z*(*u*|*z*), where *z* = (*z*1, *z*2)T with *z*1, *z*2 ∈ ℝ, and *u* ∈ ℝ. To do this, we use the definition, ![Formula][134]</img>  and first determine the joint distribution of *U* and *Z, f**U,Z*(*u, z*). By the definition of *Z*, we have ![Formula][135]</img>  which equals ![Graphic][136]</img>. Since *U* is conditionally independent of *H*1*Z*, given *HZ*, therefore, we have that ![Formula][137]</img>  where the last equality is due to the definition of *Z* = (*H*1*Z, HZ*)T. Combining (30) and (29) gives that ![Formula][138]</img>  which is determined by (7). Therefore, using the original notation, by (31), we obtain the conditional density function of log *Y* (*t*) given *Z**t*, ![Formula][139]</img>  

Finally, we determine the posterior distribution of *Z*, i.e., the conditional distribution of *Z* given *U*, *f**Z*|*U* (*z*|*u*). Combining (22) and (32) gives the density function of the posterior distribution of *Z**t*: ![Formula][140]</img>  showing that the posterior distribution of *Z**t* is Gaussian with the mean ![Formula][141]</img>  and covariance matrix ![Formula][142]</img>  

Furthermore, we express (33) and (34) in terms of the observations and the prior values in (16), (17), (18), (19), and (20): ![Formula][143]</img>  

## Appendix D: Verification of the Condition Required from EAKF

Substituting the matrix *D* in (25) with (26) gives us ![Formula][144]</img>  yielding that ![Formula][145]</img>  and ![Formula][146]</img>  

Thus, (35) and (36) give us a sequence of posterior values of ![Graphic][147]</img>, and a sequence of posterior values of ![Graphic][148]</img>.

Next, we calculate the variance of the posterior values ![Graphic][149]</img> using (35), ![Formula][150]</img>  which is identical to the (1, 1) element of (24).

Using (36), we calculate the variance of the posterior values ![Graphic][151]</img> of log *Ŷ*(*t*), given by ![Formula][152]</img>  which is identical to the (2, 2) element of (24).

Furthermore, using (35) and (36), we calculate the covariance of ![Graphic][153]</img> and ![Graphic][154]</img>, ![Formula][155]</img>  which is identical to the (1, 2) and (2, 1) elements of the matrix (24). Therefore, we verify that the matrix *D* satisfies the condition required by the EAKF algorithm.

*   Received December 8, 2020.
*   Revision received December 8, 2020.
*   Accepted December 11, 2020.


*   © 2020, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), CC BY-NC 4.0, as described at [http://creativecommons.org/licenses/by-nc/4.0/](http://creativecommons.org/licenses/by-nc/4.0/)

## References

1.  Abbey, H. (1952). An examination of the reed-frost theory of epidemics. Human Biology, 24(3):201–233.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12990130&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F12%2F11%2F2020.12.08.20246264.atom) 

2.  Anderson, J. L. (2001). An ensemble adjustment kalman filter for data assimilation. Monthly Weather Review, 129(12):2884–2903.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2&link_type=DOI) 

3.  Backer, J. A., Klinkenberg, D., and Wallinga, J. (2020). Incubation period of 2019 novel coronavirus (2019-nCov) infections among travellers from Wuhan, China, 20-28 January 2020. Eurosurveillance, 25(5):1–6.
    
    
4.  Bai, Y., Yao, L., Wei, T., Tian, F., Jin, D.-Y., Chen, L., and Wang, M. (2020). Presumed asymptomatic carrier transmission of COVID-19. The Journal of the American Medical Association, 323(14):1406–1407.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1001/jama.2020.2565&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F12%2F11%2F2020.12.08.20246264.atom) 

5.  DeFelice, N. B., Little, E., Campbell, S. R., and Shaman, J. (2017). Ensemble forecast of human West Nile virus cases and mosquito infection rates. Nature Communications, 8(1):1–6.
    
    
6.  Delamater, P. L., Street, E. J., Leslie, T. F., Yang, Y. T., and Jacobsen, K. H. (2019). Complexity of the basic reproduction number (R0). Emerging Infectious Diseases, 25(1):1–4.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3201/eid2501.171901&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30560777&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F12%2F11%2F2020.12.08.20246264.atom) 

7.  Diekmann, O., Heesterbeek, J., and Roberts, M. G. (2010). The construction of next-generation matrices for compartmental epidemic models. Journal of the Royal Society Interface, 7(47):873–885.
    
    
8.  Diekmann, O., Heesterbeek, J. A. P., and Metz, J. A. (1990). On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations. Journal of Mathematical Biology, 28(4):365–382.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/BF00178324&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=2117040&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F12%2F11%2F2020.12.08.20246264.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1990DG35000001&link_type=ISI) 

9.  Duan, W., Fan, Z., Zhang, P., Guo, G., and Qiu, X. (2015). Mathematical and computational approaches to epidemic modeling: A comprehensive review. Frontiers of Computer Science, 9(5):806–826.
    
    
10. Guan, W., Ni, Z., Hu, Y., Liang, W., Ou, C., He, J., Liu, L., Shan, H., Lei, C., Hui, D. S., et al. (2020). Clinical characteristics of coronavirus disease 2019 in China. New England Journal of Medicine, 382(18):1708–1720.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMoa2002032&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F12%2F11%2F2020.12.08.20246264.atom) 

11. Hao, X., Cheng, S., Wu, D., Wu, T., Lin, X., and Wang, C. (2020). Reconstruction of the full transmission dynamics of COVID-19 in Wuhan. Nature, 584(7821):420–424.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F12%2F11%2F2020.12.08.20246264.atom) 

12. He, D., Ionides, E. L., and King, A. A. (2010). Plug-and-Play inference for disease dynamics: Measles in large and small populations as a case study. Journal of the Royal Society Interface, 7(43):271–283.
    
    
13. He, W., Yi, G. Y., and Zhu, Y. (2020). Estimation of the basic reproduction number, average incubation time, asymptomatic infection rate, and case fatality rate for COVID-19: Meta-analysis and sensitivity analysis. Journal of Medical Virology, 92(11):2543–2550.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/jmv.26041&link_type=DOI) 

14. Kermack, W. O. and McKendrick, A. G. (1927). A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character, 115(772):700–721.
    
    
15. King, A. A., Ionides, E. L., Pascual, M., and Bouma, M. J. (2008). Inapparent infections and cholera dynamics. Nature, 454(7206):877–880.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature07084&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18704085&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F12%2F11%2F2020.12.08.20246264.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000258398600034&link_type=ISI) 

16. Kramer, M., Pigott, D., Xu, B., Hill, S., Gutierrez, B., and Pybus, O. (2020). Epidemiological data from the nCov-2019 outbreak: Early descriptions from publicly available data. [https://virological.org/t/epidemiological-data-from-the-ncov-2019-outbreak-early-descri\ptions-from-publicly-available-data/337](https://virological.org/t/epidemiological-data-from-the-ncov-2019-outbreak-early-descri\ptions-from-publicly-available-data/337).
    
    
17. Li, R., Pei, S., Chen, B., Song, Y., Zhang, T., Yang, W., and Shaman, J. (2020). Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV2). Science, 368(6490):489–493.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEyOiIzNjgvNjQ5MC80ODkiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMC8xMi8xMS8yMDIwLjEyLjA4LjIwMjQ2MjY0LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 

18. Ng, J. and Orav, E. J. (1990). A generalized chain binomial model with application to HIV infection. Mathematical Biosciences, 101(1):99–119.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/0025-5564(90)90104-7&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=2134482&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F12%2F11%2F2020.12.08.20246264.atom) 

19. Ng, T. W., Turinici, G., and Danchin, A. (2003). A double epidemic model for the SARS propagation. BMC Infectious Diseases, 3(1):1–16.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/1471-2334-3-1&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12659657&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F12%2F11%2F2020.12.08.20246264.atom) 

20. Osthus, D., Hickmann, K. S., Caragea, P. C., Higdon, D., and Del Valle, S. Y. (2017). Forecasting seasonal influenza with a state-space SIR model. The annals of applied statistics, 11(1):202–224.
    
    
21. Pei, S., Kandula, S., Yang, W., and Shaman, J. (2018). Forecasting the spatial transmission of influenza in the United States. Proceedings of the National Academy of Sciences, 115(11):2752–2757.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxMToiMTE1LzExLzI3NTIiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMC8xMi8xMS8yMDIwLjEyLjA4LjIwMjQ2MjY0LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 

22. Peng, L., Yang, W., Zhang, D., Zhuge, C., and Hong, L. (2020). Epidemic analysis of COVID-19 in China by dynamical modeling. medRxiv.
    
    
23. Reis, J. and Shaman, J. (2016). Retrospective parameter estimation and forecast of respiratory syncytial virus in the United States. PLOS Computational Biology, 12(10):1–15.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pcbi.1005140&link_type=DOI) 

24. Shah, N. H. and Gupta, J. (2013). SEIR model and simulation for vector borne diseases. Applied Mathematics, 4(8A):13–17.
    
    
25. Shaman, J., Yang, W., and Kandula, S. (2014). Inference and forecast of the current West African Ebola outbreak in Guinea, Sierra Leone and Liberia. PLOS Currents, 6.
    
    
26. Süli, E. and Mayers, D. F. (2003). An Introduction to Numerical Analysis. Cambridge University Press.
    
    
27. The Canadian Press (2020). Coronavirus: Here is a timeline of COVID-19 cases in Canada. [https://globalnews.ca/news/6627505/coronavirus-covid-canada-timeline/](https://globalnews.ca/news/6627505/coronavirus-covid-canada-timeline/).
    
    
28. Vogel, L. (2020). COVID-19: A timeline of Canada’s first-wave response. [https://cmajnews.com/2020/06/12/coronavirus-1095847/](https://cmajnews.com/2020/06/12/coronavirus-1095847/).
    
    
29. WHO (2020). Report of the WHO-China joint mission on coronavirus disease 2019 (COVID-19). [https://www.who.int/docs/default-source/coronaviruse/](https://www.who.int/docs/default-source/coronaviruse/) who-china-joint-mission-on-covid-19-final-report.pdf.

 [1]: /embed/graphic-2.gif
 [2]: /embed/graphic-3.gif
 [3]: /embed/graphic-4.gif
 [4]: /embed/graphic-5.gif
 [5]: /embed/graphic-6.gif
 [6]: /embed/graphic-7.gif
 [7]: /embed/graphic-8.gif
 [8]: /embed/inline-graphic-1.gif
 [9]: /embed/inline-graphic-2.gif
 [10]: /embed/inline-graphic-3.gif
 [11]: /embed/inline-graphic-4.gif
 [12]: /embed/inline-graphic-5.gif
 [13]: /embed/inline-graphic-6.gif
 [14]: /embed/inline-graphic-7.gif
 [15]: /embed/inline-graphic-8.gif
 [16]: /embed/inline-graphic-9.gif
 [17]: /embed/inline-graphic-10.gif
 [18]: /embed/inline-graphic-11.gif
 [19]: /embed/inline-graphic-12.gif
 [20]: /embed/inline-graphic-13.gif
 [21]: /embed/inline-graphic-14.gif
 [22]: /embed/inline-graphic-15.gif
 [23]: /embed/inline-graphic-16.gif
 [24]: /embed/inline-graphic-17.gif
 [25]: /embed/inline-graphic-18.gif
 [26]: /embed/inline-graphic-19.gif
 [27]: /embed/inline-graphic-20.gif
 [28]: /embed/inline-graphic-21.gif
 [29]: /embed/graphic-10.gif
 [30]: /embed/inline-graphic-22.gif
 [31]: /embed/inline-graphic-23.gif
 [32]: /embed/inline-graphic-24.gif
 [33]: /embed/inline-graphic-25.gif
 [34]: /embed/graphic-12.gif
 [35]: /embed/inline-graphic-26.gif
 [36]: /embed/inline-graphic-27.gif
 [37]: /embed/inline-graphic-28.gif
 [38]: /embed/graphic-13.gif
 [39]: /embed/inline-graphic-29.gif
 [40]: /embed/inline-graphic-30.gif
 [41]: /embed/inline-graphic-31.gif
 [42]: /embed/inline-graphic-32.gif
 [43]: /embed/graphic-14.gif
 [44]: /embed/graphic-15.gif
 [45]: /embed/inline-graphic-33.gif
 [46]: /embed/inline-graphic-34.gif
 [47]: /embed/graphic-16.gif
 [48]: /embed/inline-graphic-35.gif
 [49]: /embed/inline-graphic-36.gif
 [50]: /embed/inline-graphic-37.gif
 [51]: /embed/inline-graphic-38.gif
 [52]: /embed/inline-graphic-39.gif
 [53]: /embed/graphic-17.gif
 [54]: /embed/inline-graphic-40.gif
 [55]: /embed/inline-graphic-41.gif
 [56]: /embed/inline-graphic-42.gif
 [57]: /embed/inline-graphic-43.gif
 [58]: /embed/inline-graphic-44.gif
 [59]: /embed/inline-graphic-45.gif
 [60]: /embed/inline-graphic-46.gif
 [61]: /embed/graphic-18.gif
 [62]: /embed/inline-graphic-47.gif
 [63]: /embed/inline-graphic-48.gif
 [64]: /embed/graphic-19.gif
 [65]: /embed/graphic-20.gif
 [66]: /embed/inline-graphic-49.gif
 [67]: /embed/inline-graphic-50.gif
 [68]: /embed/inline-graphic-51.gif
 [69]: /embed/inline-graphic-52.gif
 [70]: /embed/inline-graphic-53.gif
 [71]: /embed/graphic-21.gif
 [72]: /embed/inline-graphic-54.gif
 [73]: /embed/inline-graphic-55.gif
 [74]: /embed/inline-graphic-56.gif
 [75]: /embed/graphic-22.gif
 [76]: /embed/inline-graphic-57.gif
 [77]: /embed/graphic-23.gif
 [78]: /embed/inline-graphic-58.gif
 [79]: /embed/inline-graphic-59.gif
 [80]: /embed/inline-graphic-60.gif
 [81]: /embed/inline-graphic-61.gif
 [82]: /embed/inline-graphic-62.gif
 [83]: /embed/graphic-28.gif
 [84]: /embed/inline-graphic-63.gif
 [85]: /embed/inline-graphic-64.gif
 [86]: /embed/inline-graphic-65.gif
 [87]: /embed/inline-graphic-66.gif
 [88]: /embed/inline-graphic-67.gif
 [89]: /embed/inline-graphic-68.gif
 [90]: /embed/inline-graphic-69.gif
 [91]: /embed/inline-graphic-70.gif
 [92]: /embed/graphic-29.gif
 [93]: /embed/inline-graphic-71.gif
 [94]: /embed/inline-graphic-72.gif
 [95]: /embed/inline-graphic-73.gif
 [96]: /embed/inline-graphic-74.gif
 [97]: /embed/inline-graphic-75.gif
 [98]: /embed/graphic-34.gif
 [99]: /embed/graphic-35.gif
 [100]: /embed/graphic-36.gif
 [101]: /embed/graphic-37.gif
 [102]: /embed/inline-graphic-76.gif
 [103]: /embed/inline-graphic-77.gif
 [104]: /embed/inline-graphic-78.gif
 [105]: /embed/inline-graphic-79.gif
 [106]: /embed/inline-graphic-80.gif
 [107]: /embed/inline-graphic-81.gif
 [108]: /embed/inline-graphic-82.gif
 [109]: /embed/inline-graphic-83.gif
 [110]: /embed/inline-graphic-84.gif
 [111]: /embed/inline-graphic-85.gif
 [112]: /embed/graphic-38.gif
 [113]: /embed/graphic-39.gif
 [114]: /embed/graphic-40.gif
 [115]: /embed/graphic-41.gif
 [116]: /embed/graphic-42.gif
 [117]: /embed/graphic-43.gif
 [118]: /embed/graphic-44.gif
 [119]: /embed/inline-graphic-86.gif
 [120]: /embed/inline-graphic-87.gif
 [121]: /embed/graphic-45.gif
 [122]: /embed/graphic-46.gif
 [123]: /embed/graphic-47.gif
 [124]: /embed/inline-graphic-88.gif
 [125]: /embed/inline-graphic-89.gif
 [126]: /embed/graphic-48.gif
 [127]: /embed/inline-graphic-90.gif
 [128]: /embed/inline-graphic-91.gif
 [129]: /embed/inline-graphic-92.gif
 [130]: /embed/graphic-49.gif
 [131]: /embed/graphic-50.gif
 [132]: /embed/graphic-51.gif
 [133]: /embed/graphic-52.gif
 [134]: /embed/graphic-53.gif
 [135]: /embed/graphic-54.gif
 [136]: /embed/inline-graphic-93.gif
 [137]: /embed/graphic-55.gif
 [138]: /embed/graphic-56.gif
 [139]: /embed/graphic-57.gif
 [140]: /embed/graphic-58.gif
 [141]: /embed/graphic-59.gif
 [142]: /embed/graphic-60.gif
 [143]: /embed/graphic-61.gif
 [144]: /embed/graphic-62.gif
 [145]: /embed/graphic-63.gif
 [146]: /embed/graphic-64.gif
 [147]: /embed/inline-graphic-94.gif
 [148]: /embed/inline-graphic-95.gif
 [149]: /embed/inline-graphic-96.gif
 [150]: /embed/graphic-65.gif
 [151]: /embed/inline-graphic-97.gif
 [152]: /embed/graphic-66.gif
 [153]: /embed/inline-graphic-98.gif
 [154]: /embed/inline-graphic-99.gif
 [155]: /embed/graphic-67.gif