Estimation of COVID-19 spread curves integrating global data and borrowing information
======================================================================================

* Se Yoon Lee
* Bowen Lei
* Bani K. Mallick

## Abstract

Currently, novel coronavirus disease 2019 (COVID-19) is a big threat to global health. The rapid spread of the virus has created pandemic, and countries all over the world are struggling with a surge in COVID-19 infected cases. There are no drugs or other therapeutics approved by the US Food and Drug Administration to prevent or treat COVID-19: information on the disease is very limited and scattered even if it exists. This motivates the use of data integration, combining data from diverse sources and eliciting useful information with a unified view of them. In this paper, we propose a Bayesian hierarchical model that integrates global data for real-time prediction of infection trajectory for multiple countries. Because the proposed model takes advantage of borrowing information across multiple countries, it outperforms an existing individual country-based model. As fully Bayesian way has been adopted, the model provides a powerful predictive tool endowed with uncertainty quantification. Additionally, the proposed model uses countrywide covariates to adjust infection trajectories in curve fitting. A joint variable selection technique has been integrated into the proposed modeling scheme, which aimed to identify possible country-level risk factors for severe disease due to COVID-19.

Keywords
*   Novel Coronavirus
*   COVID-19
*   Infection Trajectories
*   Data Integration

## 1 Introduction

Since Thursday, March 26, 2020, the US leads the world in terms of the cumulative number of infected cases for a novel coronavirus, COVID-19. On this day, a dashboard provided by the Center for Systems Science and Engineering (CSSE) at the Johns Hopkins University ([https://systems.jhu.edu/-](https://systems.jhu.edu/)) (Dong et al., 2020) reported that the numbers of the confirmed, death, and recovered from the virus in the US are 83,836, 1,209, and 681, respectively. Figure 1 displays daily infection trajectories describing the cumulative numbers of infected cases for eight countries (US, Russia, UK, Brazil, Germany, China, India, and South Korea), spanning from January 22nd to May 14th, which accounts for 114 days. The dotted vertical lines on the panel mark certain historical dates that will be explained. As seen from the panel, the US has been a late-runner until March 11th in terms of the infected cases, but the growth rate of the cases had suddenly skyrocketed since the day, and eventually excelled the forerunner, China, just in two weeks, on March 26th. Figure 2 shows the cumulative infected cases for 40 countries on May 14th: on the day, the number of cumulative infected cases for the US was 1,417,774 which is more than five times of that of Russia, 252,245.

![Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/05/19/2020.04.23.20077065/F1.medium.gif)

[Figure 1:](http://medrxiv.org/content/early/2020/05/19/2020.04.23.20077065/F1)

Figure 1: 
Daily trajectories for cumulative numbers of COVID-19 infections for eight countries (US, Russia, UK, Brazil, Germany, China, India, and South Korea) from January 22nd to May 14th. (Data source: Johns Hopkins University CSSE)

Since the COVID-19 outbreak, there have been numerous research works to better understand the pandemic in different aspects (Gao et al., 2020; Jia et al., 2020; Liu et al., 2020; Peng et al., 2020; Qiang Li, 2020; Remuzzi and Remuzzi, 2020; Sheng Zhang, 2020; Yang et al., 2020). Some of the recent works from statistics community are as follows. Sheng Zhang (2020) focused on a serial interval (the time between successive cases in a chain of transmissions) and used the gamma distribution to study the transmission on Diamond Princess cruise ship. Peng et al. (2020) proposed the generalized susceptible exposed infectious removed model to predict the inflection point for the growth curve, while Yang et al. (2020) modified the proposed model and considered the public health interventions in predicting the trend of COVID-19 in China. Liu et al. (2020) proposed a differential equation prediction model to identify the influence of public policies on the number of patients. Qiang Li (2020) used a symmetrical function and a long tail asymmetric function to analyze the daily infections and deaths in Hubei and other places in China. Remuzzi and Remuzzi (2020) used an exponential model to study the number of infected patients and patients who need intensive care in Italy. One of the major limitations of these works is that the researches are confined by analyzing data from a single country, thereby neglecting the global nature of the pandemic.

One of the major challenges in estimating or predicting an infection trajectory is the heterogeneity of the country populations. It is known that there are four stages of a pandemic: visit [economictimes.indiatimes.com/-](https://economictimes.indiatimes.com/news/politics-and-nation/coronavirus-the-four-stages-of-a-global-pandemic/pandemic/slideshow/74884293.cms). The first stage of the pandemic contains data from people with travel history to an already affected country. In stage two, we start to see data from local transmission, people who have brought the virus into the country transmit it to other people. In the third stage, the source of the infection is untraceable. In stage four the spread is practically uncontrollable. In most of the current literature, estimation or prediction of the infection trajectory is based on a single country data where the status of the country falls into one of these four stages. Hence, such estimation or prediction may fail to capture some crucial changes in the shape of the infection trajectory due to a lack of knowledge about the other stages. This motivates the use of data integration (Huttenhower and Troyanskaya, 2006; Lenzerini, 2002) which combines data from different countries and elicits a solution with a unified view of them. This will be particularly useful in the current context of the COVID-19 outbreak.

![Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/05/19/2020.04.23.20077065/F2.medium.gif)

[Figure 2:](http://medrxiv.org/content/early/2020/05/19/2020.04.23.20077065/F2)

Figure 2: 
Cumulative numbers of infected cases for 40 countries on May 14th. (*x*-axis are scaled with squared root for visualization purpose.) The red dashed vertical lines represents 1,417,774 cases.

Recently, there are serious discussions all over the world to answer the crucial question: “even though the current pandemic takes place globally due to the same virus, why infection trajectories of different countries are so diverse?” For example, as seen from Figure 1, the US, Italy, and Spain have accumulated infected cases within a short period of time, while China took a much longer time since the onset of the COVID-19 pandemic, leading to different shapes of infection trajectories. It will be interesting to find a common structure in these infection trajectories for multiple countries, and to see how these trajectories are changing around this common structure. Finally, it is significant to identify the major countrywide covariates which make infection trajectories of the countries behave differently in terms of the spread of the disease.

## 2 Significance

The rapid spread of coronavirus has created pandemic, and countries all over the world are struggling with a surge in COVID-19 infected cases. Scientists are working on estimating the infection trajectory for future prediction of cases, which will be useful for future planning and policymaking. We propose a hierarchical model that integrates worldwide data to estimate COVID-19 infection trajectories. Due to information borrowing across multiple countries, the proposed growth curve model will be a powerful predictive tool endowed with uncertainty quantification. Additionally, we use countrywide covariates to adjust curve fitting for the infection trajectory. A joint variable selection technique has been integrated into the modeling scheme, which will identify the possible reasons for diversity among the country-specific infection curves.

## 3 Our Contribution

There are three major classes of infectious disease prediction models: (i) differential equation models, (ii) time series models, and (iii) the statistical models. The differential equation models describe the dynamic behavior of the disease through differential equations allowing the laws of transmission within the population. The popular models include the SI, SIS, SIR, and SEIR models (Hethcote, 2000; Korobeinikov, 2004; Tiberiu Harko, 2014). These models are based on assumptions related to S (susceptible), E (exposed), I (infected), and R (remove) categories of the population. Time series based prediction models such as ARIMA, Grey Model, Markov Chain models have been used to describe dependence structure over of the disease spread over time (Hu et al., 2006; Reza Yaesoubi, 2011; Rushton et al., 2006; Shen X, 2013; Zhirui He, 2018). On the other hand, statistical models, so-called phenomenological models, which follow certain laws of epidemiology (Clayton and Hills, 2013; Thompson et al., 2006) are widely used in real-time forecasting for infection trajectory or size of epidemics in early stages of pandemic (Fineberg and Wilson, 2009; Hsieh, 2009; Pell et al., 2018). Statistical models can be easily extended to the framework of hierarchical models (multilevel models) to analyze data within a nested hierarchy, eventually harnessing the data integration (Browne et al., 2006; Hill, 1965; Stone and Springer, 1965; Tiao and Tan, 1965). In this paper, we use Bayesian hierarchical models so that data integration and uncertainty analysis (Malinverno and Briggs, 2004) are possible in a unified way.

Specifically, we use the Richards growth curve model (Richards, 1959). The novelties of our method are as follows: we (i) use a flexible hierarchical growth curve model to global COVID-19 data, (ii) integrate information from multiple countries for estimation and prediction purposes, (iii) adjust for country-specific covariates, and (iv) perform covariate selection to identify the important reasons to explain the differences among the country-wise infection trajectories. We demonstrate that our proposed models perform better than an individual country-based model.

### 3.1 Richards growth curve models

Richards growth curve model (Richards, 1959), so-called the generalized logistic curve (Nelder, 1962), is a growth curve model for population studies in situations where growth is not symmetrical about the point of inflection (Anton and Herr, 1988; Seber and Wild, 2003). The curve was widely used to describe various biological processes (Werker and Jaggard, 1997), but recently adapted in epidemiology for real-time prediction of outbreak of diseases; examples include SARS (Hsieh, 2009; Hsieh et al., 2004), dengue fever (Hsieh and Chen, 2009; Hsieh and Ma, 2009), pandemic influenza H1N1 (Hsieh, 2010), and COVID-19 outbreak (Wu et al., 2020).

There are variant reparamerized forms of the Richards curve in the literature (Birch, 1999; Cao et al., 2019; Causton, 1969; Kahm et al., 2010), and we shall use the following form in this research ![Formula][1]</img>  where *θ*1, *θ*2, and *θ*3 are real numbers, and *ξ* is a positive real number. The utility of the Richards curve (1) is its ability to describe a variety of growing processes, endowed with strong flexibility due to the shape parameter *ξ* (Birch, 1999): analytically, the Richards curve (1) (i) becomes the logistic growth curve (Tsoularis and Wallace, 2002) when *ξ* = 1, and (ii) converges to Gompertz growth curve (Gompertz, 1825) as the *ξ* converges to zero from positive side of real numbers. (Gompertz curve is *g*(*t*; *θ*1, *θ*2, *θ*3) = *θ*1 · exp [− exp {−*θ*2 · (*t* − *θ*3)}].) But it is also known that estimation of *ξ* is a complicated problem (Wang et al., 2016), and we resort to a modern sampling scheme, elliptical slice sampler (Murray et al., 2010), to estimate the *ξ*. (See SI Appendix for more detail.)

Figure 3 illustrates roles of the four parameters of the Richards curve (1). The curves on left panel is obtained when (*θ*1, *θ*2, *θ*3) = (10000,0.2,40), while varying the *ξ* to be 1 × 10−13(≈ 0), 0.5, and 1, respectively. The right panel pictorially describes the roles of (*θ*1, *θ*2, *θ*3): *θ*1 represents the asymptote of the curve; *θ*2 is related to a growth rate (analytically, the derivative of logarithm of the curve (1) at *t* = *θ*3 is *θ*2/2.); and *θ*3 sets the displacement along the x-axis. (For more technical detail for the parameters, refer to (Birch, 1999).)

![Figure 3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/05/19/2020.04.23.20077065/F3.medium.gif)

[Figure 3:](http://medrxiv.org/content/early/2020/05/19/2020.04.23.20077065/F3)

Figure 3: 
Description of the Richards growth curve model. The curve is obtained when *(θ***1***, θ*2*, θ*3*)* = (10000, 0.2,40). The left panel is obtained by changing the ξ to be 1 × 10−13, 0.5, and 1, respectively. The right panel describes the roles the three parameters in epidemiological modeling: *θ*1 represents final epidemic size; *θ*2 is an infection rate; and *θ*3 is a lag phase.

In epidemiological modeling, the Richards curve (1) can be used as a parametric curve describing infection trajectories shown in the Figure 1. In this context, each of the parameters can be interpreted as follows: *θ***1** represents the final epidemic size (that is, the maximum cumulative number of infected cases across the times); *θ*2 represents infection rate; and *θ***3** represents a lag phase of the trajectory. (The shape parameter *ξ* seems to have no clear epidemiological meaning (Wang et al., 2012).) We shall revisit more detailed interpretations of the parameter in Subsection 4.5.

## 4 Results

### 4.1 Benefits from the information borrowing

We investigate the predictive performance of three Bayesian models based on the Richards growth curve. We start with the individual country-based model (here we use only the single country data) which has been widely used in the literature (![Graphic][2]</img>). Next, we extend the previous model to a hierarchical model by utilizing the infection trajectories of all the 40 countries (![Graphic][3]</img>). A limitation of ![Graphic][4]</img> is that it lacks certain countrywide adjustments in estimating the trajectories where the borrowing information takes place uniformly across all the countries although those countries are heterogeneous in terms of aspects like socioeconomic, health environment, etc.. Next, we further upgrade this model by adding country-specific covariates in a hierarchical fashion (![Graphic][5]</img>). (For technical description for the three models, see the Subsection 6.3.) Eventually, borrowing information across the 40 countries takes place in these two hierarchical models, ![Graphic][6]</img> and ![Graphic][7]</img>, but not in the individual country-based model ![Graphic][8]</img>.

For evaluation criteria, we calculate the mean squared error (MSE) (Wasserman, 2013) associated with the extrapolated infection trajectory for each of the 40 countries. Training and test data are designated as follows: given that ***y****k* = (*yk*,1,…, *yk,T*)┬ is an infection trajectory of the *k*-th country spanning for *T* days since January 22nd, and *d* is the chosen test-day, then (i) the training data is set by the trajectory spanning for *T* − *d* days since January 22nd (that is, (*yk*,1,…, *yk,T−d*)), and (ii) the test data is set by the d recent observations (that is, (*yk,T−d*+1,…,*yk,T*)).

For the two hierarchical models ![Graphic][9]</img> and ![Graphic][10]</img>, the MSE is averaged over the 40 countries: ![Formula][11]</img>  where *yk,r* is the actual value for the cumulative confirmed cases of the *k*-th country at the *r*-th time point, and ![Graphic][12]</img> is the forecast value: more concretely, ![Graphic][13]</img> is the posterior predictive mean given the information from 40 countries. For the non-hierarchical model ![Graphic][14]</img>, the ![Graphic][15]</img> in the MSE*d* is acquired by using the data from the *k*-th country.

For each of the short-term test-days (*d* = 5,6,7,8,9,10) and long-term test-days (*d* = 20, 22, 24, 26, 28), we report the median of the MSE*d*’s from 20 replicates. The results are shown in Figure 4. From the panel, we see that (1) the predictive performances of two hierarchical models, ![Graphic][16]</img> and ![Graphic][17]</img>, are universally better than that of ![Graphic][18]</img> across the number of test-days; and (2) the performance of ![Graphic][19]</img> is marginally better than ![Graphic][20]</img>. Based on the outcomes, we shall conclude that information borrowing has improved the predictive accuracy in terms of MSE. We present all the results in the consequent subsections based on the model ![Graphic][21]</img>. A similar result is found in the *Clemente problem* from (Efron, 2010) where the James-Stein estimator (James and Stein, 1992) better predicts then an individual hitter-based estimator in terms of the total squared prediction error.

![Figure 4:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/05/19/2020.04.23.20077065/F4.medium.gif)

[Figure 4:](http://medrxiv.org/content/early/2020/05/19/2020.04.23.20077065/F4)

Figure 4: 
Comparison of the MSE obtained by the three models, ![Graphic][22]</img>, ![Graphic][23]</img>, and ![Graphic][24]</img>, averaged over the 40 countries: short-term (left) and long-term predictions (right). A smaller value for the MSE indicates a better predictive performance.

### 4.2 COVID-19 travel recommendations by country

Centers for Disease Control and Prevention (CDC) categorizes countries into three levels by assessing the risk of COVID-19 transmission, used in travel recommendations by country (Visit [www.cdc.gov/-](http://www.cdc.gov/)): Level 1, Level 2, and Level 3 indicate the Watch Level (Practice Usual Precautions), Alert Level (Practice Enhanced Precautions), and Warning Level (Avoid Nonessential Travel), respectively.

We categorize the 40 countries into the three levels according to their posterior means for the final epidemic size (that is, *θ*1 of the Richards curve (1)). Grouping criteria are as follows: (1) Level 1 (estimated total number is no more than 10,000 cases); (2) Level 2 (estimated total number is between 10,000 and 100,000 cases); and (3) Level 3 (estimated total number is more than 100,000 cases).

Figure 5 displays results of posterior inference for the *θ*1 by country, based on the model ![Graphic][25]</img>. Countries on the *y*-axis are ordered from the severest country (US) to the least severe country (Malaysia) in the magnitude of the posterior means. The red horizontal bars on the panel represent the 95% credible intervals, describing uncertainty for the estimation. Base on the results, there are 14 countries categorized as Level 3 (US, Russia, Brazil, Pakistan, UK, Spain, Italy, India, France, Germany, Peru, Iran, Chile, and Canada). There are 21 countries categorized as Level 2 (from Saudi Arabia to South Korea), and 5 countries categorized as Level 1 (from Czechia to Malaysia).

![Figure 5:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/05/19/2020.04.23.20077065/F5.medium.gif)

[Figure 5:](http://medrxiv.org/content/early/2020/05/19/2020.04.23.20077065/F5)

Figure 5: 
Estimation results for the final epidemic size for 40 countries. Grey dots (•) represent the cumulative numbers of infected cases for 40 countries on May 14th; red dots (•) and horizontal bars (−) represent the posterior means and 95% credible intervals for the *θ*1 of the 40 countries. Vertical red dotted line indicates the 1, 760, 569 cases, the posterior mean for the US.

![Figure 6:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/05/19/2020.04.23.20077065/F6.medium.gif)

[Figure 6:](http://medrxiv.org/content/early/2020/05/19/2020.04.23.20077065/F6)

Figure 6: 
Illustration of flat time point. The exemplary infection trajectory is obtained by the Richards curve when (*θ*1, *θ*2 *θ*3, *ξ*) = (10000, 0.2, 40, 0.5). A flat time point *t*flat,*ϵ* is approximately 63 (vertical red dashed line). The vertical difference between the *θ*1 and the function value of Richards curve evaluated at *t*flat,ϵ is *ϵ* = 100 (cases).

### 4.3 Extrapolated infection trajectories and flat time points

Figure 7 displays the extrapolated infection trajectory (posterior mean for the Richards curve (1)) for the US. The posterior mean of the final epidemic size is 1,760,569 cases. The scenario that ‘millions’ of Americans could be infected was also warned by a leading expert in infectious diseases (Visit a related news article [www.bbc.com/-](http://www.bbc.com/)). It is known that prediction of an epidemic trend from limited data during early stages of the epidemic is often futile and misleading (Hsieh et al., 2004). Nevertheless, estimation of a possible severity havocked by the COVID-19 outbreak is an important task when considering the seriousness of the current pandemic situation.

A crucial question is when this trajectory gets flattened. To that end, we approximate a time point where an infection trajectory levels off its value, showing a flattening pattern after the time point. The following is the definition of the *flat time point* which we use in this paper:

#### Definition 4.1

Given the Richards curve *f*(t; *θ*1, *θ*2, *θ*3, *ξ*) (1) and some small *ϵ* > 0, the *flat time point t*flat,ϵ is defined as the solution of the equation *θ*1 − *ϵ* = *f*(*t*; *θ*1, *θ*2, *θ*3, *ξ*):

![Formula][26]</img> ![Figure 7:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/05/19/2020.04.23.20077065/F7.medium.gif)

[Figure 7:](http://medrxiv.org/content/early/2020/05/19/2020.04.23.20077065/F7)

Figure 7: 
Extrapolated infection trajectory for the US based on the model ![Graphic][27]</img>. Posterior mean of the maximum number of cumulative infected cases is 1,474,526 cases. Posterior means for the flat time points are *t*flat,*ϵ*=100,000=May 26th, *t*flat,*ϵ*=10,000=July 4th, *t*flat,*ϵ*=1,000=August 11th, and *t*flat,*ϵ*=100=September 18th.

Specifically speaking, the flat time point *t*flat,*ϵ* is the time point whereat only *ϵ* number of infected cases can maximally take place to reach the final epidemic size *θ*1, after the time point *t*flat,*ϵ*. Figure 6 depicts an exemplary infection trajectory obtained by the Richards curve (1) with (*θ*1, *θ*2, *θ*3, *ξ*) = (10000, 0.2, 40, 0.5). In this case, a flat time point *t*flat,*ϵ* is approximately 63 when *ϵ* = 100. The choice of *ϵ* > 0 depends on the situation of a country considered: for China which already shows flattening phase (refer to Figure 1) in the infection trajectory, *ϵ* = 1 (case) can be safely used, but for US one may use *ϵ* = 1,000 (cases) or larger numbers.

For the US, the posterior means of the flat time points *t*flat,*ϵ* are June 11th, July 27th, September 10th, and October 26th when corresponding *ϵ*’s are chosen by 100,000, 10,000, 1,000, and 100, respectively. It is important to emphasize that the extrapolated infection trajectory is *real-time prediction* of COVID-19 outbreaks (Fineberg and Wilson, 2009; Wang et al., 2012) based on observations tracked until May 14th. Certainly, incorporation of new information such as compliance with social distancing or advances in medical and biological sciences for this disease will change the inference outcomes.

Figure 8 show the extrapolated infection trajectories for Russia, UK, and Brazil. Posterior means of the final epidemic size are as follows: (1) for the Russia, 648,190 cases; (2) for the UK, 303,715 cases; and (3) for the Brazil, 503,271 cases. Extrapolated infection trajectory for the Russia (top), UK (middle), and Brazil (bottom). Flat time points are estimated by: (1) for the Russia, *t*flat,*ϵ*=1000,000=June 18th, *t*flat,*ϵ*=10,000=August 2nd, *t*flat,*ϵ*=1,000=September 15th, and *t*flat,*ϵ*=100=October 29th; (2) for the UK, *t*flat,*ϵ*=10,000=June 25th, *t*flat,*ϵ*=1,000=August 10th, and *t*flat,*ϵ*=100=September 25th; and (3) for the Brazil, *t*flat,*ϵ*=100,000=June 6th, *t*flat,*ϵ*=10,000=July 7th, *t*flat,*ϵ*=1,000=August 6th, and *t*flat,*ϵ*=100=September 4th. Results for other countries are included in the SI Appendix.

![Figure 8:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/05/19/2020.04.23.20077065/F8.medium.gif)

[Figure 8:](http://medrxiv.org/content/early/2020/05/19/2020.04.23.20077065/F8)

Figure 8: 
Extrapolated infection trajectory for the Russia (top), UK (middle), and Brazil (bottom). Flat time points are estimated by: (1) for the Russia, *t*flat,*ϵ*=100,000=June 18th, *t*flat,*ϵ=*10,000=August 2nd, *t*flat,*ϵ=*1,000=September 15th, and *t*flat,*ϵ=*100=October 29th; (2) for the UK, *t*flat,*ϵ=*10,000=June 25th, *t*flat,*ϵ=*1,000=August 10th, and *t*flat,*ϵ=*100=September 25th; and (3) for the Brazil, *t*flat,*ϵ=*100,000=June 6th, *t*flat,*ϵ=*10,000=July 7th, *t*flat,*ϵ=*1,000=August 6th, and *t*flat,*ϵ=*100=September 4th.

### 4.4 Global trend for the COVID-19 outbreak

Figure 9 displays the extrapolated infection trajectory for grand average over 40 countries obtained from the model ![Graphic][28]</img>. Technically, this curve is acquired by extrapolating the Richards curve by using the intercept terms in linear regressions (3). The grey dots on the panel are historical infection trajectories for 40 countries. Posterior means for the final epidemic size is 145,497 cases. Posterior means for the at time points are *t*flat,*ϵ*=1,000=June 12th and *t*flat,*ϵ*=100=July 9th.

![Figure 9:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/05/19/2020.04.23.20077065/F9.medium.gif)

[Figure 9:](http://medrxiv.org/content/early/2020/05/19/2020.04.23.20077065/F9)

Figure 9: 
Extrapolated infection trajectory for grand average over 40 countries obtained from the model ![Graphic][29]</img>. Grey dots are historical infection trajectories for 40 countries spanning from January 22nd to May 14th. Posterior means for the at time points are *t*flat,*ϵ*=1,000=June 12th and *t*flat,*ϵ*=100=July 9th.

### 4.5 Identifying risk factors for severe disease due to COVID-19

COVID-19 is a new disease and there is very limited information regarding risk factors for this severe disease. There is no vaccine aimed to prevent the transmission of the disease because there is no specific antiviral agent is available. It is very important to find risk factors relevant to the disease. Reliable and early risk assessment of a developing infectious disease outbreak allow for policymakers to make swift and well-informed decisions that would be needed to ensure epidemic control.

CDC described High-Risk Conditions based on currently available information and clinical expertise (For more detail, visit [www.cdc.gov/-](http://www.cdc.gov/)): those at higher risk for infection, severe illness, and poorer outcomes from COVID-19 include

*   People 65 years and older;

*   People who live in a nursing home or long-term care facility;

*   People with chronic lung disease or moderate to severe asthma;

*   People who are immunocompromised, possibly caused by cancer treatment, smoking, bone marrow or organ transplantation, immune deficiencies, poorly controlled HIV or AIDS, and prolonged use of corticosteroids and other immune weakening medications;

*   People with severe obesity (body mass index of 40 or higher);

*   People with diabetes;

*   People with chronic kidney disease undergoing dialysis;

*   People with liver disease.

The model ![Graphic][30]</img> involves three separated linear regressions whose response and coefficient vector are given by *θl* and *βl*, respectively (*l* = 1, 2, 3). (See the equation (3)) The sparse horseshoe prior (Carvalho et al., 2009, 2010) is imposed for each of the coefficient vectors which makes the model equipped with covariates analysis. That way, we can identify key predictors explaining the heterogeneity of shapes existing in infection trajectories across 40 countries, and this can be further used in finding risk factors for severe disease due to COVID-19. The results are in table 1 1.

View this table:
[Table 1:](http://medrxiv.org/content/early/2020/05/19/2020.04.23.20077065/T1)

Table 1: 
Important predictors explaining *θl*, *l* = 1, 2, 3

The followings are general guidelines on how covariates on the Table 1 can be interpreted in the current context of pandemic.

*   The parameter *θ*1 represents *final epidemic size*. A larger number of *θ*1 indicates that a country has (can have) more COVID-19 infected patients in the country. A covariate with plus sign (+) (or minus sign (−)) is a factor associated with an increase (or decrease) of the total infected cases.

*   The parameter *θ*2 represents *infection rate*. A larger number of *θ*2 implies a faster spread of the virus around the country. A covariate with plus sign (+) (or minus sign (−)) is a factor associated with a rapid (or slow) spread of the virus.

*   The parameter *θ*3 represents *lag phase* of the infection trajectory. The larger the value of *θ*3 the later the trajectory begins to accumulate infected cases, leading to a later onset of the accumulation. A covariate with plus sign (+) (or minus sign (−)) is a factor associated with delaying (or bring forward) the onset of the accumulation.

Now, based on the aforementioned guideline, we shall interpret the Table 1 in detail. (The reasoning reflects our subjectivity, and disease expert should decipher precisely.)

As for the parameter *θ*1, insufficient physical activity has been selected as one of the important risk factors which may increase the final epidemic size of a country: this implies that certain government policies such as social distancing or remote work system can help decrease the final epidemic size. Additionally, intense immunization coverage on measles, Polio, and Haemophilus Influenzae type B can reduce the final epidemic size. Poor general health status of a population (Jennifer Beam Dowd, 2020) such as overweight and alcohol addiction can increase the epidemic size. (visit related news article [www.cidrap.umn.edu/-](http://www.cidrap.umn.edu/).) Certain testing information is also associated with the epidemic size, which can be further researched in retrospective studies in swift policymaking for a future pandemic. Finally, the average temperature is negatively related to the epidemic size. (See a WHO report for the relationship between climate change and infectious diseases [www.who.int/-](http://www.who.int/).)

Turning to the parameter *θ*2, a rigorous fulfillment of general obligations at point of entry is chosen as one of the significant predictors in reducing the infection rate. Additionally, poor smoking and alcoholic behaviors of a country population are risk factors that may increase the infection rate. Demographically, it has been found that densely populated countries or countries where life expectancy is relatively high are more venerable to the rapid disease transmission among people. Among national environmental status, poor air condition which may negatively influence people’s respiratory system is found to be a risk factor increasing the infection rate.

Finally, moving to the parameter *θ*3, geological distance from China is an important covariate delaying the onset of the infected cases. The lag of onset is also graphically observed from the Figure 1: time point whereat South Korea begins to accumulate the infected cases is relatively earlier than those of the US, UK, etc. Similar to *θ*2, heavier alcohol drinking and tobacco use may result in an earlier onset of the accumulation of the infected patients, thereby bringing forward the infection trajectory. Having larger numbers of median age and elderly people of a population can shorten the lag phase. Finally, conducting frequent testing for the COVID-19 helps detect infected patients, followed by the earlier accumulation for the confirmed cases.

## 5 Discussions

It is important to emphasize that, while medical and biological sciences are on the front lines of beating back COVID-19, the true victory relies on advance and coalition of almost every academic field. However, information about COVID-19 is limited: there are currently no vaccines or other therapeutics approved by the US Food and Drug Administration to prevent or treat COVID-19 (on April 13, 2020). Although numerous research works are progressed by different academic field, the information about COVID-19 is scattered around different disciplines, which truly requires interdisciplinary research to hold off the spread of the disease.

The real-time forecast during early stages of the pandemic may results in premature inference outcomes (Hsieh et al., 2004), but it should not demoralize predictive analysis as the entire human race is currently threatened by unprecedented crisis due to COVID-19 pandemic. To improve the predictive accuracy, data integration from multiple countries is a key notion, which is closely related to borrowing information. The motivation of using the borrowing information is to make use of *indirect evidence* (Efron, 2010) to enhance the predictive performance: for example, to extrapolate the infection trajectory for the US, the information not only from the US *(direct evidence*) but also from other countries *(indirect evidence*) are utilized to better predict the trajectory for the US. Further, to render the information borrowing endowed with uncertainty quantification, Bayesian argument is inevitable, inducing sensible inferences and decisions for users (Lindley, 1972).

The results demonstrated the superiority of our approach compared to an existing individual country-based model. Our research outcomes can be thought even more insightful given that we have not employed information about disease-specific covariates. That being said, using more detailed information such as social mixing data, precise hospital records, or patient-specific information will further improve the performance of our model. Moreover, integration of epidemiological models with these statistical models will be our future topic of research.

## 6 Materials and Methods

### 6.1 Research data

In this research, we analyze global COVID-19 data ![Graphic][31]</img>, obtained from *N* = 40 countries. (Meanings for the vector notations, **y***i* and **x***i*, will be explained shortly later.) These countries are most severely affected by the COVID-19 in terms of the confirmed cases on May 14th, and listed on Table 2: each country is contained in the table with format “country name (identifier)”, and this identifier also indicates a severity rank, where a lower value indicates a severer status. The order of the ranks thus coincides with the order of the countries named on the *y*-axis of the Figure 2.

View this table:
[Table 2:](http://medrxiv.org/content/early/2020/05/19/2020.04.23.20077065/T2)

Table 2: 
40 countries on the research

For each country *i* (*i* = 1,…, *N*), let *yit* denotes the number of accumulated confirmed cases for COVID-19 at the *t*-th time point (*t* = 1,…, *T*). Here, the time indices *t* = 1 and *t* = *T* correspond to the initial and end time points, January 22nd and May 14th, respectively, spanning for *T* = 114 (days). The time series data **y***i* = (*yi*1,…,*yit*, …, *yiT*)┬ is referred to as an *infection trajectory* for the country *i*. Infection trajectories for eight countries (US, Russia, UK, Brazil, Germany, China, India, and South Korea) indexed by *i* = 1, 2, 4, 6, 8, 10, 11, and 33, respectively, are displayed in the Figure 1. We collected the data from the Center for Systems Science and Engineering at the Johns Hopkins University.

For each country *i*, we collected 45 covariates, denoted by x*i* = (*xi*1, …,*xij*,…, *xip*)┬ (*p* = 45). The 45 predictors can be further grouped by 6 categories: *the 1st category*: general country and population distribution and statistics; *the 2nd category*: general health care resources; *the 3rd category*: tobacco and alcohol use; *the 4th category*: disease and unhealthy prevalence; *the 5th category*: testing and immunization statistics; and *the 6th category*: international health regulations monitoring. The data sources are the World Bank Data ([https://data.worldbank.org/-](https://data.worldbank.org/)), World Health Organization Data ([https://apps.who.int/-](https://apps.who.int/)), and National Oceanic and Atmospheric Administration ([https://www.noaa.gov/-](https://www.noaa.gov/)). Detailed explanations for the covariates are described in SI Appendix.

### 6.2 Bayesian hierarchical Richards model

We propose a Bayesian hierarchical model based on the Richards curve (1), which is referred to as Bayesian hierarchical Richards model (BHRM), to accommodate the COVID-19 data ![Graphic][32]</img>. (Although the model is based on the Richards curve, the idea can be generalized to any choice for growth curves.) Ultimately, a principal goal of the BHRM is to establish two functionalities:

1.  [Extrapolation] uncover a hidden pattern from the infection trajectory for each country *i*, that is, **y***i* = (*yi*1,…, *yiT*) ┬, through the Richards growth curve *f*(*t*; *θ*1, *θ* 2, *θ*3, *ξ*) (1), and then extrapolate the curve.

2.  [Covariates analysis] identify important predictors among the p predictors **x** = (*x*1,…, *xp*)┬ that largely affect on the shape the curve *f*(*t*; *θ*1, *θ*2, *θ*3, *ξ*) in terms of the three curve parameters.

A hierarchical formulation of the BHRM is given as follows. First, we introduce an additive independently identical Gaussian error to each observation ![Graphic][33]</img>, leading to a likelihood part: ![Formula][34]</img>  where *f*(*t*; *θ*1*i*, *θ*2*i*, *θ*3*i*, *ξi*) is the Richards growth curve (1) which describes a growth pattern of infection trajectory for the i-th country. Because each of the curve parameters (*θ*1, *θ*2, *θ*3) has its own epidemiological interpretations, we construct three separate linear regressions: ![Formula][35]</img>  where *βl* = (*βl*1,…,*βlj*,…,*βlp*)┬ is a p-dimensional coefficient vector corresponding to the *l*-th linear regression.

For the shape parameter *ξ*, we assume the standard log-normal prior:

![Formula][36]</img> 
The motivation of choosing the log-normal prior (4) for the *ξi* is that the prior puts effectively enough mass on the region (0, 3) where most of the estimates for the *ξi* (*i* = 1,…, *N*) concentrated on. Additionally, Gaussianity prior assumption makes it possible to employ the elliptical slice sampler (Murray et al., 2010) in sampling from the full conditional posterior distribution of the *ξi*.

To impose a continuous shrinkage effect (Bhadra et al., 2019) on each of the coefficient vectors, we adopt to use the horseshoe prior (Carvalho et al., 2009, 2010):

![Formula][37]</img> 
Finally, improper priors (Gelman et al., 2004) are used for the intercept terms and error variances terms in the model:

![Formula][38]</img> 
See SI Appendix for a posterior computation for the BHRM (2) – (6).

### 6.3 Technical expressions for three models ![Graphic][39]</img>, ![Graphic][40]</img>, and ![Graphic][41]</img>

Technical expressions for the three models, ![Graphic][42]</img>, ![Graphic][43]</img>, and ![Graphic][44]</img>, compared in Subsection 4.1 are given as follows:

![Graphic][45]</img> is an individual country-based model (nonhierarchical model) that uses infection trajectory for a single country **y** = (*y*1,…, yT)T. The model is given by ![Formula][46]</img>  where *f*(*t*; *θ*1, *θ*2, *θ*3) is the Richards growth curve (1), and improper priors (Gelman et al., 2004) are used for error variances and intercept terms as (6).

![Graphic][47]</img> is a Bayesian hierarchical model without using covariates, which uses infection trajectories from *N* countries, ![Graphic][48]</img>. This model is equivalent to BHRM (2) - (6) with removed covariates terms in (3).

![Graphic][49]</img> is the BHRM (2) – (6).

## Data Availability

Data used in the research is publicly available.

## Supporting Information Appendix

### Appendix A Tables for covariates

View this table:
[Table 3:](http://medrxiv.org/content/early/2020/05/19/2020.04.23.20077065/T3)

Table 3: 
Category of covariates.

View this table:
[Table 4:](http://medrxiv.org/content/early/2020/05/19/2020.04.23.20077065/T4)

Table 4: 
General country and population distribution and statistics.

View this table:
[Table 5:](http://medrxiv.org/content/early/2020/05/19/2020.04.23.20077065/T5)

Table 5: 
Health care resources.

View this table:
[Table 6:](http://medrxiv.org/content/early/2020/05/19/2020.04.23.20077065/T6)

Table 6: 
Tobacco and alcohol use.

View this table:
[Table 7:](http://medrxiv.org/content/early/2020/05/19/2020.04.23.20077065/T7)

Table 7: 
Disease and unhealthy prevalence.

View this table:
[Table 8:](http://medrxiv.org/content/early/2020/05/19/2020.04.23.20077065/T8)

Table 8: 
Testing and immunization statistics.

View this table:
[Table 9:](http://medrxiv.org/content/early/2020/05/19/2020.04.23.20077065/T9)

Table 9: 
International health regulations (IHR) monitoring framework.

### Appendix B Posterior computation

We illustrate a full description of a posterior computation for the BHRM (2) – (6) by using a Markov chain Monte Carlo (MCMC) simulation (Robert and Casella, 2013). To start with, for illustration purpose, we shall use vectorized notations for the likelihood part (2), regression part (3), and its coe_cients part (5):

![Formula][50]</img> 
The *T*-dimensional vector *yi* = (*yi*1,…,*yit*,…,*yiT*)┬ (*i* = 1,…,*N*) is the observed infection trajectory for the country *i* across the times. The notation f(*θ*1*i*, *θ*2*i, θ*3*i, ξ*i) (*i* = 1,…,*N*) is *T*-dimensional vector that describes the Richard curves across the times:

![Formula][51]</img> 
The vectors ***θ****l* = (*θl*1,…,*θlN*)┬ (*l* = 1, 2, 3) and ***ξ*** = (*ξ*1,…,*N*)┬ are *N*-dimensional vectors for the four parameters of the Richards curve (1) across the *N* countries.

The matrix **X** is *N*-by-*p* design matrix whose *i*-th row vector is given by the *p* predictors xi = (*xi*1,…,*xip*)┬ ∊ *ℝp*, (*i* = 1,…,*N*). The notation **I** stands for an identity matrix. Before implementing, it is recommended that each of column vectors of the design matrix **X** is standardized (Armagan et al., 2013; Tibshirani, 1996): that is, each column vector has been centered, and then columnwisely scaled so that each column vector has mean zero and unit *l*2 Euclidean norm.

The *p*-dimensional vector *βl* = (*βl*1,…,*βlp*)┬ (*l* = 1, 2, 3) denotes *p* coefficients from the *l*-th regression. The vector *λl* = (*λl*1,…,*λlp*)┬ (*l* = 1, 2, 3) is *p*-dimensional vector for the local-scale parameters, and the matrix **Λ***l* is *p*-by-*p* diagonal matrix ![Graphic][52]</img> (*l* = 1, 2, 3). The *τl* (*l* = 1; 2; 3) is referred to as the global-scale parameter (Carvalho et al., 2010).

Under the formulation of BHRM (2) – (6), our goal is to sample from the full joint posterior distribution *π*(***θ***1, ***θ***2, ***θ***3, ***ξ***,σ2,Ω1, Ω2, Ω3|y1:*N*) where ![Graphic][53]</img> (*l* = 1, 2, 3), whose proportional part is given by

![Formula][54]</img> 
To sample from the full joint density, we use a Gibbs sampler (Casella and George, 1992) to exploit conditional independences among the latent variables induced by the hierarchy. The following algorithm describes a straightforward Gibbs sampler

***Step 1***. Sample ***θ***1 from its full conditional distribution ![Formula][55]</img>  where ![Graphic][56]</img>. Here, the vector **r** is a *N*-dimensional vector which is given by ![Graphic][57]</img> such that the *T*-dimensional vector **h**(*θ*2i, *θ*3*i*, *ξi*) (*i* = 1,…,*N*) is obtained by ![Formula][58]</img>  where *h*(*t*; *θ*2, *θ*3, *ξ*) = [1 + *ξ·*exp{−*θ*2·(*t* − *θ*3)}]−1/*ξ*

***Step 2***. Sample *θ*2*i* and *θ*3*i*, *i* = 1,⋯,*N*, independently from their full conditional distributions. Proportional parts of the distributions are given by

![Formula][59]</img> 
Here, ||·||2 indicates the *l*2-norm. Note that the two conditional densities are not known in closed forms because two parameters, *θ*2*i* and *θ3i*, participate to the function *f*(*θ*1*i*, *θ*2*i*, *θ*3*i*, *ξi*) in a nonlinear way. We use the Metropolis algorithm (Andrieu et al., 2003) with Gaussian proposal densities within this Gibbs sampler algorithm.

***Step 3***. Sample *ξi*, *i* = 1,⋯,*N*, independently from its full conditional distribution. Proportional parts of the distributions are given by

![Formula][60]</img> 
Note that the density (A.1) is not expressed in a closed form distribution. Because the shape parameter *ξi* is supported on (0, ∞) and participates in the Richards curve (1) as an exponent, sampling from the density needs a delicate care, where by we employed the elliptical slice sampler (Murray et al., 2010).

***Step 4***. Sample *σ*2 from its full conditional distribution

![Formula][61]</img> 
***Step 5***. Sample *αl*, *l* = 1, 2, 3, independently from their full conditional distributions

![Formula][62]</img> 
***Step 6***. Sample *βl*, *l* = 1, 2, 3, independently from conditionally independent posteriors ![Formula][63]</img>  where ![Graphic][64]</img>, ![Graphic][65]</img>, and **Λ****l* = *τ*2 ***Λ****l*.

***Step 7***. Sample *λlj*, *l* = 1, 2, 3, *j* = 1,⋯,*p*, independently from conditionally independent posteriors

![Formula][66]</img> 
Note that the densities *π*(*λlj* |−) (*l* = 1, 2, 3, *j* = 1,⋯, *p*) are not expressed in closed forms: we use the slice sampler (Neal, 2003).

***Step 8***. Sample *πl*, *l* = 1, 2, 3, independently from conditionally independent posteriors

![Formula][67]</img> 
Note that the densities *π*(*τl|−*) (*l* = 1, 2, 3) are not expressed in closed forms: we use the slice sampler (Neal, 2003).

***Step 9***. Sample ![Graphic][68]</img>, *l* = 1; 2; 3, independently from their full conditionally distributions

![Formula][69]</img> #### B.1 Elliptical slice sampler for Step 3

To start with we shall use the variable change (*η* = log *ξ*) to the right hand side of (A.1): ![Formula][70]</img>  such that ![Graphic][71]</img> corresponds to a likelihood part.

Now, we use the elliptical slice sampler (ESS) (Murray et al., 2010; Nishihara et al., 2014) to sample from ![Graphic][72]</img> that exploits the Gaussian prior measure. Conceptually, ESS and the Metropolis-Hastings (MH) algorithm (Chib and Greenberg, 1995) are similar: both methods are comprised of two steps: *proposal step and criterion step*. A difference between the two algorithms arises in the *criterion step*. If the new candidate does not pass the criterion, then MH takes the current state as the next state: whereas, ESS re-proposes a new candidate until rejection does not take place, rendering the algorithm rejection-free. Further information for ESS is referred to the original paper (Murray et al., 2010). After ESS has been employed, the realized *ηi* needs to be transformed back through *ξ* = *eη*. Algorithm 1 illustrates the full description of algorithms: to avoid notation clutter, the index *i* is omitted.

### Algorithm 1

Elliptical slice sampler to sample from *π*(*ξ|−*) 

![Figure10](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/05/19/2020.04.23.20077065/F10.medium.gif)

[Figure10](http://medrxiv.org/content/early/2020/05/19/2020.04.23.20077065/F10)

### Appendix C Infection trajectories for the top 20 countries

The file includes extrapolated infection trajectories for the top 20 countries that are most severely affected by the COVID-19.

## Footnotes

*   1 the first author

*   seyoonlee{at}stat.tamu.edu, bowenlei{at}stat.tamu.edu

*   Received April 23, 2020.
*   Revision received May 15, 2020.
*   Accepted May 19, 2020.


*   © 2020, Posted by Cold Spring Harbor Laboratory

The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission.

## References

1.  Andrieu, C.,  N. De Freitas,  A. Doucet, and  M. I. Jordan (2003). An introduction to mcmc for machine learning. Machine learning 50 (1-2), 5–43.
    
    
2.  Anton, H. and  A. Herr (1988). Calculus with analytic geometry. Wiley New York.
    
    
3.  Armagan, A.,  D. B. Dunson, and  J. Lee (2013). Generalized double pareto shrinkage. Statistica Sinica 23(1), 119.
    
    
4.  Bhadra, A.,  J. Datta,  N. G. Polson,  B. Willard, et al. (2019). Lasso meets horseshoe: A survey. Statistical Science 34 (3), 405–427.
    
    
5.  Birch, C. P. (1999). A new generalized logistic sigmoid growth equation compared with the richards growth equation. Annals of Botany 83(6), 713–723.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1006/anbo.1999.0877&link_type=DOI) 

6.  Browne, W. J.,  D. Draper, et al. (2006). A comparison of bayesian and likelihood-based methods for fitting multilevel models. Bayesian analysis 1 (3), 473–514.
    
    
7.  Cao, L.,  P.-J. Shi,  L. Li, and  G. Chen (2019). A new flexible sigmoidal growth model. Symmetry 11 (2), 204.
    
    
8.  Carvalho, C. M.,  N. G. Polson, and  J. G. Scott (2009). Handling sparsity via the horseshoe. In Artificial Intelligence and Statistics, pp. 73–80.
    
    
9.  Carvalho, C. M.,  N. G. Polson, and  J. G. Scott (2010). The horseshoe estimator for sparse signals. Biometrika 97(2), 465–480.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/biomet/asq017&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000280559700015&link_type=ISI) 

10. Casella, G. and  E. I. George (1992). Explaining the gibbs sampler. The American Statistician 46(3), 167–174.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/2685208&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1992JE81900001&link_type=ISI) 

11. Causton, D. (1969). A computer program for fitting the richards function. Biometrics, 401–409.
    
    
12. Chib, S. and  E. Greenberg (1995). Understanding the metropolis-hastings algorithm. The american statistician 49(4), 327–335.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/2684568&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1995TK39900001&link_type=ISI) 

13. Clayton, D. and  M. Hills (2013). Statistical models in epidemiology. OUP Oxford.
    
    
14. Dong, E.,  H. Du, and  L. Gardner (2020). An interactive web-based dashboard to track covid-19 in real time. The Lancet infectious diseases.
    
    
15. Efron, B. (2010). The future of indirect evidence. Statistical science: a review journal of the Institute of Mathematical Statistics 25(2), 145.
    
    
16. Fineberg, H. V., and  M. E. Wilson (2009). Epidemic science in real time.
    
    
17. Gao, J.,  Z. Tian, and  X. Yang (2020). Breakthrough: Chloroquine phosphate has shown apparent efficacy in treatment of covid-19 associated pneumonia in clinical studies. Bioscience trends.
    
    
18. Gelman, A.,  J. B. Carlin,  H. S. Stern, and  D. B. Rubin (2004). Bayesian data analysis. Chapman and Hall/CRC.
    
    
19. Gompertz, B. (1825). Xxiv. on the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. in a letter to francis baily, esq. frs &c. Philosophical transactions of the Royal Society of London (115), 513–583.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1098/rstl.1825.0026&link_type=DOI) 

20. Hethcote, H. W. (2000). The mathematics of infectious diseases. SIAM Review 42(4), 599–653.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1137/s0036144500371907&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=WOS:00016567&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F05%2F19%2F2020.04.23.20077065.atom) 

21. Hill, B. M. (1965). Inference about variance components in the one-way model. Journal of the American Statistical Association 60(311), 806–825.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/2283247&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1965CKX1500008&link_type=ISI) 

22. Hsieh, Y.-H. (2009). Richards model: a simple procedure for real-time prediction of outbreak severity. In Modeling and dynamics of infectious diseases, pp. 216–236. World Scientific.
    
    
23. Hsieh, Y.-H. (2010). Pandemic influenza a (h1n1) during winter influenza season in the southern hemisphere. Influenza and Other Respiratory Viruses 4 (4), 187–197.
    
    
24. Hsieh, Y.-H., and  C. Chen (2009). Turning points, reproduction number, and impact of climato-logical events for multi-wave dengue outbreaks. Tropical Medicine & International Health 14 (6), 628–638.
    
    
25. Hsieh, Y.-H.,  J.-Y. Lee, and  H.-L. Chang (2004). Sars epidemiology modeling. Emerging infectious diseases 10 (6), 1165.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3201/eid1006.031023&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15224675&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F05%2F19%2F2020.04.23.20077065.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000221749300036&link_type=ISI) 

26. Hsieh, Y.-H., and  S. Ma (2009). Intervention measures, turning point, and reproduction number for dengue, singapore, 2005. The American journal of tropical medicine and hygiene 80(1), 66–71.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoidHJvcG1lZCI7czo1OiJyZXNpZCI7czo3OiI4MC8xLzY2IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMDUvMTkvMjAyMC4wNC4yMy4yMDA3NzA2NS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

27. Hu, W.,  S. Tong,  K. Mengersen, and  B. Oldenburg (2006). Rainfall, mosquito density and the transmission of ross river virus: A time-series forecasting model. Ecological modelling 196(3–4), 505–514.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ecolmodel.2006.02.028&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000238994800018&link_type=ISI) 

28. Huttenhower, C. and  O. G. Troyanskaya (2006). Bayesian data integration: a functional perspective. In Computational Systems Bioinformatics, pp. 341–351. World Scientific.
    
    
29. James, W. and  C. Stein (1992). Estimation with quadratic loss. In Breakthroughs in statistics, pp. 443–460. Springer.
    
    
30. Jennifer Beam Dowd, Liliana Andriano, D. M. B. V. R. P. B. X. D. Y. L. M. C. M. (2020). Demographic science aids in understanding the spread and fatality rates of covid-19. Proc. Natl. Acad. Sci. U.S.A..
    
    
31. Jia, L.,  K. Li,  Y. Jiang,  X. Guo, and  T. zhao (2020). Prediction and analysis of coronavirus disease 2019.
    
    
32. Kahm, M.,  G. Hasenbrink,  H. Lichtenberg-Fraté,  J. Ludwig, and  M. Kschischo (2010). grofit: fitting biological growth curves with rj stat. softw. 33: 1–21.
    
    
33. Korobeinikov, A. (2004). Lyapunov functions and global properties for seir and seis epidemic models. Mathematical medicine and biology: a journal of the IMA 21 (2), 75–83.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/imammb/21.2.75&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15228100&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F05%2F19%2F2020.04.23.20077065.atom) 

34. Lenzerini, M. (2002). Data integration: A theoretical perspective. In Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pp. 233–246.
    
    
35. Lindley, D. V. (1972). Bayesian statistics, a review, Volume 2. SIAM.
    
    
36. Liu, Z.,  P. Magal,  O. Seydi, and  G. Webb (2020). Predicting the cumulative number of cases for the covid-19 epidemic in china from early data.
    
    
37. Malinverno, A. and  V. A. Briggs (2004). Expanded uncertainty quantification in inverse problems: Hierarchical bayes and empirical bayes. Geophysics 69 (4), 1005–1016.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NToiZ3NncHkiO3M6NToicmVzaWQiO3M6OToiNjkvNC8xMDA1IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMDUvMTkvMjAyMC4wNC4yMy4yMDA3NzA2NS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

38. Murray, I.,  R. Prescott Adams, and  D. J. MacKay (2010). Elliptical slice sampling.
    
    
39. Neal, R. M. (2003). Slice sampling. Annals of statistics, 705–741.
    
    
40. Nelder, J. A. (1962). 182. note: An alternative form of a generalized logistic equation. Biometrics 18 (4), 614–616.
    
    
41. Nishihara, R.,  I. Murray, and  R. P. Adams (2014). Parallel mcmc with generalized elliptical slice sampling. The Journal of Machine Learning Research 15(1), 2087–2112.
    
    
42. Pell, B.,  Y. Kuang,  C. Viboud, and  G. Chowell (2018). Using phenomenological models for forecasting the 2015 ebola challenge. Epidemics 22, 62–70.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.epidem.2016.11.002&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27913131&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F05%2F19%2F2020.04.23.20077065.atom) 

43. Peng, L.,  W. Yang,  D. Zhang,  C. Zhuge, and  L. Hong (2020). Epidemic analysis of covid-19 in china by dynamical modeling.
    
    
44. Qiang Li, Wei Feng, Y.-H. Q. (2020). Trend and forecasting of the covid-19 outbreak in china. Journal of Infection 80, 469–496.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32092392&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F05%2F19%2F2020.04.23.20077065.atom) 

45. Remuzzi, A. and  G. Remuzzi (2020). Covid-19 and italy: what next? The Lancet.
    
    
46. Reza Yaesoubi, T. C. (2011). Generalized markov models of infectious disease spread: A novel framework for developing dynamic health policies. European Journal of Operational Research 215(3), 679–687.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21966083&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F05%2F19%2F2020.04.23.20077065.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000295301200018&link_type=ISI) 

47. Richards, F. (1959). A flexible growth function for empirical use. Journal of experimental Botany 10(2), 290–301.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/jxb/10.2.290&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1959XE91800011&link_type=ISI) 

48. Robert, C. and  G. Casella (2013). Monte Carlo statistical methods. Springer Science & Business Media.
    
    
49. Rushton, S.,  P. Lurz,  J. Gurnell,  P. Nettleton,  C. Bruemmer,  M. Shirley, and  A. Sainsbury (2006). Disease threats posed by alien species: the role of a poxvirus in the decline of the native red squirrel in britain. Epidemiology & Infection 134 (3), 521–533.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1017/S0950268805005303&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=16238822&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F05%2F19%2F2020.04.23.20077065.atom) 

50. Seber, G. A., and  C. J. Wild (2003). Nonlinear regression. hoboken. New Jersey: John Wiley & Sons 62, 63.
    
    
51. Shen X, Ou L, C. X. Z. X. T. X. (2013). The application of the grey disaster model to forecast epidemic peaks of typhoid and paratyphoid fever in china. PLOS ONE 8(4).
    
    
52.  Sheng Zhang, MengYuan Diao, W. Y. L. P. Z. L. D. C. (2020). Estimation of the reproductive number of novel coronavirus (covid-19) and the probable outbreak size on the diamond princess cruise ship: A data-driven analysis. International Journal of Infectious Diseases 93, 201–204.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ijid.2020.02.033&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32097725&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F05%2F19%2F2020.04.23.20077065.atom) 

53. Stone, M. and  B. Springer (1965). A paradox involving quasi prior distributions. Biometrika 52(3/4), 623–627.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/biomet/52.3-4.623&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A19657091600028&link_type=ISI) 

54. Thompson, W. W.,  L. Comanor, and  D. K. Shay (2006). Epidemiology of seasonal influenza: use of surveillance data and statistical models to estimate the burden of disease. The Journal of infectious diseases 194 (Supplement_2), S82–S91.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1086/507558&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17163394&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F05%2F19%2F2020.04.23.20077065.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000241503000005&link_type=ISI) 

55. Tiao, G. C., and  W. Tan (1965). Bayesian analysis of random-effect models in the analysis of variance. i. posterior distribution of variance-components. Biometrika 52(1/2), 37–53.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/biomet/52.1-2.37&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A19656542700003&link_type=ISI) 

56.  Tiberiu Harko, Francisco S.N. Lobo, M. M. (2014). Exact analytical solutions of the susceptible-infected-recovered (sir) epidemic model and of the sir model with equal death and birth rates. Applied Mathematics and Computation 236(1), 184–194.
    
    
57. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58(1), 267–288.
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1996TU31400017&link_type=ISI) 

58. Tsoularis, A. and  J. Wallace (2002). Analysis of logistic growth models. Mathematical biosciences 179(1), 21–55.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0025-5564(02)00096-2&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12047920&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F05%2F19%2F2020.04.23.20077065.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000176531300002&link_type=ISI) 

59. Wang, X.,  S. Liu, and  Y. Huang (2016). A study on the rapid parameter estimation and the grey prediction in richards model. Journal of Systems Science and Information 4 (3), 223–234.
    
    
60. Wang, X.-S.,  J. Wu, and  Y. Yang (2012). Richards model revisited: Validation by and application to infection dynamics. Journal of Theoretical Biology 313, 12–19.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.jtbi.2012.07.024&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22889641&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F05%2F19%2F2020.04.23.20077065.atom) 

61. Wasserman, L. (2013). All of statistics: a concise course in statistical inference. Springer Science & Business Media.
    
    
62. Werker, A. and  K. Jaggard (1997). Modelling asymmetrical growth curves that rise and then fall: applications to foliage dynamics of sugar beet (beta vulgaris l.). Annals of Botany 79(6), 657–665.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1006/anbo.1997.0387&link_type=DOI) 

63. Wu, K.,  D. Darcet,  Q. Wang, and  D. Sornette (2020). Generalized logistic growth modeling of the covid-19 outbreak in 29 provinces in china and in the rest of the world. *arXiv preprint arXiv:2003.05681*.
    
    
64. Yang, Z.,  Z. Zeng,  K. Wang,  S.-S. Wong,  W. Liang,  M. Zanin,  P. Liu,  X. Cao,  Z. Gao,  Z. Mai,  J. Liang,  X. Liu,  S. Li,  Y. Li,  F. Ye,  W. Guan,  Y. Yang,  F. Li,  S. Luo,  Y. Xie,  B. Liu,  Z. Wang,  S. Zhang,  Y. Wang,  N. Zhong, and  J. He (2020). Modified seir and ai prediction of the epidemics trend of covid-19 in china under public health interventions. Journal of Thoracic Disease 12(3).
    
    
65. Zhirui He, H. T. (2018). Epidemiology and arima model of positive-rate of influenza viruses among children in wuhan, china: A nine-year retrospective study. International Journal of Infectious Diseases 74, 61–70.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/i.iiid.2018.07.003&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F05%2F19%2F2020.04.23.20077065.atom)

 [1]: /embed/graphic-3.gif
 [2]: /embed/inline-graphic-1.gif
 [3]: /embed/inline-graphic-2.gif
 [4]: /embed/inline-graphic-3.gif
 [5]: /embed/inline-graphic-4.gif
 [6]: /embed/inline-graphic-5.gif
 [7]: /embed/inline-graphic-6.gif
 [8]: /embed/inline-graphic-7.gif
 [9]: /embed/inline-graphic-8.gif
 [10]: /embed/inline-graphic-9.gif
 [11]: /embed/graphic-5.gif
 [12]: /embed/inline-graphic-10.gif
 [13]: /embed/inline-graphic-11.gif
 [14]: /embed/inline-graphic-12.gif
 [15]: /embed/inline-graphic-13.gif
 [16]: /embed/inline-graphic-14.gif
 [17]: /embed/inline-graphic-15.gif
 [18]: /embed/inline-graphic-16.gif
 [19]: /embed/inline-graphic-17.gif
 [20]: /embed/inline-graphic-18.gif
 [21]: /embed/inline-graphic-19.gif
 [22]: F4/embed/inline-graphic-20.gif
 [23]: F4/embed/inline-graphic-21.gif
 [24]: F4/embed/inline-graphic-22.gif
 [25]: /embed/inline-graphic-23.gif
 [26]: /embed/graphic-9.gif
 [27]: F7/embed/inline-graphic-24.gif
 [28]: /embed/inline-graphic-25.gif
 [29]: F9/embed/inline-graphic-26.gif
 [30]: /embed/inline-graphic-27.gif
 [31]: /embed/inline-graphic-28.gif
 [32]: /embed/inline-graphic-29.gif
 [33]: /embed/inline-graphic-30.gif
 [34]: /embed/graphic-15.gif
 [35]: /embed/graphic-16.gif
 [36]: /embed/graphic-17.gif
 [37]: /embed/graphic-18.gif
 [38]: /embed/graphic-19.gif
 [39]: /embed/inline-graphic-31.gif
 [40]: /embed/inline-graphic-32.gif
 [41]: /embed/inline-graphic-33.gif
 [42]: /embed/inline-graphic-34.gif
 [43]: /embed/inline-graphic-35.gif
 [44]: /embed/inline-graphic-36.gif
 [45]: /embed/inline-graphic-37.gif
 [46]: /embed/graphic-20.gif
 [47]: /embed/inline-graphic-38.gif
 [48]: /embed/inline-graphic-39.gif
 [49]: /embed/inline-graphic-40.gif
 [50]: /embed/graphic-28.gif
 [51]: /embed/graphic-29.gif
 [52]: /embed/inline-graphic-41.gif
 [53]: /embed/inline-graphic-42.gif
 [54]: /embed/graphic-30.gif
 [55]: /embed/graphic-31.gif
 [56]: /embed/inline-graphic-43.gif
 [57]: /embed/inline-graphic-44.gif
 [58]: /embed/graphic-32.gif
 [59]: /embed/graphic-33.gif
 [60]: /embed/graphic-34.gif
 [61]: /embed/graphic-35.gif
 [62]: /embed/graphic-36.gif
 [63]: /embed/graphic-37.gif
 [64]: /embed/inline-graphic-45.gif
 [65]: /embed/inline-graphic-46.gif
 [66]: /embed/graphic-38.gif
 [67]: /embed/graphic-39.gif
 [68]: /embed/inline-graphic-47.gif
 [69]: /embed/graphic-40.gif
 [70]: /embed/graphic-41.gif
 [71]: /embed/inline-graphic-48.gif
 [72]: /embed/inline-graphic-49.gif