A Novel Intervention Recurrent autoencoder for real time forecasting and non-pharmaceutical intervention selection to curb the spread of Covid-19 in the world ============================================================================================================================================================== * Qiyang Ge * Zixin Hu * Shudi Li * Wei Lin * Li Jin * Momiao Xiong ## ABSTRACT As the Covid-19 pandemic soars around the world, there is urgent need to forecast the number of cases worldwide at its peak, the length of the pandemic before receding and implement public health interventions to significantly stop the spread of Covid-19. Widely used statistical and computer methods for modeling and forecasting the trajectory of Covid-19 are epidemiological models. Although these epidemiological models are useful for estimating the dynamics of transmission od epidemics, their prediction accuracies are quite low. To overcome this limitation, we formulated the real-time forecasting and evaluating multiple public health intervention problem into forecasting treatment response problem and developed recurrent neural network (RNN) for modeling the transmission dynamics of the epidemics and Counterfactual-RNN (CRNN) for evaluating and exploring public health intervention strategies to slow down the spread of Covid-19 worldwide. We applied the developed methods to the real data collected from January 22, 2020 to May 8, 2020 for real-time forecasting the confirmed cases of Covid-19 across the world. Key Words * Covid-19 * recurrent neural networks * artificial intelligence * time series * causal inference * forecasting ## Introduction As of May 11, 2020, global confirmed cases of Covid-19 passed 4,152,670 and has spread to 212 countries, causing fear globally (Anastassopoulou et al. 2020). The serious public health threat of Covid-19 has never been seen for more than one century. The government officers and people around the world are desperately trying to slow the spread of Covid-19 (Irfan 2020). We must change our policies to deal with increased mobility of citizens and immediately implement the public health interventions and expanding the virus testing to stop the spread of Covid-19 across the world. How computer modeling of Covid-19’s transmission dynamics could help governments to quickly and strongly move to slow down the spread of Covid-19? Widely used statistical and computer methods for modeling of Covid-19 simulate the transmission dynamics of epidemics to understand their underlying mechanisms, forecast the trajectory of epidemics, and assess the potential impact of a number of public health measures on curbing the spread speed of Covid-19 (Li et al. 2020, Wu et al. 2020, Zhao et al. 2020, Kucharski et al. 2020, Tuite et al. 2020, Hellewell et al. 2020, Li et al. 2020). Although these epidemiological models are useful for estimating the dynamics of transmission, and evaluating the impact of intervention strategies, they have some serious limitations (Funk et al. 2018, Johansson et al. 2019). First, the epidemiological models consist of ordinary differential equations that have many unknown parameters, and depend on many assumptions. It is difficult to translate public interventions to these parameters. Most analyses used hypothesized parameters, which often lead to fitting data very poor. Health officers desperately want to track the trajectory of epidemics and accurately estimate the peak time and number of cases, duration, and ending time and number of cases of Covid-19 for their health policy plan. However, the forecasting results of using the classical epidemiological models such as Susceptible-Exposed-Infectious-Removed (SEIR) models are highly unreliable. Second, the successful application of public health intervention planning highly depends on the model parameter identifiability. However, overall, the parameters in the complex compartmental dynamic models are unidentifiable (Roosa and Chowell 2019, Roda et al. 2020). The values of parameters cannot be uniquely determined from the real data (Gábor et al. 2017). The variances of the estimators of these parameters are very high. To overcome limitations of the epidemiological model approach, and assist public health planning and policy making, we formulated the real-time forecasting and evaluating multiple public health intervention problem into off-policy evaluation (OPE) and forecasting treatment response problem where the aim is to estimate the response of a new public health intervention policy, given historical data that may have been generated by a different public health intervention policies (Bibaut et al. 2019). We viewed the interventions as treatments where multiple interventions were administered at different time points. The number of new cases were taken as treatment responses. The ability to accurately estimate effects of public health interventions over time would allow health officers to determine what intervention strategies should be used and the optimal time at which to implement them (Lim et al. 2018). Recurrent Intervention Network (RIN) (Lim et al. 2018) where a recurrent neural network architecture for forecasting a nation’s response (number of new cases) to a sequence of planned interventions were used to forecast and evaluate multiple public health interventions for Covid-19 worldwide. Potential outcomes of RIN were trajectory of the spread of Covid-19. Public health interventions including locking down residential buildings and compounds, strict self-quarantine for families, door-to-door inspection for suspected cases, maintaining social distancing, stopping mass gatherings, closure of schools and universities, vacating hotels and university dormitories. To quantify comprehensive intervention strategies, an intervention variable that comprehensively and abstractly measures mobility activities and social distancing was used as an input variable for each block of RIN (detailed description of the intervention measure was summarized in Training Procedures and Loss function Section in the Supplementary Note A). We cluster all the countries in the world into several groups. For each group, a value (weight) was assigned to each group such that the average prediction error of counterfactual recurrent network (CRN) was small. The RIN is taken as a general framework for investigating how Covid-19 evolves under different intervention plans, how individual nation responds to intervention over time, but also which are optimal timings for assigning interventions. Therefore, this approach will provide new tools to improve public health planning and policy making. The RIN was applied to the surveillance data of lab confirmed Covid-19 cases in the world up to May 8, 2020. Data on the number of confirmed, new and death cases of Covid-19 from January 22, 2020 to May 8, 2020 were obtained from John Hopkins Coronavirus Resource Center ([https://coronavirus.jhu.edu/MAP.HTML](https://coronavirus.jhu.edu/MAP.HTML)). ## Methods ### RIN as a Framework for modeling and forecasting the spread of Covid-19 over time with multiple interventions The RIN uses sequence-to-sequence multi-input/output recurrent neural network (RNN) architectures to model health intervention plan and make multi-step prediction of the response trajectory of Covid-19 over time with multiple interventions (Lim et al., 2018). The RNN can learn the complex dynamics within the temporal ordering of input time series of Covid-19 and use an internal memory to remember. The health intervention plan has multiple intervention regimens. As shown in Figure 1, the RIN determines the intervention response (similar to counterfactual outputs) for a given set of planned interventions and evaluates the impact of different intervention strategies and their implementation times on the curbing the spread of Covid-19 and provides timely selection of optimal sequence of intervention strategies. ![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/31/2020.05.05.20091827/F1.medium.gif) [Figure 1.](http://medrxiv.org/content/early/2020/07/31/2020.05.05.20091827/F1) Figure 1. Forecasting intervention response to curbing the spread of Covid-19 under a sequence of interventions. The RIN is a RNN autoencoder. It consists of two RNNs: the encoder RNN (vanilla RNN (Figure S1) or long short-term memory (LSTM) (Figure S2) is used as encoder) and the decoder RNN (vanilla RNN or LSTM is used as the decoder). The RNN encoder models input time series (past history of the number of cases of Covid-19 over time) and predicts future response time series (number of cases of Covid-19 in the future with a planned sequence of interventions) (Srivastava et al., 2015). The latent state of the RNN encoder after reading in the entire input time series (past trajectory of Covid-19), is the representation (compressed latent features of the entire input time series) of the input trajectory of Covid-19. Unlike the standard decoder where the decoder reconstructs back the input time series from the latent representation, the RNN decoder uses the learned features of the dynamics of Covid-19 in the RNN encoder to forecast the potential response time series, given a sequence of planned public health interventions as an input to the RNN decoder. The feature vector learned in the RNN encoder is then provided as an input to the RNN decoder which initiate prediction of the future dynamics of Covid-19 under the future interventions (Figure 2). Detailed description of the RIN was listed in Supplementary Note A. ![Figure S1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/31/2020.05.05.20091827/F8.medium.gif) [Figure S1.](http://medrxiv.org/content/early/2020/07/31/2020.05.05.20091827/F8) Figure S1. Architecture of vanilla RNN. ![Figure S2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/31/2020.05.05.20091827/F9.medium.gif) [Figure S2.](http://medrxiv.org/content/early/2020/07/31/2020.05.05.20091827/F9) Figure S2. Architecture of LSTM. ![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/31/2020.05.05.20091827/F2.medium.gif) [Figure 2.](http://medrxiv.org/content/early/2020/07/31/2020.05.05.20091827/F2) Figure 2. Architecture of recurrent intervention network. ### Training the RIN RIN training consisted of RNN encoder training and RNN decoder training (Supplementary Note A). We first fitted a model of the system’s dynamics of Covid-19 to the data from past experimental interventions to learn representations of the states of the dynamics of Covid-19 (encoder training), and then used the learned fit to extrapolate and forecast the response to the alternative interventions (decoder training). The RNN encoder training procedures were briefly introduced here. For details, please see the Supplementary Note A. The basic RNN unit in the RIN consisted of input layer, hidden layer and output layer (Figure S1). The input variables in the RNN encoder can include covariates *X**t* such as density of population, traffic flow, health facility resources, GDP, and social-economic status although this study did not include these quantities, intervention variable *A**t* and the numbers of cases (potential outcomes) *Y**t* at the time *t*. The state in the hidden layer at the time *t* was denoted by *h**t*. The output layer had the output variable *Y**t*+1. A nonlinear activation function was exponential linear unit (ELU) (Clevert et al., 2015) which was defined as ![Formula][1] where *α* > 0. ELU is similar to ReLU when *x* > 0. ELUs diminish the vanishing gradient effect as ReLUs. The vanishing gradient problem is alleviated because the positive part of these functions is the identity, therefore their derivative is one and not contractive. However, tanh and sigmoid activation functions are contractive almost everywhere. In contrast to ReLUs, ELUs have negative values (ReLU does not have), which pushes the mean of the activations closer to zero. Mean activations that are closer to zero enable faster learning as they bring the gradient closer to the natural gradient. The input data were divided into several batches with length of 7 days. Each batch was used to train the RNN encoder which forecasted standard one-step-ahead intervention response *Ŷ**t*+1 as close to the observed intervention response *Y**t*+1 as possible via the nonlinear mapping (Supplementary note A) ![Formula][2] The mean-squared error was used as loss function for training the RNN encoder. The training was performed via the standard propagation algorithms (Supplementary A). After training was completed, the RNN encoder extracted the hidden state *h**t*that captured the internal features of the transmission dynamics of Covid-19 via performing a feed-forward pass over the training data on the RNN encoder (Supplementary A). After the RNN encoder training was completed, we began to train the RNN decoder. An RNN unit in the RNN decoder consisted of input layer with intervention variable *A**t*+*τ*, (*τ* = 1,2, …), hidden layer with hidden state *Z**t*+*τ*−1 and output *Y**t*+*τ*+1. For a given country, observations, intervention *A**t* and the number of cases *Y**t* were randomly divided into short batches of up to *τ**b* time steps. Each batch of short sequence starting at time *t* and ending at time *t* + *τ**b* − 1 consisted of ![Graphic][3]. The mean square errors were still used as the RNN decoder loss function. The goal of RNN decoder training is to make its training loss function smallest (Supplementary A). ### Forecasting Procedures After completion of the training, the trained RIN was used to forecast the future number of cumulative cases of Covid-19 under some planned interventions for each country. During evaluation, we do not have access to ground-truth outcomes. Therefore, we used the trained decoder to make one step ahead forecasting. The outcomes forecasted by the decoder (*Ŷ**t*+1, …,*Ŷ**t*+*τ*−1) were recursively used as inputs. The recursive multiple-step forecasting involved using a one-step model multiple times where the prediction for the preceding time step and intervention strategy were used as an input for making a prediction on the following time step. For example, for forecasting the number of new confirmed cases for the one more next day, the predicted number of new cases in one-step forecasting would be used as an observational input in order to predict day 2. Repeat the above process to obtain the two-step forecasting. The summation of the final forecasted number of new confirmed cases for each country was taken as the prediction of the total number of new confirmed cases of Covid-19 worldwide under the intended intervention. By running the decoder, we can select starting and ending time of different interventions and the optimal or appropriate interventions to give over time to obtain the best outcomes of controlling the spread of Covid-19 for each country. ### Data Collection The analysis is based on surveillance data of confirmed cumulative and new COVID-19 cases worldwide as of May 8, 2020. Data on the number of cumulative and new cases and COVID-19-attributed deaths across 186 countries from January 22, 2020 to May 8, 2020 were obtained from John Hopkins Coronavirus Resource Center ([https://coronavirus.jhu.edu/MAP.HTML](https://coronavirus.jhu.edu/MAP.HTML)). ### Data Pre-processing log2 was used to transform the original data: ![Graphic][4]. The intervention measure was calculated as follows. Set the intervention measure at the final time ![Graphic][5] for China, ![Graphic][6] for Korea South, Switzerland, United Kingdom, Spain, US, Italy, Germany, Iran, and France, and ![Graphic][7] for all other countries. Assume that the intervention measure curve is an exponential function starting at 0 and ends at ![Graphic][8]. The intervention measure *A**t* is given by ![Graphic][9], where *p* > 0, *p* ≠ 1 is the curve shape factor and *t* takes values in evenly sliced numbers of interval [0, 1], *A* is the intervention measure at the initial time *t*. When *p* = 1, *A**t* is a linear function ![Graphic][10]. In this study, we set *p* = 0.01. We randomly picked *k* = 64 countries with *l* = 7 length of Covid-19 time series (the number of new cases over time) data staring from the same day to generate *k* time series with *l* length for a minibatch to be used for backpropagation training through time. Calculate the mean value of each time series in the batch. The values of each time series were divided by their mean values to normalize the data. ## Results ### Prediction accuracy of dynamics of Covid-19 using RIN Accurate prediction of the spread of Covid-19 is important for health intervention plan for the future. To demonstrate that the RIN is an accurate forecasting method, the RIN was applied to confirmed accumulated cases of COVID-19 across 186 countries. Figure 3 plotted reported and one-step ahead predicted time-case curves of Covid-19 where blue dotted curve was the number of estimated cumulative cases after the analysis completion. To further reliably evaluate the forecasting accuracy, we reported 10-step ahead forecasted numbers of cumulative cases and errors of Covid-19 of 8 countries in Table 1 starting with April 29, 2020. The forecasting errors were quite small. View this table: [Table 1.](http://medrxiv.org/content/early/2020/07/31/2020.05.05.20091827/T1) Table 1. Forecasting errors of worldwide and eight countries, where 0.5, 0.55 and 0.6 were intervention measures ![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/31/2020.05.05.20091827/F3.medium.gif) [Figure 3.](http://medrxiv.org/content/early/2020/07/31/2020.05.05.20091827/F3) Figure 3. Reported and predicted time-case curves of Covid-19 worldwide where blue dotted curve was the number of reported cumulative cases after the analysis completion. Inputting a sequence of hypothesized public health intervention strategies, the outputs of the RIN decoder were counterfactual numbers of cases of Covid-19 to respond to the intervention strategies. Interventions were measured by number in the interval [0, 1], where 1 indicated the strictest comprehensive public health intervention, 0 indicated no intervention and the values between 0 and 1 indicated the various less strict interventions. To intuitively illustrate the impact of the measure of intervention on the spread of Covid-19, we presented Figure S3. Figure S3 plotted counterfactual numbers of new cases of Covid-19 over time worldwide to respond the interventions with values 0, 0.3, 0.5, 0.7 and 1. We observed that if the measure of intervention was 1, the number of new cases was dramatically deceased to zero. However, when the measure of intervention was 0.3, the number of new cases exponentially increased. The measure of intervention had big effect on the spread of Covid-19. ![Figures S3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/31/2020.05.05.20091827/F10.medium.gif) [Figures S3.](http://medrxiv.org/content/early/2020/07/31/2020.05.05.20091827/F10) Figures S3. Counterfactual numbers of cases of Covid-19 over time worldwide to respond the interventions with values 0, 0.3, 0.5, 0.7 and 1. The number of cases of Coid-19 was a function of the past history and the measure of intervention. Forecasting also depended on the measure of intervention. In Table 1, we also listed the measures of the interventions which provided information on the degrees of current interventions in the country. The measure of interventions in the most countries was 0.6. However, the current measure of interventions in UK was 0.5, the smallest in 8 countries. These results showed that the RIN for forecasting the trajectory of Covid-19 was accurate and reliable. Similar to causal inference, the RIN can be used to evaluate the impact of a sequence of multiple intervention strategies on the curbing the spread of Covid-19 if the interventions were viewed as treatments. ### The number of cases of Covid-19 grows exponentially without additional intensive interventions To investigate how Covid-19 pandemic surges around the world, we presented Figure 4 that showed the forecasted number of cumulative cases of Covid-19 worldwide over time, assuming that the current intervention measure remains. We observed that the number of cumulative cases of Covid-19 exponentially grown and would reach extremely high number 8,491,301 on July 1, 2020 if none of additional comprehensive public health intervention was implemented. Similarly, Figure 5 and Figure S3 plotted time-case curves of Covid-19 of eight countries: Italy, Spain, Iran, Germany, USA, France, Belgium and UK, and worldwide with unchanged intervention strategies in the future, respectively. We also observed exponentially growth of the numbers of cases of Coid-19 for many countries without additional intervention. ![Figure 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/31/2020.05.05.20091827/F4.medium.gif) [Figure 4.](http://medrxiv.org/content/early/2020/07/31/2020.05.05.20091827/F4) Figure 4. The numbers of cumulative and new cases of Covid-19 worldwide over time, assuming the current intention remains unchanged. The curves in blue color and red color represented the number of cumulative cases and the number of new cases, respectively. ![Figure 5.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/31/2020.05.05.20091827/F5.medium.gif) [Figure 5.](http://medrxiv.org/content/early/2020/07/31/2020.05.05.20091827/F5) Figure 5. Forecasted number of cumulative cases of Covid-19 of eight countries Italy (A), Spain (B), Iran (C), Germany (D), USA (E), France (F), UK (G) and Belgium (H) over time without additional interventions. ### We are in the eve to successfully curb the spread of Covid-19 As Covid-19 Accelerates and exponentially grows, how to slow down the spread of Covid-19 is an urgent task for every country around world. To demonstrate that when the additional intervention was implemented, the number of new cases of Covid-19 would decrease, we presented Figures 6 and 7. Figures 6 and 7 plotted the number of cumulative case and new case curves of Covid-19 over time for 12 countries: US, Italy, Spain, Germany, France, Iran, UK, Switzerland, Belgium, South Korea, Japan and Singapore under three invention scenarios, respectively. Scenario 1 started with the intervention measure of 0.50 for one week, then transitioned to the intervention with a measure of 0.70 in two weeks. The scenario 2 started with the intervention measure of 0.5 for the first week, and then transitioned to the intervention with a measure of 0.70 in three weeks. The scenario 3 started with the intervention measure for one week, changed to 0.60 in three weeks and finally transitioned to the intervention with a measure of 0.70 in three weeks. Figures 6 and 7 showed that when the countries moved to intervention with measure of 0.70, the spread of Covid-19 in all 12 countries was curbed. Now the measures of interventions in the most of 12 countries were closer to 0.60 (Table 1). These countries were closing to stopping the spread of Covid-19 if additional interventions such as wearing face masks were implemented. ![Figure 6.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/31/2020.05.05.20091827/F6.medium.gif) [Figure 6.](http://medrxiv.org/content/early/2020/07/31/2020.05.05.20091827/F6) Figure 6. Number of cumulative case curves of Covid-19 over time for 12 countries under three invention scenarios. (A) Time-case plot for US, (B) Time-case plot for Italy, (C) Time-case plot for Spain, (D) Time-case plot for Germany, (E) Time-case plot for France, (F) Time-case plot for Iran, (G) Time-case plot for UK, (H) Time-case plot for Switzerland, (I) Time-case plot for Belgium, (J) Time-case plot for South Korea, (K) Time-case plot for Japan, and (L) Time-case plot for Singapore. ![Figure 7.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/31/2020.05.05.20091827/F7.medium.gif) [Figure 7.](http://medrxiv.org/content/early/2020/07/31/2020.05.05.20091827/F7) Figure 7. Number of new case curves of Covid-19 over time for 12 countries under three invention scenarios. (A) Time-new case plot for US, (B) Time-new case plot for Italy, (C) Time-new case plot for Spain, (D) Time-new case plot for Germany, (E) Time-new case plot for France, (F) Time-new case plot for Iran, (G) Time-new case plot for UK, (H) Time-new case plot for Switzerland, (I) Time-new case plot for Belgium, (J) Time-new case plot for South Korea, (K) Time-new case plot for Japan, and (L) Time-new case plot for Singapore. Next we investigated how various intervention strategies reduced the peak time and cumulative case numbers, and the final total number of cases. Table 2 showed the forecasted results of COVID-19 worldwide and in 11 countries under three sequences of interventions (Scenarios 1-3). Figure S4 plotted the time-case curves of Covid-19 worldwide under three invention scenarios. We can see that under all three scenarios, the peak times worldwide and in all 11 countries were May 18, 2020 and April 24 or before April 24, 2020, respectively; and the spread of COVID-19 worldwide and in all 11 countries would be stopped by the end of June, 2020. ![Figure S4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/07/31/2020.05.05.20091827/F11.medium.gif) [Figure S4.](http://medrxiv.org/content/early/2020/07/31/2020.05.05.20091827/F11) Figure S4. Time-case plot of Covid-19 worldwide under three invention scenarios View this table: [Table 2.](http://medrxiv.org/content/early/2020/07/31/2020.05.05.20091827/T2) Table 2. Covid-19 dynamics in 11 countries and worldwide for three scenarios. ## Conclusion/Discussion As an alternative to the epidemiologic transmission models, we formulated the real-time forecasting and evaluating multiple public health intervention problem into a novel causal inference problem. We viewed the interventions as treatments where multiple interventions were administered at different time points. The number of new cases were taken as treatment responses. The RIN uses sequence-to-sequence multi-input/output recurrent neural network as a tool for modeling the real-time trajectory of the transmission dynamics of Covid-19, health intervention planning and making multi-step prediction of the response trajectory of Covid-19 over time with multiple interventions. The RNN can learn the complex dynamics within the temporal ordering of input time series of Covid-19 and use an internal memory to remember the hidden features. This AI and causal inference-inspired approach allows us to address three important questions. The first question is the prediction accuracy. Unlike other dynamic systems where the parameters in the systems and control variables are, in general, independent, the epidemic systems have intervention and system dependent parameters. We designed the intervention variable that quantified comprehensive intervention strategies and had close relationships with the parameters in the epidemic systems. Therefore, the RINs could take the parameters in the epidemic dynamic systems as input control variables that can be estimated in the RIN training. The RIN models were closer to real epidemic dynamic systems than the epidemiological models. Therefore, our results showed that the RIN substantially improved the accuracies of prediction and subsequently multiple-step forecasting. The second question is how important is the intervention time. Since interventions are complicated and are difficult to quantify, we designed three intervention scenarios to represent the degrees and delays of interventions. Since the proposed methods combine the real data and models, they allowed us to evaluate the consequences of multiple intervention strategies, while maintaining the analysis as close to the real data as possible. The RIN investigated the impact of multiple public intervention plans and intervention measures on the size, duration and time of the virus outbreak and recommended the appropriate intervention times. We estimated the duration, peak time and ending time, peak number of new cases and cumulative cases, and maximum number of cumulative cases of COVID-19 under three intervention scenarios for 184 countries in the world. We observed that the number of cumulative cases of Covid-19 would exponentially grow and reach extremely high number 199,554,596 on July 6, 2020 if none of additional comprehensive public health intervention was implemented. However, we also found that top 12 countries with the largest number of the lab confirmed cumulative cases of COVID-19 were closing to stopping the spread of Covid-19 if additional interventions such as wearing face masks were implemented. We can see that under all three scenarios, the peak times were before April 15, 2020 and the spread of COVID-19 would be sopped before the end of May, 2020. ## Data Availability Data on the number of cumulative and new cases and COVID-19-attributed deaths across 184 countries from January 22, 2020 to April 7 were obtained from John Hopkins Coronavirus Resource Center ([https://coronavirus.jhu.edu/MAP.HTML](https://coronavirus.jhu.edu/MAP.HTML)). ## Conflict of interest We have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. ## Supplementary Materials ### Supplementary Note A #### Recurrent Intervention Network (RIN) ##### Simple RNN Unit RIN consisted of an encoder RNN and a decoder RNN. The RNN is a mapping from a sequence space (or time series) to another sequence space (or time series space) and the current output depends on both current input and the whole observation history (whole time series). The RNN extracts short-term local dependency patterns among variables and to discover long-term patterns for time series trends (Lai et al. 2017). However, the traditional autoregressive methods do not distinguish the two types of patterns and capture their interactions explicitly and dynamically. Basic unit of RNN in both encoder and decoder had three layers: input, recurrent hidden and output layers (Figure S1). The input layer consisted of three types of variables: covariates ![Graphic][11], a scaler intervention variable A*t* and the numbers of cases (potential outcomes) *Y**t* at the time *t*. The covariates can include the rate of virus test, Google mobility indexes. Define the input vector *V**t* as ![Formula][12] Let ![Graphic][13] be a *m* dimensional hidden state vector where *m* is set to be 100 in this study. The data *V**t* is inputted into the input layer. The linear transformation *W**vh**V**t* of the data *V**t* is then sent to the hidden layer, where *W**vh* is a *m* × (*k* + *l*+ 1) dimensional matrix. The hidden layer receives information from the input layer and hidden layer at the previous time point. The state is determined by the following nonlinear transformation of its received information: ![Formula][14] where *W**hh* is a *m* × *m* dimensional weight matrix that connect the previous state to the current state, and ![Graphic][15] is a *m* dimensional bias vector that corrects the bias, and *f**h* is a element-wise nonlinear activation function and is often defined as the following “tanh” function: ![Formula][16] The neurons in hidden layer are connected to the output layer via a *m* dimensional weight vector *W**hy*. The output *Ŷ**t*+1 is determined by ![Formula][17] where *f**o* is an activation function and *b**o* is the bias vector of the output neurons. #### Long Short Term Memory Network Unit Long Short Term Memory network (LSTM) is a special kind of RNN, capable of learning long-term dependencies (Hochreiter and Schmidhuber, 1997; Srivastava et al. 2015). The LSTM attempts to preserve long term useful information and skip short-term irrelevant information. Each LSTM unit has a cell that is a memory unit (Figure S2). The key to LSTMs is the cell state. The cell state is determined by four gates: forget gate, input gate, input modulation, and output gate. At each time step, the unit receives input from two external sources: the current input variables ![Graphic][18] and the previous hidden states *h**t*−1. The total input activates the gates through sigmoid function and the tanh nonlinear function. The forget gate layer decides what information we’re going to throw away from the cell state: ![Formula][19] The next step, the input gate layer decides which values we’ll update: ![Formula][20] Next, input modulation gate creates a vector of new candidate values, *g**t*, that could be added to the state: ![Formula][21] The old cell state is updated by ![Formula][22] where ° denotes element-wise multiplication. Output gate decides what parts of the cell state we’re going to output: ![Formula][23] We update the hidden state *h**t* by tanh transformation of cell state, multiplied by the output of the sigmoid output gate. When the output gate is 1, we pass all memory information in the cell state through to the predictor. In contrast, when the output is 0, we keep all the information only within the cell. #### Potential Outcome Framework for Evaluation of Public Health Interventions Learning intervention policies is an extremely challenge problem. We employ the Counterfactually-Guided Policy Evaluation (CF-GPS) principle to evaluate the effect of public health interventions on controlling the spread of Covid-19 (Buesing et al. 2018). We view the public health intervention as treatment and the number of cases as the outcome. Counterfactual treatment outcome estimation is essentially a causal problem. Most methods for causal inference are designed for the statistic setting and cannot be applied to evaluating the effects of the sequence of public health interventions on (e.g. sequential application of intervention A followed by intervention B) the transmission dynamics of Covid-19 over time. The models that can deal with varying-length transmission dynamic histories of Covid-19 are needed for estimating intervention effects over time. Potential outcome framework is our basic model to evaluate the impact of the public health interventions on the spread of Covid-19. The potential outcome framework is often referred to the Neyman-Rubin model (Rubin 1974). Potential outcomes consist of actual (or observed) and counterfactual (hypothesized) outcomes. We are interested in number of cases of Covid-19 under some specific intervention. We observed the number of cases of Covid-19 (actual observation) without intervention or known intervention. However, we want to know what number of cases of Covid-19 (counterfactual, unobserved) would be if an alternative intervention was implemented. To evaluate the effect of intervention, we should compare the difference between the observed actual number of cases of Covid-19 and the counterfactual number of cases of Covid-19. Our aim is to learn the counterfactual outcomes of Coid-19 under a sequence of public health intervention options and evaluate the impact of the intervention strategies on the spread of Covid-19. The Recurrent Intervention Network (RIN), a novel sequence-to-sequence recurrent neural network is an architecture for estimating the effects of intervention on the spread of Covid-19 over time. A basic block of the RIN is the RNN. The RNN avoids the need for specifying any explicit model, using the networks to learn the relationships among variables directly from the data. The RIN consists of an encoder and a decoder. Both of the encoder and decoder use the RNN as their basic architectures (Bica et al. 2020). The encoder network uses vanilla RNN, or LSTM to model the history of the covariates, interventions and outcomes (number of cases), learn the causal structure from the retrospective data and predict one-step-ahead outcomes, given observations of covariates and actual interventions. The decoder network uses the hidden state computed by the encoder to initialize the state of an RNN in the decoder which predicts the counterfactual outcomes for a sequence of hypothesized future interventions (Bica et al. 2020). The decoder attempts to propagate the encoder representation forwards in time, using only the planned interventions and avoiding the covariates. #### Problem Mathematical Formulation Consider *N* countries in the world. Each country is viewed as a sample. For the *i**th* country and time point *t*, let ![Graphic][24] be a set of covariates and ![Graphic][25] be an intervention. The intervention variable ![Graphic][26] can be a binary variable: ![Graphic][27] indicates that intervention is (not) implemented. The intervention variable ![Graphic][28] can also be continuous variable taking values in the interval [0, 1]. If ![Graphic][29] is a continuous variable, the value of ![Graphic][30] represents the intensity of intervention. ![Graphic][31] indicates that the intervention is the most strict and comprehensive public health intervention. ![Graphic][32] indicates that no intervention is taken. Let ![Graphic][33] be the outcome (number of cases) of the *i**th* country at the time point *t*. The observed dataset is denoted by ![Graphic][34] where ![Graphic][35] is prespecified. The potential outcome framework for treatment effect estimation which accounts for the time varying treatments (Bica et al. 2020) is extended to evaluation of the effects of the interventions on the spread of Covid-19. Let ![Graphic][36] be the history of the outcomes (number of cases) ![Graphic][37], the covariates ![Graphic][38] and interventions *Ã**t* = (*A*1, …, *A**t*). Let *Y*[*ã*] be the potential outcomes that can be either observed or counterfactual, under each possible sequence of intervention *ã*. The potential outcome framework assumes the existence of the hypothetical outcome with some interventions which is not observed in the data. The hypothetical outcome under hypothetical intervention is called counterfactual outcome. Given the history ![Graphic][39] and a sequence of planned interventions *Ã*(*t, t* + *τ* − 1) = (*A*, …, *A**t*+*τ*−1), the counterfactual outcome *Y**t*+*τ*[*Ã*(*t,t* + *τ* − 1)] can be predicted by ![Formula][40] which define the future dynamic trajectory of the Covid-19 under the planned sequence of interventions, given the previous history of Covid-19 and its environments. To make the prediction of the dynamic trajectory of the Covid-19 under the potential outcome framework to be identifiable, we need to make the following assumptions (Bica et al. 2020). #### Assumptions We introduce assumptions in the Neyman-Rubin model (Bica et al., 2020). Assumption 1. ### (Consistency) If a nation receives an intervention A*t* = *a**t*, then the potential outcome for the intervention *a**t* which can be counterfactual is equal to the observed (factual) outcome *Y**t*+1(*a**t*) = *Y**t*+1. Assumption 2. ### (Overlap) For all (*a*1, *x*1, …, *a**t*−1, *x**t*−1), we have 0 < *P*(A*t* = *a**t*|A1 = *a*1, *X*1 = *x*1, …, A*t*−1 = *a**t*−1, *X**t*−1 = *x**t*−1) < 1. In other words, at each time step, each intervention has non-zero probability of being implemented. Assumption 3. ### Sequential strong ignorability Conditional on A1 = *a*1, *X*1 = *x*1, …, A*t*−1 = *a**t*−1, *X**t*−1 = *x**t*−1, the potential outcomes *Y**t*+1 are independent of A*t*, ![Formula][41] Assumption 3 implies that there is no confounders which affect both outcomes and interventions. #### Training Procedures and Loss function As we discussed in the previous section, the counterfactual policy evaluation (CPE) was a powerful tool to address the pressing issue of evaluating the performance of public health interventions. The CPE problem was formulated as a potential outcome or counterfactual estimation problem (Bibaut et al. 2019). The RIN training consisted of the encoder training and decoder training. We first fitted a model of the system’s dynamics of Covid-19 to the data from past experimental interventions to learn representations of the states of the dynamics of Covid-19 (encoder training), and then used the learned fit to extrapolate and forecast the response to the alternative interventions (decoder training). Since the output is continuous, the mean square errors are used as the loss function for the encoder: ![Formula][42] where *n* is the number of countries with outbreak of Covid-19 and *T**i* is the length of the observed country’s trajectory of Covid-19. For the decoder, we assumed that observations were batched into shorter sequences of up to *τ**b*, a prediction horizon. The loss function for the decoder is defined as ![Formula][43] The major steps for encoder and decoder training were given below. #### Step 1: Data pre-processing Data were split into training dataset (01/22-04/27, 2020) and validation dataset (04/28-05/08/2020). All the input number of lab-confirmed cumulative cases *Y**t* was pre-processed by the following transformation:![Graphic][44]. The number of new cases was calculated as ![Graphic][45]. #### Step 2: Generate the intervention measure curve The intervention measure was calculated as follows. Set the intervention measure at the final time ![Graphic][46] for China, ![Graphic][47] for Korea South, Switzerland, United Kingdom, Spain, US, Italy, Germany, Iran, and France, and ![Graphic][48] for all other countries. Assume that The intervention measure curve is an exponential function starting at 0 and ends at ![Graphic][49] The intervention measure *A**t* is given by ![Graphic][50], where *p* > 0, *p* ≠ 1 is the curve shape factor and *t* takes values in evenly sliced numbers of interval [0, 1], *A* is the intervention measure at the initial time *t*. When *p* = 1, *A**t* is a linear function ![Graphic][51]. In this study, we set *p* = 0.01. ##### Step 3: minibatches and normalization Randomly picked *k* = 64 countries with *l*= 7 length of Covid-19 time series data staring from the same day to generate *k* time series with *l* length for a minibatch to be used for backpropagation training through time. Calculate the mean value of each time series in the batch. The values of each time series were divided by their mean values. #### Step 4: Encoder Training The randomly grouped minibatch data with *k* = 64 countries and *l* = 7 length of the number of lab confirmed new cases of Covid-19 and sequence of interventions were used as input to train the encoder. The standard back-propagation method and the Adam Optimizer were used to minimize the loss function of each batch, defined in equation (A1). Upon completion, the encoder was used to perform a feed-forward pass over the training data for extraction of the internal states *h**t* that were used to train the decoder and for prediction of one-step-ahead outcomes *Y**t*+1. #### Step 5: Decoder Training During decoder training, the decoder uses as input the extracted internal states *h**t* computed by the encoder, the outcomes (*Y**t*+2, …, *Y**t*+*τ*) from the observational data that were batched into a shorter sequences of up to *τ* steps, and the planned sequence of interventions (*A**t*+1, …, *A**t*+*τ*−1). Similar to encoder training, the standard back-propagation method and the Adam Optimizer were used to minimize the loss function of each batch, defined in equation (A2). The initial learning rate in the updating parameters in both encoder and decoder training was 0.02 and learning rate decay was 0.0001. #### Step 6: Evaluation of the Intervention Strategies and Forecasting After completion of the training, the decoder can be used to evaluation of interventions. During evaluation, we do not have access to ground-truth outcomes. Therefore, we used the decoder to make one step ahead forecasting. The outcomes forecasted by the decoder (*Ŷ**t*+1, …, *Ŷ**t*+*τ*−1) were recursively used as inputs. For each country, by running the decoder that was trained in step 5, with a sequence of planned interventions and recursively forecasted outcomes, we forecasted the response dynamics, i.e., the number of new cases of Covid-19 of the country over time under a sequence of interventions and evaluate the effects of various intervention strategies on controlling the spread of Covid-19. By running the decoder, we can select starting and ending time of different interventions and the optimal or appropriate interventions to give over time to obtain the best outcomes of controlling the spread of Covid-19 for each country. ## Acknowledgements Dr. Li Jin was partially supported by National Natural Science Foundation of China (91846302) Dr. Wei Lin is supported by the National Key R&D Program of China (Grant no. 2018YFC0116600), by the National Natural Science Foundation of China (Grant no. 11925103) and by the STCSM (Grant no. 18DZ1201000). Thank the reviewers for their helpful comments that substantially improve the paper. ## Footnotes * Corrected misprint of the title and updated results. * Received May 5, 2020. * Revision received July 31, 2020. * Accepted July 31, 2020. * © 2020, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/) ## REFERENCES 1. Anastassopoulou, R., Tsakris, S. (2020). Data-based analysis, modelling and forecasting of the COVID-19 outbreak. 2020. doi:[https://doi.org/10.1101/2020.02.11.20022186](https://doi.org/10.1101/2020.02.11.20022186). 2. Irfan, U. (2020). The math behind why we need social distancing, starting right now. [https://www.vox.com/2020/3/15/21180342/coronavirus-covid-19-us-social-distancing](https://www.vox.com/2020/3/15/21180342/coronavirus-covid-19-us-social-distancing). 3. Li, Q., Guan, X., Wu, P., Wang, X., Zhou, L., Tong Y. et al. (2020). Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia. N Engl J Med 382 1199–1207. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMoa2001316&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F07%2F31%2F2020.05.05.20091827.atom) 4. Wu, J. T., Leung, K., and Leung, G. M. (2020). Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study. Lancet 395 689–697. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0140-6736(20)30260-9&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32014114&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F07%2F31%2F2020.05.05.20091827.atom) 5. Zhao, S., Musa, S. S., Lin, Q. Y. et al. (2020). Estimating the unreported number of novel Coronavirus (2019-nCoV) cases in China in the first half of January 2020: A data-driven modelling analysis of the early outbreak. J Clin Med 9 pii: E388. 6. Kucharski, A. J., Russell, T. W., Diamond, C., CMMID nCov WORKING GROUP, Edmunds, J. W., Funk, SN., and Eggo R. M. (2020). Analysis and projections of transmission dynamics of nCoV in Wuhan (2020) [https://cmmid.github.io/ncov/wuhan\_early_dynamics/index.html](https://cmmid.github.io/ncov/wuhan_early_dynamics/index.html). 7. Tuite, A. R. and Fisman, D. N. (2020). Reporting, epidemic growth, and reproduction numbers for the 2019 novel coronavirus (2019-nCoV) epidemic. Ann Intern Med. 2020 Apr 21;172(8):567–568. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F07%2F31%2F2020.05.05.20091827.atom) 8. Hellewell, J., Abbott, S., Gimma, A. et al. (2020). Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts. Lancet Glob Health 8 e488–e496. 9. Li, R., Pei, S., Chen, B., Song, Y., Zhang, T., Yang, W., Shaman, J. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV2). Science 368 489–493. 10. Roosa, K., and Ghowell, G. (2019). Assessing parameter identifiability in compartmental dynamic models using a computational approach: application to infectious disease transmission models. Theor Biol Med Model 16 1. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s12976-018-0097-6&link_type=DOI) 11. Funk, S., Camacho, A., Kuchrski, A. J., Eggo, R. M., Edmunds, W. J. (2018). Real-time forecasting of infectious disease dynamics with a stochastic semi-mechanistic model. Epidemics 22 56–61. 12. Johansson, M. A., Apfeldorf, K. M., Dobson, S. et al. (2019). An open challenge to advance probabilistic forecasting for dengue epidemics. Proc Natl Acad Sci U S A. 116 24268–24274. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxMjoiMTE2LzQ4LzI0MjY4IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMDcvMzEvMjAyMC4wNS4wNS4yMDA5MTgyNy5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 13. Gabor, A., Villaverde, A. F., and Banga, J. R. (2017). Parameter identifiability analysis and visualization in large-scale kinetic models of biosystems. BMC Syst Biol 11 54. 14. Bibaut, A., Malenica, I., Vlassis, N., Van Der Laan, M. (2019). More efficient off-policy evaluation through regularized targeted learning. 15. Lim, B., Alaa, A. M., and Van Der Schaar, M. (2018). Forecasting treatment responses over time using recurrent marginal structural networks. In Advances in Neural Information Processing Systems. 2018 7493–7503. 16. Clevert, D. A., Unterthiner, T., Hochreiter, S. (2015). Fast and accurate deep network learning by exponential linear units (ELUs) arxiv:1511.07289. 17. Bica, I., Alaa, A. M., Jordon, J., and Van Der Schaar, M. (2020). Estimating counterfactual treatment outcomes over time through adversarially balanced representations. arxiv:2002.04083. ## References 1. Lai, G., Chang, WC., Yang, Y., Liu, H. (2017). Modeling long- and short-term temporal patterns with deep neural networks. arxiv:1703.07015. 2. Bibaut, A. Malenica, I., Vlassis N., Laan MVD. (2019). More efficient off-policy evaluation through regularized targeted learning. Proceedings of the 36th International Conference on Machine Learning PMLR 97: 654–663. 3. Hochreiter, S., and Schmidhuber, J. (1997). Long short-term memory. Neural computation 9 1735–1780. 4. Buesing, L., Weber, T., Zwols, Y., Racaniere, S., Guez, A., Lespiau, JB., Heess, N. (2018). Woulda, coulda, shoulda: Counterfactually-guided policy search. arxiv:1811.06272. 5. Srivastava, N., Mansimov, E., Salakhutdinov, R. (2015). Unsupervised learning of video representations using LSTMs. arxiv:1502.04681. 6. Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and non-randomized studies. Journal of Educational Psychology 66 688–701. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1037/h0037350&link_type=DOI) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1974U496900007&link_type=ISI) 7. Lim, B., Alaa, A. M., and Van Der Schaaar, M. (2018). Forecasting treatment responses over time using recurrent marginal structural networks. In Advances in Neural Information Processing Systems 2018: 7493–7503. 8. Bica, I., Alaa, A. M., Jordon, J., and Van Der Schaaar, M. (2020). Estimating counterfactual treatment outcomes over time through adversarially balanced representations. arXiv e-prints, page arxiv:2002.04083. [1]: /embed/graphic-3.gif [2]: /embed/graphic-4.gif [3]: /embed/inline-graphic-1.gif [4]: /embed/inline-graphic-2.gif [5]: /embed/inline-graphic-3.gif [6]: /embed/inline-graphic-4.gif [7]: /embed/inline-graphic-5.gif [8]: /embed/inline-graphic-6.gif [9]: /embed/inline-graphic-7.gif [10]: /embed/inline-graphic-8.gif [11]: /embed/inline-graphic-9.gif [12]: /embed/graphic-15.gif [13]: /embed/inline-graphic-10.gif [14]: /embed/graphic-16.gif [15]: /embed/inline-graphic-11.gif [16]: /embed/graphic-17.gif [17]: /embed/graphic-18.gif [18]: /embed/inline-graphic-12.gif [19]: /embed/graphic-19.gif [20]: /embed/graphic-20.gif [21]: /embed/graphic-21.gif [22]: /embed/graphic-22.gif [23]: /embed/graphic-23.gif [24]: /embed/inline-graphic-13.gif [25]: /embed/inline-graphic-14.gif [26]: /embed/inline-graphic-15.gif [27]: /embed/inline-graphic-16.gif [28]: /embed/inline-graphic-17.gif [29]: /embed/inline-graphic-18.gif [30]: /embed/inline-graphic-19.gif [31]: /embed/inline-graphic-20.gif [32]: /embed/inline-graphic-21.gif [33]: /embed/inline-graphic-22.gif [34]: /embed/inline-graphic-23.gif [35]: /embed/inline-graphic-24.gif [36]: /embed/inline-graphic-25.gif [37]: /embed/inline-graphic-26.gif [38]: /embed/inline-graphic-27.gif [39]: /embed/inline-graphic-28.gif [40]: /embed/graphic-24.gif [41]: /embed/graphic-25.gif [42]: /embed/graphic-26.gif [43]: /embed/graphic-27.gif [44]: /embed/inline-graphic-29.gif [45]: /embed/inline-graphic-30.gif [46]: /embed/inline-graphic-31.gif [47]: /embed/inline-graphic-32.gif [48]: /embed/inline-graphic-33.gif [49]: /embed/inline-graphic-34.gif [50]: /embed/inline-graphic-35.gif [51]: /embed/inline-graphic-36.gif