Recurrent Neural Reinforcement Learning for Counterfactual Evaluation of Public Health Interventions on the Spread of Covid-19 in the world
===========================================================================================================================================

* Qiyang Ge
* Zixin Hu
* Kai Zhang
* Shudi Li
* Wei Lin
* Li Jin
* Momiao Xiong

## Abstract

As the Covid-19 pandemic soars around the world, there is an urgent need to forecast the expected number of cases worldwide and the length of the pandemic before it recedes, and to implement public health interventions that significantly slow the spread of Covid-19. The statistical and computational methods most widely used for modeling and forecasting the trajectory of Covid-19 are epidemiological models. Although these epidemiological models are useful for estimating the dynamics of transmission of epidemics, their prediction accuracies are quite low. As an alternative to the epidemiological models, reinforcement learning (RL) and causal inference have emerged as powerful tools for selecting optimal interventions for worldwide containment of Covid-19. Therefore, we formulated real-time forecasting and evaluation of multiple public health interventions as off-policy evaluation (OPE) and counterfactual outcome forecasting problems, and integrated RL and recurrent neural networks (RNN) to explore public health intervention strategies for slowing the spread of Covid-19 worldwide, given historical data that may have been generated by different public health intervention policies. We applied the developed methods to real data collected from January 22, 2020 to June 28, 2020 for real-time forecasting of the confirmed cases of Covid-19 across the world. We forecasted that the number of laboratory-confirmed cumulative cases of Covid-19 will pass 26 million by August 14, 2020.
Key Words

* Covid-19
* reinforcement learning
* recurrent neural networks
* artificial intelligence
* time series
* causal inference

## Introduction

As of June 30, 2020, the global number of confirmed Covid-19 cases had passed 10,475,817, including 511,251 deaths, and the disease had spread to 213 countries, causing an immense public health crisis. Government officials and people around the world have implemented various nonpharmaceutical interventions to slow the spread of Covid-19 [1]. These public health interventions include cessation of public gatherings, traffic restriction, stay-at-home orders, closures of schools and nonessential businesses, face mask ordinances, social distancing, quarantine, isolation and expanded virus testing. However, implementing public health interventions causes substantial economic losses and social damage. The critical question now is how to reopen the economy while containing the Covid-19 pandemic. A key to correctly answering this question is to reconstruct the complex epidemic dynamic systems from the data, precisely predict the extent and duration of Covid-19, develop algorithms to evaluate the effects of public health interventions on the transmission dynamics of Covid-19, and devise practical, implementable public health interventions to control the spread of Covid-19 in the world.

Widely used statistical and computational methods for modeling Covid-19 simulate the transmission dynamics of epidemics to understand their underlying mechanisms, forecast the trajectory of epidemics, and assess the potential impact of a number of public health measures on curbing the spread of Covid-19 [2-8]. The Covid-19 Forecast Hub collected 48 models for Covid-19 forecasts [9]. The majority of these models are epidemiological models. Although these epidemiological models are useful for estimating the dynamics of transmission, they have some critical limitations [10,11].
First, most epidemiological models assume that the reproduction number $R$ is constant. However, in the real world, the reproduction number $R$ is affected by various interventions and factors such as lockdown of the epidemic areas, travel restrictions, population mobility, social distancing, and climate [12]. Therefore, the reproduction number $R$ often changes over time. The assumption that the parameters in the model are constant dramatically limits our ability to simulate interventions and improve prediction accuracy. Second, the epidemiological models consist of ordinary differential equations that have many unknown parameters and depend on many assumptions. Most analyses used hypothesized parameters, which often led to a poor fit to the data. Third, the successful application of public health intervention planning depends heavily on the identifiability of the model parameters. However, some researchers have shown that the parameters in complex compartmental dynamic models are unidentifiable [13]. The values of the parameters cannot be uniquely determined from the real data [14], and the variances of the estimators of these parameters are very high. Fourth, the intervention measures are not explicitly included in the epidemiological models. These models lack mechanisms to evaluate the actual effects of public health interventions on infection rates in the ongoing Covid-19 pandemic [2].

An essential step toward overcoming these limitations is to explicitly incorporate counterfactual evaluation mechanisms into the models. Reinforcement learning (RL) and counterfactual outcomes can be used as a general framework for evaluating the dynamic response of Covid-19 to the intervention measures and optimizing the intervention strategy [15-22]. RL learns actions, that is, interventions. It arises from solving optimal control problems of partially observed Markov Decision Processes by learning an intervention policy [23].
The control problem consists of identifying the dynamic system and designing the optimal control. We can view the transmission dynamics of Covid-19 as a dynamic system or Markov Decision Process. A typical dynamic system is usually modeled by nonlinear state space equations, which can in turn be transformed into recurrent neural networks (RNN) [24]. The RNN is an ideal tool for learning a partially observed Markov Decision Process. After the dynamic system or Markov Decision Process is learned from historical data, we can use RL or optimal control theory (dynamic programming for a discrete system or Pontryagin’s maximum principle for a continuous system) to infer the control signals or actions that drive the system to the desired state [25]. RL provides a wealth of information about the consequences of actions, that is, information about cause and effect.

The goal of public health interventions is to contain Covid-19 as soon as possible. However, the set of actions or health interventions for stopping the spread of Covid-19 is limited. The environments that determine the transition dynamics of Covid-19 may change rapidly over time, and the future environments of Covid-19 may be substantially different from the previous ones. The actions or interventions cannot be inferred from the historical data alone. Fully designing optimal actions or interventions within RL may not be feasible. Therefore, we formulated the real-time forecasting and evaluation of multiple public health interventions as an off-policy evaluation (OPE) and counterfactual outcome forecasting problem within the RL framework, where the aim is to estimate the response to a new public health intervention policy, given historical data that may have been generated by different public health intervention policies [26]. We interpreted the interventions as treatments, with multiple interventions implemented at different time points, and the number of new cases as the treatment response.
Accurate estimation of the effects of public health interventions over time would allow health officers to plan which intervention strategies should be used and when to implement them [27].

Public health interventions include virus testing, isolation and contact tracing, travel restriction, strict self-quarantine for families, social distancing, stopping mass gatherings, closure of schools and nonessential businesses, and vacating hotels. To quantify comprehensive intervention strategies, an intervention variable that comprehensively and abstractly measures virus testing, mobility activities and social distancing was used as the action variable in the RL. Recurrent neural reinforcement learning (RNRL) is taken as a general framework for investigating how Covid-19 evolves under different interventions, how individual nations respond to the interventions over time, and what the optimal timings for implementing interventions are. The RNRL therefore provides new tools to forecast the trajectory of Covid-19 under interventions and to improve public health planning and decision making. The RNRL was applied to the surveillance data of lab-confirmed Covid-19 cases in the world up to June 28, 2020. Data on the number of confirmed and new cases of Covid-19 from January 22, 2020 to June 28, 2020 were obtained from the Johns Hopkins Coronavirus Resource Center ([https://coronavirus.jhu.edu/MAP.HTML](https://coronavirus.jhu.edu/MAP.HTML)).

## Methods

### RNRL as a framework for modeling and evaluating the effect of the interventions on the spread of Covid-19

The Markov Decision Process (MDP) is the theoretical framework for RL. RL has three components (state, action and reward) and consists of system identification and optimal control design [28]. The RNRL combines RL with the RNN [23]. RL can be viewed as an open dynamic system with a corresponding reward function (or loss function).
The dynamic system can be a discrete-time or continuous-time dynamic system. Here we focus on discrete-time dynamic systems and the partially observed MDP. Let $h_t \in \mathbb{R}^m$ be a hidden state and $y_t$ be the observed variable (the number of new cases) at time $t$. Let $A_t$ be an intervention variable or action variable at time $t$ and $x_t$ be a vector of covariates. Consider the following dynamic system underlying the transmission dynamics of Covid-19:

![Formula][1]

![Formula][2]

where equation (1) is the system equation, equation (2) is the observation equation, and $f$ and $g$ are two nonlinear functions. System equation (1) states that the next hidden state $h_{t+1}$ is reached from the current hidden state $h_t$ under the influence of the current action or intervention $A_t$. The corresponding reward function is defined as $R: A \to \mathbb{R}$, which is a function of the current action. The reward at time $t$ is defined as $R_t = R(A_t)$. Since the current reward may make only a small contribution to the total reward in the long run, an accumulated reward over time with a possible discount factor $\gamma \in [0,1]$ is defined as

![Formula][3]

The MDP and agent (learner) generate a sequence: $h_0, A_0, R_1, h_1, A_1, R_2, \dots$. RL consists of two learning steps: (1) system identification and (2) optimal intervention policy learning. The reward functions in the two steps are different.

### Reward function for system identification

System identification serves two purposes. First, since the dynamics of Covid-19 are partially observed, the hidden states must be estimated from the historical data. Second, to learn the optimal control (intervention) policy, we need to identify the system underlying the dynamics of Covid-19. System identification thus serves as the basis for the second step, optimal intervention policy learning.
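As a small illustration of the accumulated reward with a discount factor introduced above, the following Python sketch computes a discounted sum over a reward sequence. It assumes the standard form $\sum_{k \ge 0} \gamma^k R_{t+k+1}$, which may differ in indexing from the paper's own formula; the function name `discounted_return` and the toy reward values are hypothetical.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_k gamma**k * rewards[k] (assumed standard discounted sum)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Accumulate backwards: G_k = r_k + gamma * G_{k+1}, so no explicit powers are needed.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


if __name__ == "__main__":
    # Hypothetical per-step rewards R_1, R_2, R_3 (illustrative values only).
    toy_rewards = [1.0, 1.0, 1.0]
    print(discounted_return(toy_rewards, gamma=0.5))
```

With $\gamma = 1$ the function reduces to a plain sum, and with $\gamma = 0$ only the first reward counts, which matches the role of the discount factor described in the text.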
For convenience of discussion, equation (2) is modified to

![Formula][4]

Our goal is to minimize the reward (loss) function:

![Formula][5]

where $A = [A_0, A_1, \dots, A_{T-1}]^T$ is estimated from the data and the functions $g$ and $h$ are implemented by the RNN (see Supplementary Note A).

### Reward function for optimal intervention policy learning

Inferring the optimal intervention (control) policy depends on the model identified in the previous step. In the second step, we search for an optimal intervention (control) policy that minimizes the number of cumulative cases or the number of deaths. Therefore, the reward function at time $t$ is defined as

![Formula][6]

In other words, we want to make the number of new cases at time $t$ as small as possible. Let $\pi$ be the action selection policy, which determines the model’s next action $A_t$. The action selection policy $\pi$, which depends on the hidden state, observed data and covariates, is given by

![Formula][7]

We attempt to minimize the reward function:

![Formula][8]

### RNN for system identification

System identification learns a model underlying the dynamics of Covid-19 from the available historical data. The historical data include the number of cases (new or cumulative) $y_t$, covariates $x_t$ such as age, sex and race, and the action or intervention $A_t$. The model captures the main developments of the underlying system and explains the evolution of the system beyond the observed data region. Recurrent neural networks (RNN) are a powerful tool for system identification [29]. The RNN can learn the complex dynamics within the temporal ordering of the input time series of Covid-19 and uses an internal memory to retain them. The RNN has two types of inputs and outputs: (1) internal input and output and (2) external input and output (Figure 1). The internal output of the RNN can be viewed as the “system state” $h_t$, which is passed to the next timestep.
An RNN cell receives a prior internal state $h_{t-1}$ and a current external input, consisting of the number of cases $y_t, \dots, y_{t-l+1}$, the action (intervention) $A_t$ and the covariates $x_t$, and generates a current internal state $h_t$ and a current external output $y_{t+1}$ (the number of cases) at time $t + 1$. The RNN takes as input the time series (past history of the number of cases of Covid-19 over time) and predicts the future response time series (the number of cases of Covid-19 in the future under a planned sequence of interventions).

[Figure 1.](http://medrxiv.org/content/early/2020/07/10/2020.07.08.20149146/F1) Architecture of RNN encoder.

Define the input vector $V_t$ as

![Formula][9]

The RNN models the state transition and the output equation of the dynamic system underlying Covid-19 as follows:

![Formula][10]

![Formula][11]

where $W_{hh}$ is an $m \times m$ dimensional weight matrix that connects the previous state to the current state, $W_{vh}$ is an $m \times l$ dimensional matrix, ![Graphic][12] is the $k$th iteration of the intervention measure at time $t$, $W_{ah}$ is an $m$ dimensional vector, $W_{xh}$ is an $m \times k$ dimensional matrix, $x_t$ is a $k$ dimensional vector of covariates, ![Graphic][13] is an $m$ dimensional bias vector, and $f_h$ is an element-wise nonlinear activation function. $W_{hy}$ is an $m$ dimensional weight vector, $f_y$ is an activation function and $b_y$ is the bias vector of the output neurons.

In summary, using the RNN to identify the system underlying the dynamics of Covid-19 can be formulated as the following optimization problem:

![Formula][14]

s.t.
![Formula][15]

![Formula][16]

![Formula][17]

where ![Graphic][18] is the $(k + 1)$th iteration of the intervention measure at time $t$, $\pi$ is a nonlinear activation function, $W_{ah}$ is a $1 \times m$ dimensional matrix, and the parameters $\theta$ are the weight matrices and bias vectors. The above minimization problem is solved by backpropagation and forward dynamic programming [27]. The detailed training algorithm is summarized in Supplementary Note A.

### RNN for learning actions

The main purpose of RL is to make the best decision from historical data. The second part of typical RL is to learn the optimal control policy (Figure 2). Learning the optimal control policy is usually formulated as an optimal control problem. If the state space is discrete, dynamic programming is used to find the optimal control policy [27]. If the state space is continuous, the Hamilton-Jacobi-Bellman (HJB) equation is used to solve the optimal control problem [29]. Choices of public health interventions are restricted by multiple political, cultural, technological and economic factors, so policy optimization is often practically infeasible. Therefore, we do not attempt to design optimal control actions.

[Figure 2.](http://medrxiv.org/content/early/2020/07/10/2020.07.08.20149146/F2) Architecture of RNN decoder.

Instead, we use off-policy methods, which evaluate or improve a policy different from the one used to generate the data, to select suitable actions (interventions) from a set of feasible actions (interventions). We propose to use RNN-based counterfactual action evaluation as a general framework for modeling and forecasting the spread of Covid-19 over time under multiple interventions [30]. A second RNN is used for learning counterfactual actions (interventions).
This RNN forecasts the intervention response (similar to counterfactual outputs) for a given set of planned counterfactual actions (interventions), evaluates the impact of different counterfactual actions (interventions) and their implementation times on stopping the spread of Covid-19, and supports timely selection of a suitable sequence of actions (interventions) [21].

The RNN for system identification is called the encoder (Figure 1) and the RNN for action selection and evaluation is called the decoder (Figure 2). The RNN encoder takes as input the time series (past history of the number of cases of Covid-19 over time) and predicts the future response time series (the number of cases of Covid-19 in the future under a planned sequence of interventions). The RNN encoder was explained in the previous section. Here, we focus on the RNN decoder. Unlike a standard decoder, which reconstructs the input time series from the latent representation, the RNN decoder uses the features of the dynamics of Covid-19 learned by the RNN encoder to forecast the counterfactual response time series, given a sequence of planned counterfactual public health interventions as input. The feature vector learned by the RNN encoder is provided as an input to the RNN decoder, which initiates prediction of the future dynamics of Covid-19 under the future counterfactual interventions (Figure 2). The RNN decoder can be represented by the following set of equations:

![Formula][19]

![Formula][20]

![Formula][21]

where $V_{t+\tau}$ is defined as before and $\tau \ge 1$. The algorithm for action (intervention) evaluation and selection is summarized in Supplementary Note A.

### Data Collection

The analysis is based on surveillance data of confirmed cumulative and new Covid-19 cases worldwide as of June 28, 2020.
Data on the number of cumulative and new cases and Covid-19-attributed deaths across 187 countries from January 22, 2020 to June 28, 2020 were obtained from the Johns Hopkins Coronavirus Resource Center ([https://coronavirus.jhu.edu/MAP.HTML](https://coronavirus.jhu.edu/MAP.HTML)).

### Data Pre-processing

Data were split into a training dataset (01/22/2020 to 06/21/2020) and a validation dataset (06/22/2020 to 06/28/2020). The input numbers of lab-confirmed cumulative cases $Y_t$ were pre-processed by the following transformation: ![Graphic][22]. The number of new cases was calculated as ![Graphic][23].

### Minibatches, Normalization and RNRL Flowchart

The RNRL algorithm flowchart is shown in Figure S1. We first randomly picked $k = 64$ countries with Covid-19 time series of length $l + \tau$ starting from the same day, which generated $k$ time series of length $l + \tau$ for a minibatch used for training by backpropagation through time. The first $l$ time points of each series were used to train the RNN encoder and the remaining $\tau$ time points were used to train the RNN decoder. This training process was repeated $n$ times. After the RNN encoder and decoder were trained, they were used for forecasting and evaluation. The time series $y_{t-l+1}, \dots, y_t$ were input to the trained RNN encoder, while the RNN decoder was used to forecast the time series $y_{t+1}, \dots, y_{t+\tau-1}$. For normalization, the mean value of each time series in the batch was calculated, and the values of each time series were divided by that mean.

### Forecasting Procedures

The trained RNN decoder was used to forecast the future number of new or cumulative cases of Covid-19 worldwide and for each country. Recursive multi-step forecasting used a one-step model multiple times, where the prediction for the preceding time step and the intervention strategy were used as input for making a prediction for the following time step.
For example, to forecast the number of new confirmed cases one more day ahead, the number of new cases predicted by the one-step forecast and the intervention measure were used as the observational input to predict day 2. This yields the two-step forecast; repeating the process gives the forecasts for later days. The sum of the final forecasted numbers of new or cumulative confirmed cases over all countries was taken as the prediction of the total number of new or cumulative confirmed cases of Covid-19 worldwide.

## Results

### Prediction accuracy of the dynamics of Covid-19 using RNRL

Accurate prediction of the transmission dynamics of Covid-19 is important for health decision making. To demonstrate that the RNRL is an accurate forecasting method, the RNRL was applied to the lab-confirmed cumulative cases of Covid-19 across 187 countries. Figures 3 and 4 plot the reported and one-step-ahead predicted time-case curves of Covid-19 in the world and in the top fifteen most-affected countries, where the blue and red curves are the numbers of reported and predicted cumulative cases, respectively. The top fifteen most-affected countries were US, Brazil, Russia, India, United Kingdom, Spain, Italy, Peru, France, Iran, Germany, Turkey, Chile, Mexico, and Pakistan. The average signed (non-absolute) and absolute one-step-ahead prediction errors for the world were 0.0572 and 0.0592, respectively. The average signed and absolute one-step-ahead prediction errors for the fifteen countries were 0.0213 and 0.0277, respectively. To evaluate the forecasting accuracy more reliably, Table 1 reports the 7-step-ahead forecasted numbers of cumulative cases of Covid-19 and the corresponding errors worldwide and in the 15 countries, starting from June 22, 2020. The average forecasting error was 0.0197, ranging from 0.000016 to 0.087.

[Table 1.](http://medrxiv.org/content/early/2020/07/10/2020.07.08.20149146/T1) Forecasting errors of worldwide and 15 countries.
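The recursive multi-step procedure described under Forecasting Procedures can be sketched in Python as follows. This is an illustration only: `one_step` stands in for the trained RNN decoder, and `toy_one_step`, the window length, and all numbers are hypothetical assumptions rather than the authors' implementation.

```python
from typing import Callable, Dict, List, Sequence

# A one-step model maps (recent window of cases, planned action) -> next-day cases.
OneStepModel = Callable[[Sequence[float], float], float]


def recursive_forecast(history: Sequence[float],
                       planned_actions: Sequence[float],
                       one_step: OneStepModel,
                       window: int) -> List[float]:
    """Apply a one-step model repeatedly, feeding each prediction back as input."""
    if window < 1 or len(history) < window:
        raise ValueError("history must contain at least `window` observations")
    buf = list(history[-window:])
    preds: List[float] = []
    for action in planned_actions:
        y_next = one_step(buf, action)
        preds.append(y_next)
        # Slide the window: drop the oldest value, append the new prediction.
        buf = buf[1:] + [y_next]
    return preds


def world_total(per_country: Dict[str, Sequence[float]]) -> List[float]:
    """Sum country-level forecasts day by day to get a worldwide forecast."""
    return [sum(day_values) for day_values in zip(*per_country.values())]


def toy_one_step(window: Sequence[float], action: float) -> float:
    """Hypothetical stand-in model: scales the latest value by (1 - action)."""
    return window[-1] * (1.0 - action)


if __name__ == "__main__":
    forecasts = {
        "country_a": recursive_forecast([80.0, 100.0], [0.5, 0.5], toy_one_step, window=1),
        "country_b": recursive_forecast([10.0, 20.0], [0.5, 0.5], toy_one_step, window=1),
    }
    print(forecasts, world_total(forecasts))
```

Because each prediction is fed back as if it were an observation, errors can accumulate over the forecast horizon, which is one reason to report multi-step errors separately from one-step errors.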
[Figure 3.](http://medrxiv.org/content/early/2020/07/10/2020.07.08.20149146/F3) Reported and predicted time-case curves of Covid-19 worldwide, where the blue curve and red curve are the numbers of reported and predicted cumulative cases, respectively.

[Figure 4.](http://medrxiv.org/content/early/2020/07/10/2020.07.08.20149146/F4) Reported and predicted time-case curves of Covid-19 in the top fifteen most-affected countries, where the blue curve and red curve are the numbers of reported and predicted cumulative cases, respectively.

### Transmission Dynamics of Top Fifteen Most-affected Countries

Figures 5 and 6 plot the reported and forecasted trajectories of the new and cumulative cases of Covid-19 in the top fifteen most-affected countries, respectively. Tables S1 and S2 list the one-month forecasted numbers of new and cumulative cases of Covid-19 in the top fifteen most-affected countries, respectively. We observed several remarkable features. First, if the current intervention measures are kept, all of the top 15 most-affected countries have passed the peak. Second, the spread of Covid-19 in all of the top most-affected countries except Brazil and Chile was curbed. The forecasted number of new cases on July 14, 2020 was less than 1,000 in 7 countries (France: 143, Germany: 218, Iran: 710, Italy: 56, Spain: 101, Turkey: 313, United Kingdom: 308), less than 10,000 in 6 countries (India: 5,502, Mexico: 1,835, Pakistan: 1,955, Peru: 2,177 and US: 8,676), and larger than 10,000 in two countries (Brazil: 12,357 and Chile: 29,667). Third, the number of new cases in most of these countries decreased.
The first derivatives of the new cases of Covid-19 in the top 15 most-affected countries, from May 17, 2020 to July 14, 2020, are listed in Table S3. The number of new cases decreased in ten countries, with average rates of change ranging from -175 (US) to -4 (Mexico). Although the average increase rates in Chile and Brazil were 148.946479 and 462.9452744, respectively, they decreased quickly from the peak.

[Figure 5.](http://medrxiv.org/content/early/2020/07/10/2020.07.08.20149146/F5) The trajectory of the new cases of Covid-19 in the top fifteen most-affected countries.

[Figure 6.](http://medrxiv.org/content/early/2020/07/10/2020.07.08.20149146/F6) The trajectory of the cumulative cases of Covid-19 in the top fifteen most-affected countries.

[Figure 7.](http://medrxiv.org/content/early/2020/07/10/2020.07.08.20149146/F7) The reported and forecasted curve of the number of cumulative cases of Covid-19 worldwide, where the blue curve and red curve are the numbers of reported and predicted cumulative cases, respectively.

### Outbreak of Covid-19 worldwide continues to grow exponentially

Although most European countries have almost stopped the spread of Covid-19 infections, outbreaks in Brazil, Chile, Russia, India, Peru, Mexico, and Pakistan are still growing fast. The spread of Covid-19 worldwide has not slowed down. The reported and forecasted curve of the number of cumulative cases of Covid-19 worldwide is shown in Figure 7. Table S4 summarizes the numbers of cumulative and new cases of Covid-19 worldwide from June 16, 2020 to August 14, 2020.
We observed that the outbreak of Covid-19 worldwide is growing exponentially.

[Figure 8.](http://medrxiv.org/content/early/2020/07/10/2020.07.08.20149146/F8) The estimated intervention measures of the top fifteen most-affected countries.

By August 14, 2020, the number of new cases of Covid-19 worldwide is forecasted to increase from 192,343 to a frightening 313,380, and the number of cumulative cases of Covid-19 to grow exponentially from 8,226,804 to 26,058,423, with an exponential growth rate of 0.0195. The Covid-19 pandemic is a serious global health threat. Intervention strategies are urgently needed to stop the spread of Covid-19 in the countries where coronavirus cases are growing rapidly.

### Intervention Measure

Traditionally, the effects of interventions on the transmission dynamics of Covid-19 are investigated through the reproduction number $R_t$, which measures the average number of individuals to whom one infected individual transmits the disease. The reproduction number $R_t$ is often used to determine the dynamic behavior of epidemics. By analogy with the reproduction number, we defined an intervention measure $A_t$ to control the spread of Covid-19. The intervention measure is a metric that quantifies the degree of control exerted by the intervention action. Figure 8 plots the estimated intervention measure curves of the top fifteen most-affected countries as a function of time and Table S5 summarizes the estimated intervention measures of these countries. These results showed some patterns of dynamic change in the intervention measures. France, Germany, Iran, Italy, Spain, Turkey, and United Kingdom have flattened the curves of Covid-19 infections. The shapes of the intervention measure curves of these seven countries characterized the trajectory of Covid-19 in these countries.
The common feature of the seven curves was that the intervention curve and the new case-time curve shared a similar trend. As the number of new cases of Covid-19 increased to its peak value, the intervention measure also increased to its peak value. When the number of new cases fluctuated around the peak, the intervention measure also stayed at the plateau for a short time. Then, when the number of new cases decreased toward a small number or zero, the intervention measure decreased and converged to a small stationary value (close to 0.2). The intervention measure and the number of new cases of Covid-19 were highly correlated.

We observed that the peak values of the intervention measures in Chile, Mexico, Pakistan and India were less than 0.8 and that the intervention measure curves of these four countries stayed at the plateau for much longer than those of the seven countries discussed above, which have almost stopped the spread of Covid-19. This indicated that the intervention measures to contain the spread of Covid-19 in Chile, Mexico and Pakistan were weak. The current intervention measures in Brazil, Chile, Peru, Mexico, Pakistan, Russia, US, and India were larger than 0.4. The outbreak of Covid-19 across these countries gained steam. These countries still have a long way to go to contain the spread of Covid-19.

To compare the intervention measure $A_t$ with the reproduction number $R_t$, we downloaded the estimated reproduction number $R_t$ from [https://github.com/lin-lab/COVID19-Rt/tree/master/initial\_estimates](https://github.com/lin-lab/COVID19-Rt/tree/master/initial_estimates), and present Figure S2, which plots the reproduction number curves as a function of time in the top fifteen most-affected countries. In general, the reproduction number curves were fluctuating, decreasing functions of time. When the outbreak of Covid-19 began, the reproduction number was at the top of the curve and much larger than 1. As time increased, the reproduction number decreased.
When the reproduction number was less than 1, the number of new cases quickly converged to a very small number or to zero. Although the shapes of the reproduction number curves were quite different from those of the intervention measure curves, the Spearman correlation coefficients between the intervention measures and the reproduction numbers were large, except for the countries where the spread of Covid-19 had not been well controlled (Table 2).

[Table 2.](http://medrxiv.org/content/early/2020/07/10/2020.07.08.20149146/T2) Correlation coefficients between intervention measure and reproduction number.

### Clustering Intervention Patterns of the Countries across the World

A clustering algorithm and a geographical information system (GIS) were used to analyze the intervention strategies of all 187 countries across the world. Clustering results provide information about the spread pattern of the coronavirus across countries and about how best to combat Covid-19. All 187 countries were grouped into 10 clusters by applying the k-means clustering algorithm to the intervention measure time curves of the 187 countries (Figure 9 and Table S6).

[Figure 9.](http://medrxiv.org/content/early/2020/07/10/2020.07.08.20149146/F9) All 187 countries were grouped into ten clusters.

The first, fourth and seventh clusters were the groups of countries where the outbreak of Covid-19 was under control. The first cluster included 11 countries: Austria, Belgium, Canada, France, Germany, Iran, Israel, South Korea, Portugal, Switzerland, and Turkey. The fourth cluster included 15 countries: Australia, Bangladesh, Belarus, Denmark, Ecuador, Finland, Ireland, Japan, Kuwait, Luxembourg, Malaysia, Netherlands, Qatar, Singapore and Sweden. The seventh cluster included 3 countries: Italy, Spain, and the United Kingdom.
The ninth cluster was the group of countries where the outbreak of Covid-19 was not yet well controlled, although control was being attempted. The ninth cluster (Brazil, India, Russia, and US) comprised the four most-affected countries. The eighth and third clusters were the groups of countries where the outbreak of Covid-19 had recently gained steam. The eighth cluster included 18 countries: Chile, Colombia, Czechia, Dominican Republic, Egypt, Ghana, Indonesia, Mexico, Moldova, Norway, Pakistan, Philippines, Poland, Romania, Saudi Arabia, Serbia, Tajikistan, and Ukraine. The third cluster included 16 countries: Afghanistan, Argentina, Armenia, Azerbaijan, Bahrain, Bolivia, Cameroon, Guatemala, Honduras, Iraq, Kazakhstan, Nepal, Nigeria, Oman, Panama, and South Africa. Tables S7 and S8 showed that the number of cumulative cases of Covid-19 in these countries had recently surged. Countries in the second and fifth clusters were less affected.

## Discussion

While cases of Covid-19 still surge worldwide and the coronavirus gains steam in some countries, planning and implementing strong public health interventions is urgently needed. As an alternative to the epidemiologic transmission models, we developed the RNRL method to help health officers plan public health interventions and combat the spread of Covid-19. We viewed interventions to stop the spread of Covid-19 as actions that control the states of a dynamic system, and the intervention plan as the design of an optimal control. A key step in optimal control design is identification of the dynamic system. Therefore, we incorporated the identification of the dynamic system underlying Covid-19 and formulated the intervention planning problem as a novel RNRL problem, which combines recurrent neural networks with reinforcement learning. The RNRL can learn the complex dynamics within the temporal ordering of the input Covid-19 time series and develop suitable interventions for containing Covid-19.
In this study, we presented a new concept, the intervention measure. To improve its interpretation, we compared the intervention measure with the reproduction number. In general, the correlation coefficients between the intervention measure and the reproduction number were high, except for the less controlled countries. The intervention measure quantified the strength of the intervention (the control action), while the reproduction number measured the state of the spread of Covid-19 being controlled, i.e., how well the spread of Covid-19 was curbed. In other words, the intervention measure quantifies how strong the action is, while the reproduction number captures the effect of, or response to, the intervention. The intervention measure is complementary to the reproduction number.

The world is at a crossroads in combating the rapid spread of Covid-19. The RNRL provides a powerful tool for fighting the surge of Covid-19 worldwide. A dynamic system consists of two essential components: the state of the system and the action taken. The evolution of the dynamic system depends heavily on the sequence of actions. Actions influencing the dynamics of Covid-19 cannot be directly measured or observed. In this report, we proposed an intervention measure to quantify the actions. Because it cannot be observed, the intervention measure was estimated. The intervention measure curve characterized the dynamics of Covid-19 and can be used to assess the stage of the spread of Covid-19 and the strength of the control. The intervention measure curves were used to cluster 187 countries into five basic groups: the well-controlled group (31 countries), the being-controlled group (4 countries), the newly surged group (34 countries), and the less affected group (119 countries).
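
As a rough illustration of the comparison between the intervention measure and the reproduction number discussed in this section, the sketch below computes a Spearman rank correlation between two short series. The series values, variable names, and helper functions are hypothetical and are not taken from the study. The implementation is a plain standard-library version (average ranks for ties, followed by a Pearson correlation of the ranks), not the authors' code.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical weekly values for one country (illustration only).
intervention_measure = [0.90, 0.85, 0.70, 0.55, 0.40, 0.30, 0.22, 0.20]
reproduction_number = [2.8, 2.1, 1.6, 1.3, 1.1, 0.9, 0.8, 0.8]

rho = spearman(intervention_measure, reproduction_number)
print(f"Spearman correlation: {rho:.3f}")
```

Because the coefficient depends only on ranks, it captures whether the two curves move together monotonically even when their shapes differ.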
Although the number of cumulative cases of Covid-19 worldwide has passed 10 million, our analysis indicated that the spread of Covid-19 worldwide will eventually be stopped if the less controlled and newly surged groups of countries continue to strengthen their interventions. We are confident that we will win the fight to contain Covid-19.

Since politics and economics strongly affect the dynamics of Covid-19, the evolutionary trajectories of Covid-19 in most countries are uncertain. The accuracy of long-term forecasting of Covid-19 may therefore not be very high. However, the accuracy of short-term estimation of the number of new cases can be quite good. We suggest updating the data every 10 days and rerunning the RNRL to forecast the trajectory of Covid-19 over the next 15 days or one month.

## Data Availability

Data on the number of confirmed and new cases of Covid-19 from January 22, 2020 to June 28, 2020 were obtained from the Johns Hopkins Coronavirus Resource Center ([https://coronavirus.jhu.edu/map.html](https://coronavirus.jhu.edu/map.html)).

## Conflict of interest

We have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

## Supplementary Figure Legend

**Figure S1**. RNRL algorithm flowchart.

**Figure S2**. The reproduction number curves as a function of time in the top fifteen most-affected countries.

## Acknowledgements

Dr. Li Jin was partially supported by the National Natural Science Foundation of China (91846302). Dr. Wei Lin is supported by the National Key R&D Program of China (Grant no. 2018YFC0116600), the National Natural Science Foundation of China (Grant no. 11925103), and by the STCSM (Grant no. 18DZ1201000). Many thanks to Ms. Sara A. Barton for editing and the Texas Advanced Computing Center for computation support.

* Received July 8, 2020.
* Revision received July 8, 2020.
* Accepted July 10, 2020.
* © 2020, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/)