A variational model for computing the effective reproduction number of SARS-CoV-2
=================================================================================

* Luis Alvarez
* Miguel Colom
* Jean-Michel Morel

## Abstract

We propose a variational model for computing the effective reproduction number (ERN) of SARS-CoV-2 from the daily count of incident cases. This computation only requires the knowledge of the serial interval. The ERN estimate is made through the minimization of a functional that includes: (i) the adjustment of the incidence curve using an epidemiological model, (ii) the regularity of the estimation of the ERN and, (iii) the adjustment of the initial value to an initial estimate of *R* obtained from the initial exponential growth of the epidemic. The model does not assume any statistical distribution for the ERN and, more importantly, does not require truncating the serial interval when its distribution contains negative days. A comparative study has been carried out with the standard EpiEstim method. For a particular choice of the parameters of the variational model and of the serial interval, a good agreement has been obtained between the estimate provided by the variational model and a 7 days shifted estimate obtained by EpiEstim. This backward shift suggests that our estimate is closer to present than that of EpiEstim. We also examine how to forecast the value of the ERN and the number of infected in the short term by two different extrapolation techniques. An implementation of the model is available online at [www.ipol.im/ern](http://www.ipol.im/ern).

## 1 Introduction and previous work

A key epidemiological parameter to evaluate the time varying transmission rate of a disease is the effective reproduction number (ERN), (also denoted in this paper by *R**t* or *R*(*t*)), defined as the expected number of secondary cases produced by a primary case at each time *t*. The computation of an effective, or instantaneous, reproduction number is much more problematic than its global estimate, *R*0, on a large period where the pandemic runs free. In [4] for example, the reproduction number of the Spanish influenza was estimated from daily case notification data using several variants of a SEIR model. This estimate was based on a long period, was therefore not time dependent as it should be in periods where lock-down strategies or other distancing measures are being applied. We refer to [12] for a comparison of strategies to compute *R*0 and *R**t*.

The key ingredient of the estimation of *R*(*t*) is a reproduction formula linking the incidence *i*(*t*) to *R*(*t*). Here a caveat must be formulated. We shall reason as though *i*(*t*) denoted the total number of new cases. But, in practice, the detected infected are only a portion of *i*(*t*). Hence all formulas below rely on the assumption that the daily count of detected infected is actually proportional to the (unavailable) daily count of real infected. This assumption is actually not true, as it is strongly influenced by detection strategies. But if the detection policies evolve slowly, the arguments and calculations below remain valid.

The reproduction formula requires the knowledge of the serial interval function Φ(*s*), which models the time between the onset of symptoms in a primary case and the onset of symptoms in secondary cases. This formula linking *i*(*t*), *R*(*t*) and Φ(*s*) goes back at least to Nishiura 2007 [10]. It writes ![Formula][1]</img> 

The only assumption underlying the reproduction formula (1) is that the serial interval depends only on biological factors, which is reasonable. If we assume that *R*(*t*−*s*) is locally constant and equal to *R*(*t*), that is *R*(*t*−*s*) ≡ *R*(*t*) for *s* such that Φ(*s*) > 0, then the above expression becomes ![Formula][2]</img> 

This expression has been used in the literature by several authors to estimate *R*(*t*) [2], [3]. In its stochastic Poisson formulation, it is a widely used strategy to compute *R*(*t*) using (2) (see [13], [8] or [5]). In that case, the formula is given in stochastic form, assuming *i*(*t*) follows a Poisson model (see [5], [12] [13]). Then the second member of (2) is taken to be the expectation of the Poisson model.

In [1] the problem of estimating *R*(*t*) by maximum likelihood estimation of *L* is complemented by a piecewise regularity term for *R*(*t*), instead of using a Bayesian framework. This regularity term in the variational model is complemented by a spatial regularity term to ensure that neighboring French districts have similar values for *R*(*t*). One of the most widely used methods to estimate *R*(*t*) is the one proposed by Cori et al. in [5]. The authors show that if the expectation of *i*(*t*) is given by ![Graphic][3]</img> and *R*(*t*) is assumed to follow a gamma prior distribution, then an analytical expression can be obtained for the posterior distribution of *R*(*t*). To obtain a more regular estimate they compute *R*(*t*) in a time window of size *τ* ending at time *t*, assuming that *R*(*t*) is locally constant is that window. This method is implemented in the EpiEstim R package and it is also provided as a Microsoft Excel spreadsheet.

In this paper we use the epidemiological model (1) rather than the simplified version (2) used by EpiEstim. One important difference between both formulations is that the estimation of *R*(*t*) using the simplified model is shifted with respect to the estimation using the model (1). The reason is that if we replace *R*(*t*−*s*) in equation (1) by a constant value, it would be more accurate to replace *R*(*t*−*s*) by a shifted back value *R*(*t* −*µ*) than by *R*(*t*), as *t* is the end of the integration interval. In other words, it would be more accurate to replace equation (2) by ![Formula][4]</img>  where *µ* is the center of mass of the serial distribution Φ(*s*). So the assumption about the expectation of *i*(*t*) should be ![Graphic][5]</img>. It follows that we can expect to observe a shift between an *R**t* estimate using the original model (1) and the one obtained using its simplification (2). Our experiments here reveal a clear shift between our estimation and the one obtained by EpiEstim, going up to seven days. This shift suggests that our estimate is more to date than the one proposed by EpiEstim.

We now discuss what serial interval functions Φ are available for SARS-CoV-2. As we saw, the *serial interval* in epidemiology refers to the time between successive observed cases in a chain of transmission. Du et al. in [6] define this interval as follows: The serial interval is defined as the time duration between a primary case (infector) developing symptoms and secondary case (infectee) developing symptoms.

Hence, by a careful inquiry on many pairs of patients, where one is the probable cause of the infection of the other, one may obtain the distribution of the serial interval in practice, as it has been done by Du et al. in [6] on 468 cases. The authors of this paper recall that *this quantity cannot be inferred from daily case count data alone [14]*. Moreover, the observed serial distribution in [6] had a significant number of cases on negative days, meaning that the infectee had developed symptoms up to 10 days before the infector.

In [7], the serial interval is defined as the length of time a person is contagious. It can be estimated by tracking contacts (i.e., infector-infected pairs) and by counting the number of days between the dates of onset of symptoms in the infecting and infected individuals respectively.

In the experiments presented in this paper we shall use three serial intervals: the one obtained by Du et al. in [6] using 468 cases, a serial interval obtained by Nishiura et al. in [11] using 28 cases which is approximated by a log-normal distribution, and a serial interval obtained by Ma et al. in [9] using 1155 cases. As proposed by the authors this serial interval has been approximated by a shifted log-normal to take into account the cases in the negative days. In Fig. 1 we show the profile of these serial intervals. Of course, more accurate estimates of the serial interval for the SARS-CoV-2 can be expected in the future. In the online interface ([www.ipol.im/ern](http://www.ipol.im/ern)) the users can, optionally, upload their own distribution for the serial interval.

![Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/08/04/2020.08.01.20165142/F1.medium.gif)

[Figure 1:](http://medrxiv.org/content/early/2020/08/04/2020.08.01.20165142/F1)

Figure 1: 
Serial intervals used in our experiments: the discrete one proposed by Du et al. in [6] (solid bars), a log-normal approximation of the serial interval proposed by Nishiura et al. in [11] (dotted line) and a shifted log-normal approximation of the serial interval proposed by Ma et al. in [9] (dashed line).

## 2 A new variational model to compute *R**t*

Equation (1) was originally formulated for serial interval functions Φ(*s*) satisfying Φ(*s*) = 0 for *s*≤ 0, but this is definitely not true for the SARS-CoV-2. Hence, to avoid an artificial truncation of the serial interval function, we adopt the obvious generalization of this equation as ![Formula][6]</img> 

In that way, the integration is performed on the whole support of the serial interval Φ(*s*). In practice, we deal with an incidence curve observed itself on a limited interval, up to present. Hence, boundary conditions including days in the future will be requested to apply the above formula to any point of the incidence curve. Our method requires the observation of:

*   the incidence curve, namely the daily count of new detected cases of SARS-CoV-2 infections, denoted as *i**t* = *i*(*t*) on day *t*.

*   an empirical probability distribution Φ = (Φ*f*0, …, Φ*f*) for the serial interval. We assume that a patient can show symptoms up to *f* days before the person who contaminated him/her shows symptoms himself/herself. So we have *f* = −4 for the Ma et al. serial interval, *f* = 0 for Nishiura et al. and *f* = −10 for Du et al. The discrete support of Φ is therefore contained in the interval [*f*, *f*].

We shall use the straightforward discretization of Equation (4), ![Formula][7]</img>  where *R**t* represents the discrete version of *R*(*t*), *t* = 0 is the time where the infection number starts to grow and *t**c* the current time. This equation is inserted in a variational model to estimate *R**t* by minimizing the energy ![Formula][8]</img>  where *p*90(*i*) is the 90th percentile of ![Graphic][9]</img> used to normalize the energy with respect to the size of *i**t*. The first term of *E* is a data adjustment term which forces equation (5) to be satisfied as much as possible. The second term forces *R**t* to be a smooth curve; *w**t*≥0 represents the weight of the regularization at each time *t*. The higher the value of *w**t* the smoother *R**t*. The last term of *E* forces *R* to be close to an initial estimate given by ![Graphic][10]</img>. Finally, *β* is a weight that determines the confidence we have in the initial estimate ![Graphic][11]</img>. The larger *β*, the greater this confidence.

Minimizing the energy *E* leads to satisfy approximately the epidemiological model (5) with a reasonably smooth *R**t* and a prescribed initial value *R*. The parameters *w**t* and *β* determine the importance assigned to these constraints in the estimation.

Minimizing *E* with respect to the sequence {*R**t*} yields a linear system of equations that is easily solved, if it is complemented with adequate boundary conditions for *R**t* and *i**t* on both ends of the observation interval.

### Definition of the boundary conditions

For *R*(*t*), we will always assume that *R*(*t*) = *R*(0) for *t* < 0 and *R*(*t*) = *R*(*t**c*) for *t* > *t**c* (the current time). Concerning *i*(*t*) we will assume a linear behavior when *t* > *t**c* and for *t* < 0 we will assume that the cumulative number of infected detected ![Graphic][12]</img> follows an exponential growth for *t* < 0, that is *I*(*t*) = *I*(0)*e**at*, where *a* represents the initial exponential growth rate of *I*(*t*) at the beginning of the infection spread. In summary, the extension of *i*(*t*) beyond the observed interval [0, *t**c*] is defined by ![Formula][13]</img> 

### Computation of the initial exponential growth a and ![Graphic][14]</img>

We now naturally estimate *a* by ![Formula][15]</img> 

If we assume that *I*(*t*) = *I*(0)*e**at* follows initially an exponential growth and that *R**t* is initially constant, then using equation (5) we obtain that ![Formula][16]</img> 

Hence, we can compute an approximation of *R* as ![Formula][17]</img> 

Note that this estimation strongly depends on the serial interval used. For instance for *a* = 0.2 we obtain that ![Graphic][18]</img> for the Nishiura et al. serial interval, ![Graphic][19]</img> for the Ma et al. serial interval and ![Graphic][20]</img> for the Du et al. serial interval.

We assumed here that we would compute *R*(*t*) from the beginning of the epidemic’s spread to the present day. Since the epidemic is likely to be with us for a long period of time, note that it is also possible to use our model to start calculating *R*(*t*) from any ulterior time *t*1, in which case the user can provide an initial value of *R*(*t*1).

### Management of the regularization weight *w**t*

Initially we choose *w**t* *≡ w*, where *w* is a constant value. We noticed that by minimizing the energy (6) one can obtain negative values *R**t* for some *t*. In such cases, we increase iteratively the value of the regularization weights at such points and their neighbors and we compute again the minimum of (6). More precisely for any *t**k* such that ![Graphic][21]</img> < 0, we update ![Graphic][22]</img> and then recompute the minimum of the energy (6). This operation is performed until all *R**t*’s become positive. We observed experimentally that this objective is reached in a few iterations.

### Filtering “administrative noise”

The raw data curve *i**n* is extraordinarily noisy, and the *administrative noise* has unfortunately little to do with the Poisson noise used in most aforementioned publications. Government statistics are affected by changes of testing and polling policies, political decisions, and week-end reporting delays. Here is for example a list of explanations for the undue peaks (and even negative counts) in official cases statistics in France1:

*   A new laboratory transmits data since May 4, retrospectively from March 16. The new number of cases in the last 24 hours takes this into account.

*   The increase in cases compared to data of the previous day is an aggregation of additional data from 13th May, previously not taken into account.

*   Some positive patients were counted twice, this is no longer the case, therefore the decrease in cases compared to data of the previous day.

These recording delays and consecutive rash corrections make a peculiar feature of such time series that we call *administrative noise*. They result in strong impulse noise, together with a “week-end” 7-periodic noise. These noises clearly dominate the alleged Poisson noise inherent in any counting procedure.

To remove the bulk of such impulses, we (optionally) filter the raw time series of infected with a sliding median filter with window size radius *w**R* and a linear Gaussian filter with standard deviation *σ*. The application of the optional median filter is especially important in cases where singular data appear due to strong adjustments of the infected values from one day to the next. This is the case, for example, in France, the United Kingdom or Spain, but it is less necessary for larger countries such as the United States or India. A more local Gaussian filtering is also optionnally applied. It is less aggressive and can be used to smooth the data a bit.

### Summary of the algorithm computing *R**t*

*   Step 1: Pre-processing of the input data (detected infected): filtering of the data to reduce the administrative noise and to take into account that currently, some countries are not providing new data during the week-end.

*   Step 2: Estimation of the time *t* where the accumulated data, *I**t*, starts to grow in an exponential way. We use a very basic algorithm where we impose that *I**t*+1 > 1.1*I**t* in two consecutive days. More formally, we set ![Formula][23]</img> 
    
    Once *t* is computed the previous data sequence is removed so that *t* becomes 0.

*   Step 3: Compute the initial exponential growth rate *a* and an initial estimation of *R* using (8) and (10).

*   Step 4: Initial computation of *R**t* by minimizing the energy (6). If any of the computed *R**t* is non-positive we increase locally the regularization weight *w**t* and we iterate the minimization of (6) as explained above.

## 3 Short time forecasting of *R**t* and *i**t*

Since the variational method provides a point estimate of *R**t* up to the current time *t**c*, by extrapolating the value of *R* into the future, it is possible, using equation (5), to obtain a forecast of the number of infected *i**t* in the near future. We considered two approaches to extrapolate *R**t*. In the first one we extrapolate *R**t* beyond the current time *t**c* assuming that the evolution of *R**t* is going to be similar to that of the last days and we use a basic harmonic oscillator model to fit the evolution in the last days and to extrapolate *R**t*. In the second approach we allow the user to provide the expected constant new value of *R**t* in the future and how many days are required to reach such new value.

### Extrapolation of *R**t* using the harmonic oscillator model

We use the following damped harmonic oscillator: ![Formula][24]</img>  the harmonic oscillator parameters *c, d* and ![Graphic][25]</img> are computed by fitting *R*(*t*) to the solution of the harmonic oscillator (see Appendix for technical details).

### User interactive extrapolation of *R**t*

In this case the user provides the expected new value *R*1 = *R*(*t**c* + *N**d*), where *N**d* is the number of days required to reach this new value. To obtain a smooth transition between the current value *R*(*t**c*) and *R*1 we use a basic Hermite interpolation polynomial (see Appendix for technical details).

## 4 Experiments

All of the experiments made here can be reproduced with the online demo available at [www.ipol.im/ern](http://www.ipol.im/ern).

### Summary of algorithm parameters

*   *i**t*: input data with the daily registered infected curve.

*   *r**W* : radius of the median filter window for data filtering. If the value of this parameter is zero no median filter is applied. The default value is 3, taking into account the weekly period.

*   *σ*: standard deviation of the Gaussian linear filter for data filtering. If the value of this parameter is zero no Gaussian filter is applied. Our default value is 2.

*   *w*: regularization weight in the energy (6).

*   *β*: weight in the energy (6) for the initial estimation of *R* computed using (10).

*   *R*1: expected value of *R*(*t**c*+*N**d*) for forecasting (in the case of user interactive forecasting).

*   *N**d*: Number of days to reach the value *R*1 (in the case of user interactive forecasting).

*   Serial interval used: by default we propose 3 options: the serial intervals obtained by Ma et al., by Nishiura et al. and by Du et al‥ The users can also upload their own serial interval.

We shall pay particular attention to the comparison with EpiEstim, the method proposed by Cori et al. in [5] that we have explained briefly in the introduction. It is one of the most widely used methods to estimate *R*(*t*). For the EpiEstim method we used, initially, the default parameters proposed in the Microsoft Excel spreadsheet implementation of the method provided by the authors. In particular; they use a 7 day time interval size to estimate *R*(*t*). We compared the results obtained by our variational method and the ones obtained by EpiEstim for four countries: France, the United Kingdom, Spain and the United States. We used the data of infected reported by the countries up to July 23, 2020 that we obtained from the European Centre for Disease Prevention and Control service. Since EpiEstim does not allow for negative values we replaced any negative value by zero in the data sequence. For several weeks France and Spain have not provided data during the weekends and have given instead on Mondays a cumulative count of three consecutive days. To avoid the artificial noise generated in the event that no median filter is applied to the sequence, the accumulated value of the 3 days divided by 3 was assigned to Saturday, Sunday and Monday.

We compared the results of our method and EpiEstim after applying (for both methods) a median filter and a Gaussian filter to the data, and we also compared them without applying this pre-filtering step. In Fig. 2 we show both data sequences for the four countries.

![Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/08/04/2020.08.01.20165142/F2.medium.gif)

[Figure 2:](http://medrxiv.org/content/early/2020/08/04/2020.08.01.20165142/F2)

Figure 2: 
Data of daily infected patients used for comparison with EpiEstim: we show, for the four countries, in orange, the data we used when no filtering was applied and in blue the data after applying a median filter with *r**W* = 3 and a Gaussian filter with *σ* = 2.

EpiEstim does not allow for a serial interval distribution with positive values at 0 or on negative days, so for comparison purposes we used the serial interval of Nishiura et al., as it is the one requiring a minimal truncation when removing the non-positive days from the serial interval distribution. In Fig. 3 the results obtained by both methods are presented when a median and Gaussian pre-filter were applied. In the case of the USA we divided by 10 the number of all infected *i**t* before using EpiEstim because otherwise the method did not work (likely the values of *i**t* for USA are too high for the particular implementation in the Excel spreadsheet). The estimate of *R*(*t*) by the variational method is invariant under this kind of data transformation. Surprisingly, despite the fact that both methods are quite different, a very good fit of both estimates in the four countries was observed, after the EpiEstim estimate has been shifted 7 days.

![Figure 3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/08/04/2020.08.01.20165142/F3.medium.gif)

[Figure 3:](http://medrxiv.org/content/early/2020/08/04/2020.08.01.20165142/F3)

Figure 3: 
Comparison with EpiEstim including data filtering. We show *R*(*t*) obtained using the proposed variational model (in blue) and EpiEstim (in orange) using the data filtered with *r**W* = 3 and *σ* = 2. We used the Nishiura et al. serial interval. For the variational method we used *w* = 10−1.2 and *β* = 105. The results of EpiEstim are shifted 7 days to fit the ones obtained by the variational technique.

It might be argued that if the data is pre-filtered, a smaller time interval could be used in EpiEstim. Fig. 4 shows the result obtained using EpiEstim with a time interval of size 1. It is observed, as expected, that the estimate of *R*(*t*) is somewhat more irregular but due to the filtering, it is quite similar to those obtained for a time interval of size 7 and in this case the translation with respect to the results of the variational method is reduced to 4 days. In other terms, as expected, the smaller the time interval, the smaller the backward shift of the estimate of *R**t* with respect to the current date. In summary, two factors intervene in the shift between the estimation of *R**t* by the variational model and by EpiEstim: a first shift, as explained in the introduction, occurs by assuming *R**t* locally constant in the model (2) and a second shift occurs when estimating *R**t* in a time interval.

![Figure 4:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/08/04/2020.08.01.20165142/F4.medium.gif)

[Figure 4:](http://medrxiv.org/content/early/2020/08/04/2020.08.01.20165142/F4)

Figure 4: 
Comparison with EpiEstim applied with a smaller time interval. We show the same results as in Fig. 3 with the filtered data but we use a time interval of size 1 to apply EpiEstim. For the variational model we use *w* = 10−1.6 (the estimation with EpiEstim is shifted 4 days).

In Fig. 5 we present the *R*(*t*) estimates obtained by both methods when no pre-filtering was applied to the data. In that situation, the weight of the regularization was increased in our method to compensate for the lack of regularity of the data. The fit of both estimates is relatively good, but less exact than after applying our pre-filtering. This is a reasonable outcome. Indeed, the more irregular the curve *R*(*t*), the less accurate the assumption that *R*(*t*) is locally constant. Hence the model (1) used in our method and its simplified version (2) used in EpiEstim are less coherent with each other. Comparing Fig. 2 and 5, we notice that for the USA the original data *i*(*t*) is more regular than in the other countries (likely because the numbers in USA are much larger than in the other countries; hence the administrative noise is smoothed out). So we find a better agreement between the estimate of *R**t* by both methods.

![Figure 5:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/08/04/2020.08.01.20165142/F5.medium.gif)

[Figure 5:](http://medrxiv.org/content/early/2020/08/04/2020.08.01.20165142/F5)

Figure 5: 
Comparison with EpiEstim without data filtering. We show the same results as in Fig. 3 but without data filtering and using *w* = 1 (the estimation of EpiEstim is shifted 7 days).

In Fig. 6 we show the results of two different forecasting strategies using an extrapolation of *R*(*t*). In the appendix we present additional experiments to illustrate the influence of the different parameters of the variational model.

![Figure 6:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/08/04/2020.08.01.20165142/F6.medium.gif)

[Figure 6:](http://medrxiv.org/content/early/2020/08/04/2020.08.01.20165142/F6)

Figure 6: 
Forecasting. We show a 14 day forecast for *R*(*t*) and *i*(*t*) (we used the data up to July 23 and we forecast from July 24 to August 6). In blue, we show a forecast using the harmonic oscillatory model. In orange, we show a forecast with an objective *R*(*t*) value given by *R*1 = *R*(*t**c* + 21) = 1.15 and in grey we use *R*1 = *R*(*t**c* + 21) = 0.5. In both cases the number of days to reach the objective value is 21. For the variational model, we used as parameters *r**W* = 3, *σ* = 2, *β* = 105 and the Du et al. serial interval.

## 5 Conclusion

In this paper we proposed a variational model given by the expression (6) for computing the effective reproduction number *R**t* of SARS-CoV-2 using the daily registered infected and the serial interval. The main advantages we found with this method are:

*   It is based on a known epidemiological model (given by the equation (1)) which establishes how *R*(*t*) and the serial interval intervene in the evolution of the number of incident cases. The method does not involve the “naive” simplification which assumes *R*(*t*) to be locally constant. Our method does not assume that the observation noise is Poisson, because it plainly isn’t.

*   The method can use serial intervals with distributions containing negative days (as it is the case for the SARS-CoV-2). Thus, we avoid an artificial truncation of the distribution.

*   The method computes a point estimate of *R**t* up to the current date. It seems to provide a more to date (by up to 7 days) estimate of *R**t* than EpiEstim, which is based on the model (2) and a time interval estimation.

*   The method does not assume any statistical distribution for *R**t*. The main assumptions are that the *R**t* estimate should follow the epidemiological model (1) but keeping *R**t* regular enough. We include this regularity hypothesis in the model using standard techniques of calculus of variations.

We have included an automatic procedure in the algorithm to deal with countries where no data is provided during the week-end, and, optionally, a pre-processing step of the data using a median filter and a Gaussian filter. This pre-processing step removes most of the administrative noise of the data and eases the *R**t* estimation. In any case, this pre-processing is optional and its application depends on the level of noise of the country data.

While our method and the standard Cori et al. (EpiEstim) method are quite different, we found experimentally that for a particular choice of the parameters of the variational method and the serial interval, a good agreement can been obtained between the estimate of *R**t* provided by the variational model and a shifted estimate of *R**t* obtained by EpiEstim.

Since the point estimation of *R**t* is obtained up to the current date, it allows us to forecast the value of *R**t* in the short term. Currently, in countries that are relaxing social distancing measures, it is hard to obtain an accurate forecast because the situation of the epidemic can change rapidly in any direction in a few days. Therefore, the forecasting techniques proposed in this paper are just an example that shows how an *R**t* estimation can be used to forecast the number of infected. We proposed two techniques to extrapolate the value of *R**t*. The first one is automatic and assumes that the behavior of *R**t* in the next future will be similar to that of the recent past that is modeled using a harmonic oscillatory model. The second technique is interactive and allows the user to supply a future target value for *R**t* and the number of days needed to reach that value. This target value can depend on the actions that a country is taking to control the epidemic, but it is reasonable to assume that any country exerts itself to reach an ERN lower than 1.

We finally notice how dependent is any estimation or forecast of the ERN on governement policies, both for gathering data and for recording them. The main hindrances to a precise estimate of the ERN are :

1.  the constant changes of detection policies, which can go from a minimal count of serious cases confirmed at hospitals to wide ranging random testing;

2.  the incredible incapacity of administrations to record cases on a daily base, in the era of internet and instant communication.

An online implementation of the method is available at [www.ipol.im/ern](http://www.ipol.im/ern) where the users can perform their own experiments using official registered data of infected or uploading their own data of daily infected and/or the serial interval distribution.

## Data Availability

We use the COVID-19 registered daily infected from the European Centre for Disease Prevention and Control service

[https://www.ecdc.europa.eu/en](https://www.ecdc.europa.eu/en) 

## 6 Appendix. Some technical issues about the proposed method

### 6.1 Pre-processing of the input data

We check for each country if, from a given date in the past, there are zero values for the incident cases on Saturdays and Sundays, and cumulative values of three days on Mondays. To identify this pattern we look, in the last 2 weeks for a time *t* = *M* (the Monday) where *i**M*−3 > 0, *i**M*−2 = *i**M*−1 = 0 and *i**M* > 1.5 *· i**M*−3. If we found that pattern we mark as Saturday and Sunday with no data all *t* = *M* − 7 *· k* − 1, *t* = *M* − 7 *· k* − 2 (with *k* = 0, 1, 2‥) while *i**M*−7*·k*−2 = *i**M*−7*·k*−1 = 0. Let us denote by *W**e* the identified set of Saturdays and Sundays with no data. In the affected week-end we update the value *i**M*−7*·k*−2 = *i**M*−7*·k*−1 = *i**M*−7*·k* = *i**M*−7*·k**/*3.

In short, we assign to Saturday, Sunday and Monday a third of the cumulative incident cases over the three dayss.

The next step in the data pre-processing is an optional median filter to remove administrative noise. Given a windows radius *r**W* we define the median filter in a time *t* ≤ *t**c* − *r**W* as ![Formula][26]</img>  when *t* is close to the current time *t**c* we have to manage a boundary effect and we define : ![Formula][27]</img> 

In other terms, we increase the window’s size when approaching the end of the sequence, to involve more points for the median.

To clarify this procedure let us just take an example: The reported daily infected data in France from July 15, 2020 to July 23, 2020 are the following

View this table:
[Table1](http://medrxiv.org/content/early/2020/08/04/2020.08.01.20165142/T1)

Using the above procedure, the results of the median filter using a window radius *w**R* = 3 are the following : ![Formula][28]</img> 

We point out that this procedure is just one option to deal with the week-end’s missing values. Of course other options are possible. Finally, we optionally perform a linear Gaussian convolution to the data.

### 6.2 Short time forecasting of *R**t* and *i**t*

#### Extrapolation of *R**t* using the harmonic oscillator model

We fix the parameters *c, d* and ![Graphic][29]</img> of the harmonic oscillator (11) by fitting *R*(*t*) to the solution of the harmonic oscillator for *t* ∈[*t**c* −*N*1, *t**c*]. ![Graphic][30]</img> is the reference value in the last days given by ![Formula][31]</img> 

In the experiments we fixed the value of *N*1 to 14. We point out that the exponential decay of the amplitude of the oscillations is given by *e*−*ct*. If (4*d*−*c*2 > 0), then the period *T*, of the oscillations is given by ![Formula][32]</img> 

To avoid spurious oscillations and a strong increase of their amplitude we impose (when computing *c* and *d*) that ![Formula][33]</img>  once the parameters of the harmonic oscillatory model are fixed we forecast the evolution of *R*(*t*) for *t* > *t**c* using the solution of the harmonic oscillatory model with the initial conditions *R*(*t**c*) and *R* (*t**c*).

### User interactive extrapolation of *R**t*

The Hermite interpolation polynomial we use to extrapolate *R**t* is given by ![Formula][34]</img>  for any *γ* (0, 1], ![Graphic][35]</img> satisfies that ![Graphic][36]</img>. So, using this basic extrapolation procedure we get a smooth transition between *R*(*t**c*) and *R*1. The values of *R*1 and *N**d* can be fixed manually accordingly with the current and/or expected social distancing measures taking by the states. For instance, in the European countries, currently, we observe that the value of *R**t* oscillates around 1. In the experiments, we fixed the default values of this parameters to *R*1 = 1 and *N**d* = 21 days. By default, we initialize *γ* = 1 and then reduce the value of *γ* automatically to avoid that ![Graphic][37]</img> has negative value using the following relation: ![Formula][38]</img>  therefore if *R*1, *R*(*t**c*) > 0 and *γ ∈* (0, 1] is small enough, then ![Graphic][39]</img>.

### Forecasting of the daily infected *i**t*

Once we have forecast *R**t* for *t* = *t**c* + 1, *t**c* + 2, ‥, *t**c* + *dT* (where *dT* is the number of new days to forecast), we can compute by iteration *i**t* from *t**c* + 1 to *t**c* + *dT* using equation (5) and the extrapolation of *i**t* given by (7). Then to reduce the effect of the extrapolation of *i**t* when *t* > *t**c*, we observe that, in fact, equation (5) can be interpreted as a fixed point equation, so we iterate equation (5) until convergence to update *i**t* for *t* = *t**c* − 9, …, *t**c* + *dT*. We start at *t**c* − 9 to smooth a bit the forecast and to reduce potential discontinuities of the forecast *i**t* at the time *t**c*.

### 6.3 Influence of the algorithm parameters

This section shows some experiments illustrating the influence of the different parameters of the variational model. In Fig. 7 we illustrate the influence of the serial interval. We see a big difference at first because the value of *R* is highly dependent on the serial interval. At the end it is observed tha1 t the estimate with the Ma et al. serial interval is significantly higher than the other two and that the estimate with the Nishiura et al. serial interval tends to come to the end flatter than the other two. In Fig. 8 we illustrate the influence of the regularization parameter. As expected, we note that the higher the value of this parameter, the smoother the estimate of *R*(*t*).

![Figure 7:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/08/04/2020.08.01.20165142/F7.medium.gif)

[Figure 7:](http://medrxiv.org/content/early/2020/08/04/2020.08.01.20165142/F7)

Figure 7: 
Influence of the serial intervals. We show, in the the case of France, the results of the *R*(*t*) computation obtained by the proposed method using the three serial intervals Nishiura et al. (in blue), Ma. et al. (in orange) and Du et al. (in grey). Il all cases we used the filtered data with *r**W* = 3, *σ* = 2, *β* = 105 and *w* = 10−1.2.

![Figure 8:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/08/04/2020.08.01.20165142/F8.medium.gif)

[Figure 8:](http://medrxiv.org/content/early/2020/08/04/2020.08.01.20165142/F8)

Figure 8: 
Influence of the regularization parameter. We show, in the the case of France, the results of the *R*(*t*) computation obtained by the proposed method using the filtered data with the Nishiura et al. serial interval, *r**W* = 3, *σ* = 2, *β* = 105 and *w* = 10−1.2 (in blue), *w* = 10−5 (in orange), and *w* = 1 (in grey).

In Fig. 9 we illustrate the influence of the weight *β* in the energy (6). The higher the value of this parameter, the more similar the value of *R*(0) is to the initial estimate obtained from the initial exponential growth rate of the epidemic.

![Figure 9:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/08/04/2020.08.01.20165142/F9.medium.gif)

[Figure 9:](http://medrxiv.org/content/early/2020/08/04/2020.08.01.20165142/F9)

Figure 9: 
Influence of the weight *β* in the energy (6). We show, in the the case of France, the results of the *R*(*t*) computation obtained by the proposed method using the filtered data with the Nishiura et al. serial interval, *r**W* = 3, *σ* = 2, *w* ≡ 10−1.2 and *β* = 105 (in blue), *β* = 10−5 (in orange), and *β* = 10−2 (in grey).

In Fig. 10 we show the influence of the data filtering. We notice that the median filter gets rid of much of the “administrative” noise, so the estimate of *R*(*t*) is much smoother when the median filter is applied.

![Figure 10:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/08/04/2020.08.01.20165142/F10.medium.gif)

[Figure 10:](http://medrxiv.org/content/early/2020/08/04/2020.08.01.20165142/F10)

Figure 10: 
Influence of the data filtering. We show, in the the case of France, the results of the *R*(*t*) computation obtained by the proposed method using the Nishiura et al. serial interval, *w* = 10−1.2, *β* = 105 and *r**W* = 3, *σ* = 2 (in blue), *r**W* = 3, *σ* = 0 (in orange), and *r**W* = 0, *σ* = 0 (in grey).

### 6.4 Variability of the *R**t* estimate

We emphasize that we do not assume any statistical model on the distribution of the *R**t* values, therefore, we cannot provide confidence intervals for this estimate in statistical terms. For certain choices of the variational model parameters, the good agreement with the results obtained by EpiEstim gives us an idea, by comparison, about the variability of our estimate. On the other hand, this variability strongly depends on the applied regularization determined by the parameter *w*. To illustrate this variability, in the software available online at [www.ipol.im/ern](http://www.ipol.im/ern), we represent the estimate of *R**t* around an empirical interval of variability defined at each point as [*R**t* − 2*σ*(*t*), *R**t* + 2*σ*(*t*)], where *σ*(*t*) is the standard deviation of the difference between the values of *R**t* estimates at [*t* − 20, *t*] and the linear regression calculated using the values of *R**t* at [*t* − 20, *t*]. In this way an empirical estimate of the variability of *R**t* is obtained.

## Acknowledgment

This research has been partially supported by Kayrros, inc.

## Footnotes

*   1 [https://en.wikipedia.org/wiki/COVID-19\_pandemic\_in\_France](https://en.wikipedia.org/wiki/COVID-19_pandemic_in_France)

*   Received August 1, 2020.
*   Revision received August 1, 2020.
*   Accepted August 4, 2020.


*   © 2020, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1.  [1]. P. Abry,  N. Pustelnik,  S. Roux,  P. Jensen,  P. Flandrin,  R. Gribonval,  C.-G. Lucas,  E. Guichard,  P. Borgnat,  N. Garnier, and  B. Audit, Spatial and temporal regularization to estimate covid-19 reproduction number r(t): Promoting piecewise smoothness via convex optimization. medRxiv, 2020.
    
    
2.  [2]. T. Boulmezaoud, Un modèle de prédiction de l’épidémie covid-19 et une stratégie zig-zag pour la contrôler, (2020).
    
    
3.  [3]. T. Z. Boulmezaoud,  L. Alvarez,  M. Colom, and  J.-M. Morel, A daily measure of the sars-cov-2 daily reproduction number for all countries, IPOL Journal. Image Processing On Line, submitted, (2020).
    
    
4.  [4]. G. Chowell,  H. Nishiura, and  L. M. Bettencourt, Comparative estimation of the reproduction number for pandemic influenza from daily case notification data, Journal of the Royal Society Interface, 4 (2007), pp. 155–166.
    
    
5.  [5]. A. Cori,  N. M. Ferguson,  C. Fraser, and  S. Cauchemez, A new framework and software to estimate time-varying reproduction numbers during epidemics, American journal of epidemiology, 178 (2013), pp. 1505–1512.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/aje/kwt133&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24043437&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F08%2F04%2F2020.08.01.20165142.atom) 

6.  [6]. Z. Du,  X. Xu,  Y. Wu,  L. Wang,  B. J. Cowling, and  L. A. Meyers, The serial interval of covid-19 from publicly reported confirmed cases, medRxiv, (2020).
    
    
7.  [7].Groupe de modélisation de l’équipe ETE (Laboratoire MIVEGEC, CNRS, IRD, Université de Montpellier), Estimation du nombre de reproduction temporel, 2020 (accessed May 30, 2020). [https://bioinfo-shiny.ird.fr/Rt/](https://bioinfo-shiny.ird.fr/Rt/).
    
    
8.  [8]. Q.-H. Liu,  M. Ajelli,  A. Aleta,  S. Merler,  Y. Moreno, and  A. Vespignani, Measurability of the epidemic reproduction number in data-driven contact networks, Proceedings of the National Academy of Sciences, 115 (2018), pp. 12680–12685.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxMjoiMTE1LzUwLzEyNjgwIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMDgvMDQvMjAyMC4wOC4wMS4yMDE2NTE0Mi5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

9.  [9]. S. Ma,  J. Zhang,  M. Zeng,  Q. Yun,  W. Guo,  Y. Zheng,  S. Zhao,  M. H. Wang, and  Z. Yang, Epidemiological parameters of coronavirus disease 2019: a pooled analysis of publicly reported individual data of 1155 cases from seven countries, Medrxiv, (2020).
    
    
10. [10]. H. Nishiura, Time variations in the transmissibility of pandemic influenza in prussia, germany, from 1918–19, Theoretical Biology and Medical Modelling, 4 (2007), p. 20.
    
    
11. [11]. H. Nishiura,  N. M. Linton, and  A. R. Akhmetzhanov, Serial interval of novel coronavirus (covid-19) infections, International journal of infectious diseases, (2020).
    
    
12. [12]. T. Obadia,  R. Haneef, and  P.-Y. Boëlle, The r0 package: a toolbox to estimate reproduction numbers for epidemic outbreaks, BMC medical informatics and decision making, 12 (2012), p. 147.
    
    
13. [13]. R. Thompson,  J. Stockwin,  R. D. van Gaalen,  J. Polonsky,  Z. Kamvar,  P. Demarsh,  E. Dahlqwist,  S. Li,  E. Miguel,  T. Jombart, et al., Improved inference of time-varying reproduction numbers during infectious disease outbreaks,Epidemics, 29 (2019), p. 100356.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.epidem.2019.100356&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F08%2F04%2F2020.08.01.20165142.atom) 

14. [14]. M. A. Vink,  M. C. J. Bootsma, and  J. Wallinga, Serial intervals of respiratory infectious diseases: a systematic review and analysis, American journal of epidemiology, 180 (2014), pp. 865–875.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/aje/kwu209&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25294601&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F08%2F04%2F2020.08.01.20165142.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000344600900001&link_type=ISI)

 [1]: /embed/graphic-1.gif
 [2]: /embed/graphic-2.gif
 [3]: /embed/inline-graphic-1.gif
 [4]: /embed/graphic-3.gif
 [5]: /embed/inline-graphic-2.gif
 [6]: /embed/graphic-5.gif
 [7]: /embed/graphic-6.gif
 [8]: /embed/graphic-7.gif
 [9]: /embed/inline-graphic-3.gif
 [10]: /embed/inline-graphic-4.gif
 [11]: /embed/inline-graphic-5.gif
 [12]: /embed/inline-graphic-6.gif
 [13]: /embed/graphic-8.gif
 [14]: /embed/inline-graphic-7.gif
 [15]: /embed/graphic-9.gif
 [16]: /embed/graphic-10.gif
 [17]: /embed/graphic-11.gif
 [18]: /embed/inline-graphic-8.gif
 [19]: /embed/inline-graphic-9.gif
 [20]: /embed/inline-graphic-10.gif
 [21]: /embed/inline-graphic-11.gif
 [22]: /embed/inline-graphic-12.gif
 [23]: /embed/graphic-12.gif
 [24]: /embed/graphic-13.gif
 [25]: /embed/inline-graphic-13.gif
 [26]: /embed/graphic-19.gif
 [27]: /embed/graphic-20.gif
 [28]: /embed/graphic-22.gif
 [29]: /embed/inline-graphic-14.gif
 [30]: /embed/inline-graphic-15.gif
 [31]: /embed/graphic-23.gif
 [32]: /embed/graphic-24.gif
 [33]: /embed/graphic-25.gif
 [34]: /embed/graphic-26.gif
 [35]: /embed/inline-graphic-16.gif
 [36]: /embed/inline-graphic-17.gif
 [37]: /embed/inline-graphic-18.gif
 [38]: /embed/graphic-27.gif
 [39]: /embed/inline-graphic-19.gif