Bayesian nowcasting with Laplacian-P-splines

Bryan Sumalinab; Oswaldo Gressani; Niel Hens; Christel Faes

doi:10.1101/2022.08.26.22279249

Abstract

During an epidemic, the daily number of reported (infected/death) cases is often lower than the actual number of cases due to underreporting. Nowcasting aims to estimate the cases that have not yet been reported and combine it with the already reported cases to obtain an estimate of the daily cases. In this paper, we present a fast and flexible Bayesian approach to nowcasting combining P-splines and Laplace approximations. The main benefit of Laplacian-P-splines (LPS) is the flexibility and faster computation time compared to Markov chain Monte Carlo (MCMC) algorithms that are often used for Bayesian inference. In addition, it is natural to quantify the prediction uncertainty with LPS in the Bayesian framework, and hence prediction intervals are easily obtained. Model performance is assessed through simulations, and the method is applied to the Belgian COVID-19 mortality cases for the year 2021. Simulation results show that our model has good predictive performance except when the nowcast date is near the peak date, where it has lower prediction interval coverage.

1 Introduction

Nowcasting is a term used for estimating the occurred-but-not-yet-reported-events (Donker et al., 2011; Van de Kassteele et al., 2019). In epidemiology, real-time updates of new symptomatic/infected individuals are helpful to assess the present situation and provide recommendations for rapid planning and for implementing essential measures to contain an epidemic out-break. The exact number of new daily cases is frequently subject to reporting delays, resulting in underreporting of the real number of infected individuals for that day. Failing to account for the reporting delays will lead to possibly biased predictions that might have an effect on policy making (Gutierrez et al., 2020). The main goal of nowcasting is to estimate the actual number of new cases by combining the (predicted) not-yet-reported-cases with the already reported cases.

Several early references that establish the statistical framework for this type of problem can be found in the paper of Lawless (1994) which demonstrates how nowcasting can be used not only in disease surveillance but also in other contexts such as warranty and insurance claims. More recently, Höhle and an der Heiden (2014) applied nowcasting to the outbreak of Shiga toxin-producing Escherichia coli in Germany and also to the SARS-CoV-2 outbreak (Glöckner et al., 2020; Günther et al., 2021). Their approach is formulated within a hierarchical Bayesian framework that consists of estimating the epidemic curve by using a quadratic spline based on a truncated power basis function and the time-varying reporting delay distribution is approximated by a discrete time survival model. Van de Kassteele et al. (2019) pointed out that a potential drawback of such a method is the long computation time required by the Markov chain Monte Carlo (MCMC) algorithm and therefore proposed an alternative fast and flexible modelling strategy based on bivariate P-splines (Penalized B-splines). P-splines (Eilers and Marx, 1996) provide a flexible smoothing tool used to describe trends in the data. It introduces a penalty parameter that controls the roughness of the fit and counterbalances the flexibility of a rich B-splines basis (Eilers and Marx, 1996; Eilers et al., 2015). Other attractive features of the P-splines smoother are the relatively simple structure of the penalty matrix that is effortlessly computed and a natural extension to the Bayesian framework (Lang and Brezger, 2004). Based on the approach of Van de Kassteele et al. (2019), the number of cases are structured in a two-dimensional table (with calendar time as the first dimension and delay time as the second dimension), yielding the data matrix used as an input in the model. The reporting intensity is assumed to be a smooth surface and is modelled using two-dimensional P-splines.

In this paper, we extend the method of Van de Kassteele et al. (2019) by proposing a new now-casting methodology based on Laplacian-P-splines (LPS) in a fully Bayesian framework. A key advantage of working with the Bayesian approach is the ease to obtain the predictive distribution and quantify the uncertainty associated with the predictions. In addition, the posterior distribution of the penalty parameter can be explored, and hence its uncertainty can be accounted for. The Laplace approximation uses a second-order Taylor expansion to approximate the posterior distribution of the regression parameters by a Gaussian density. It is a sampling-free method with the major advantage of faster computational time as opposed to MCMC approaches that are commonly used in Bayesian inference. Therefore, given the flexibility of previously mentioned P-splines smoothers and the computational benefit of Laplace approximations, it can be a helpful tool in the daily monitoring of new cases during an epidemic period. Laplacian-P-splines already proved to be useful in survival models (Gressani and Lambert, 2018; Gressani et al., 2022), generalized additive models (Gressani and Lambert, 2021) and also in epidemic models for estimating the effective reproduction number (Gressani et al., 2021). We build on the work of Gressani and Lambert (2021) to extend the Laplacian-P-splines methodology to now-casting, thereby providing a fast and flexible (fully) Bayesian alternative to Van de Kassteele et al. (2019). To evaluate the (predictive) performance of our method, a simulation study is implemented and several performance measures are reported such as the mean absolute percentage error (MAPE), symmetric mean absolute percentage error (SMAPE) and prediction interval coverage. Finally, we apply our method to the COVID-19 mortality data in Belgium for the year 2021. The data and R codes used to implement the method are available on GitHub through this link https://github.com/bryansumalinab/Laplacian-P-spline-nowcasting.git.

2 Methodology

2.1 Bayesian model formulation

In this section, we build upon the work of Van de Kassteele et al. (2019) to formulate a fully Bayesian model based on P-splines. Let y_t,d denote the number of cases that occurred at time t = 1, 2, …, T (corresponding to the calendar day) and reported with a delay of d = 0, 1, 2…, D days. The information on cases can be summarized in matrix form: with cases that have not yet been reported (at time T) highlighted in bold. The not-yet-reported cases correspond to (t, d) combinations satisfying t > T − d. The main objective is to predict the total number of cases, , for t = T − (D − 1), …, T for which the nowcasted and already reported cases can be combined.

Let 𝒟 := y = (y₁, y₂ …, y_n)′ denote the vector of the observed number of cases by stacking the columns of matrix Y for the reported cases, where each entry corresponds to each (t, d) combination of reported cases y_t,d. The model assumes that the number of cases are Poisson distributed, i.e., y_t,d ∼ Poisson(μ_t,d) with mean μ_t,d > 0. Following Van de Kassteele et al. (2019), the (log) mean number of cases is modeled with two dimensional B-splines: where β₀ is the intercept; b_j(·) and b_k(·) are univariate B-splines basis functions specified in the time and delay dimensions, respectively; and z_l(t, d) represents additional covariates with regression coefficients β_l. In matrix notation: where the matrices B and Z correspond to the basis functions and covariates, respectively, and vectors θ and β are the associated parameters to be estimated (Details in Appendix A1).

In the philosophy of P-splines (Eilers and Marx, 1996), we use a rich (cubic) B-splines basis and counterbalance the associated flexibility by imposing a discrete roughness penalty on contiguous B-spline coefficients. For the two dimensional P-splines, the penalty can be obtained based on row-wise (direction of calendar time) and column-wise (direction of reporting delay) differences for matrix Θ = (θ_j,k) with j = 1, …, K_T and k = 1, …, K_D (Appendix A1) (see Durbán et al. (2002) and Fahrmeir et al. (2013) (pp. 507-508)). Let and denote the mth order row-wise and column-wise difference matrix with dimensions (K_T − m) × K_T and (K_D − m) × K_D, respectively. In this paper, we use a second order (m = 2) difference penalty. For ease of notation, let and . The difference matrix for vector θ can be obtained by expanding the difference matrix into and , where ⊗ denotes the Kronecker product. Using this notation, the row-wise and column-wise difference penalty can be written, respectively, as: Let λ_t > 0 and λ_d > 0 denote the row-wise and column-wise penalty parameter, respectively. The penalty for the two dimensional B-splines is then given by: Let us define the penalty matrices and , where δ is a small number (say δ = 10⁻⁶), to ensure that the penalty matrices are full rank and thus invertible. Following Lang and Brezger (2004), the penalty can be translated in the Bayesian framework by specifying a Gaussian prior for each column and row vector of Θ. We write Θ in terms of its columns and rows: The priors are then given by and for k = 1, …, K_D and j = 1, …, K_T, respectively. It follows that the joint prior for θ can be written as: Gaussian priors are assumed for β and θ, namely with V_β = ζI_p+1 (small ζ, e.g. ζ = 10⁻⁵) and (θ|λ) ∼ 𝒩_dim(θ)(0, 𝒫−1(λ)), where λ = (λ_t, λ_d)′ is the penalty vector and the global penalty matrix. Denote by X = (B, Z) the global design matrix, ξ = (β′, θ′)′ the latent parameter vector and precision matrix for ξ. The full Bayesian model is then summarized as follows: where 𝒢 (a, b) denotes a Gamma distribution with mean a/b and variance a/b². This robust prior specification on the penalty parameters follows from Jullion and Lambert (2007). They have shown that when the hyperparameters a_δ, b_δ are chosen to be equal and small enough (say 10⁻⁴), then the resulting fit is robust to the value of ν (e.g. ν = 3 in this paper).

2.2 Laplace approximation to the conditional posterior of ξ

To obtain a Laplace approximation for the set of regression parameters ξ = (β′, θ′)′, we use the posterior of ξ conditional on the vector of penalty parameters λ. Then, the gradient and Hessian of the log-posterior are computed and used in a Newton-Raphson algorithm to obtain the Gaussian approximation to the conditional posterior distribution of ξ.

Note that for a Poisson distributed y_i with mean μ_i, we have f (y_i) ∝ exp (y_iγ_i − s(γ_i)), where γ_i = log(μ_i) and s(γ_i) = exp(γ_i) = μ_i. Using Bayes’ rule, the posterior of ξ conditional on the penalty vector λ is: where ℒ(ξ; 𝒟) denotes the likelihood function. Hence, the log-conditional posterior of ξ can be obtained as: where ≐ denotes equality up to an additive constant.

It is known for Poisson generalized linear models that the gradient (score vector) and Hessian of the first term on the right hand side of (2) are given, respectively, by (see Agresti (2013), p.136): where W = diag(w₁, …, w_n) is a diagonal matrix with w_i = (V ar(y_i)[g^′(μ_i)]²)⁻¹ = μ_i and g(·) is the log-link function such that g(μ_i) = log(μ_i). Hence, the gradient and Hessian for the log-conditional posterior of ξ are: Using a Newton-Raphson algorithm, the Laplace approximation for the conditional posterior of ξ is, by abuse of notation, (Gressani and Lambert, 2021), where and correspond to the mode and variance-covariance matrix, respectively, given by: where is the vector at convergence resulting from the sequence , with , the initial condition , and W (ξ⁽⁰⁾) = diag(Xξ⁽⁰⁾).

2.3 Hyperparameter optimization

In this section, we derive the (approximate) posterior distribution of the hyperparameters belonging to the penalization part, i.e., λ and δ. We first derive the joint posterior of λ and δ and then proceed with an integration to isolate the marginal posterior distribution of the penalty parameter.

Let η = (λ_t, λ_d, δ_t, δ_d)′ denotes the vector of hyperparameters. Using Bayes’ theorem, the marginal posterior of η is: Following Rue et al. (2009), the above posterior can be approximated by replacing p(ξ|η, 𝒟) with obtained in Section 2.2 and by evaluating the latent vector at . Note that ,Where corresponds to the i th row of the design matrix X. Also, the determinant in p(ξ|λ) is . Hence, the approximated marginal posterior of η is: To obtain the marginal posterior of λ, we need to integrate out the hyperparameters δ_t and δ_d from as follows: Hence, The posterior mode (obtained via Newton-Raphson) is used as a point estimate for the penalty vector. To ensure numerical stability, we log-transformed the penalty vector v = (v_t, v_d)′ = (log(λ_t), log(λ_d))′. Using the method of transformations, we obtain the posterior of v given by: The exponent of the last term above reduces to because of multiplying with the Jacobian of the transformation J = exp(v_d) exp(v_t). Moreover, the log-posterior of v is given by:

2.4 Nowcasting with prediction interval

To obtain the mean nowcast estimate with the prediction interval, note that or equivalently , where . Thus, . The mean estimate for the not-yet-reported cases is calculated as for all (t, d) combinations with t > T − d. Then, the estimate for the total number of cases for each t is obtained by summing the already reported cases and the mean estimate for the not-yet-reported cases, i.e. .

The prediction interval for the nowcast values is obtained by sampling from the posterior predictive distribution of the log-mean number of cases by following the five steps below:

For each (t, d) combinations with t > T − d (corresponding to the not-yet-reported cases), generate 1000 random samples from a Gaussian distribution with mean and variance
Exponentiate the sampled values from the previous step to obtain the average reporting intensities .
Compute the average prediction for the not-yet-reported cases for each t. That is, compute for t > T − d
For each t, sample containing 1000 values from Poisson.
Finally, compute the quantiles of the sampled values that correspond to the desired prediction interval.

2.5 Delay distribution

To obtain the smooth estimate of the delay distribution, only the first term on the right hand side of equation (1) is used (excluding the week effects), as explained in the paper of Van de Kassteele et al. (2019). Specifically, the procedure to compute the delay distribution is as follows:

Compute the contribution of the smoothing term in equation (1) to the reporting intensity for all (t, d) combinations: μ^smooth = exp(Bθ).
Arrange μ^smooth into a T × (D + 1) matrix with entries .
For each t = 1, …, T, compute the reporting delay distribution given by: .

3 Simulations

A simulation study is implemented in order to evaluate the predictive performance of the pro-posed method. The procedure to perform the simulations is as follows:

Consider a function f (t) that represents the mean epidemic curve of all cases such that μ(t) = exp(f (t)) for t = 1, …, T.
For each t, generate a random sample y_t from a Poisson distribution with mean μ(t).
To account for possible delays d = 0, 1, 2, …, D, generate samples from a multinomial distribution with probabilities (p₀, p₁, p₂, · · ·, p_D such that , i.e., This sample represents the reported number of cases for each (t, d) combination.
Repeat steps 1 to 3 for 1000 times to generate 1000 possible realizations and compute the desired accuracy measures.

We consider two functions f (t) inspired from the paper of Noufaily et al. (2016) given by: for t = 1, …, 365. That is, we assume a one year (365 days) time window in the simulation. In terms of delay probabilities, we consider a maximum delay of D = 7 days with probabilities (p₀, p₁, p₂, · · ·, p₇) = (0.0, 0.1, 0.4, 0.2, 0.1, 0.1, 0.05, 0.05). This is illustrated in Figure 1. For example, at time T − 6, only the cases that have a delay of 7 days (with probability p₇ = 0.05) are not yet reported, or equivalently, 95% of the cases have already been reported. For time T − 5, only 90% of the cases have been reported, as cases with delays of 6 and 7 days are yet to be reported. On the nowcast day (time T), no case is reported, that is, the delay is 100%.

Figure 1: Illustration of the delay probabilities considered in the simulation.

Two scenarios are considered for the first function f₁(t): (i) the first one is having a small number of cases with values of θ₁ = 2 and θ₂ = 1; (ii) the second scenario has relatively larger cases with θ₁ = 3 and θ₂ = 1. These functions are denoted by f₁₁(t) and f₁₂(t), respectively. The second function f₂(t) has values θ₁ = 0, θ₂ = 0.4 and θ₃ = 0.2. The first function has a symmetric curve with three peaks as shown in Figures 2a and 2b. On the other hand, the second function is not periodic as opposed to the first function (see Figure 2c). The plot for (one realization of) simulated cases based on these functions are shown in Figure 3 with (blue dashed) vertical lines corresponding to the different nowcast dates.

Figure 2: Plot of the functions considered in the simulation.

Figure 3: : Plot of the simulated cases for different functions. Blue dashed vertical lines - different nowcast dates.

The mean absolute percentage error (MAPE) and symmetric mean absolute percentage error (SMAPE) are chosen to measure the predictive accuracy of our methodology. Moreover, for each simulation, we consider a 95% prediction interval and determine the (true) unreported cases that falls within the interval to obtain the prediction interval coverage. The formulas to compute these measures are given by: where S = 1000 is the number of generated realizations, y_st is the true value (generated from step 2 of the simulation in Section 3 at iteration s) and ŷ_st is the corresponding predicted value (mean nowcast estimate in Section 2.4). The MAPE and SMAPE metrics are scale-independent and measure the error of prediction from the actual value in terms of percentage. One disadvantage of MAPE is that it could have undefined values when the actual values are zero, which is evident in the formula where the true value is in the denominator. In our setting, this happened especially with very small delay probabilities where there is a high probability of zero unreported cases, and so we exclude this case in computing the MAPE. Another way to address this issue is to use the SMAPE indicator that has values with lower and upper bounds of 0% and 200%, respectively.

In the simulation, every end of the month from March to November are set as nowcast dates. For each nowcast date, the prediction measures are computed for the dates having unreported cases, that is, for t = T − (D − 1), …, T. As we have a maximum delay of 7 days, there will be seven dates (including the nowcast date) that involves prediction of unreported cases. Tables 1-3 summarize the MAPE results for the different functions being considered. The error increases as it gets closer to the nowcast day, that is, from 6 days prior to the nowcast day to the nowcast day. This is expected, as there is less data available at each delay time as we approach the nowcast day. Indeed, on day T, about 95% of the cases that occurred on day T − 6 have been reported, while on day T no case is reported yet. The errors are relatively higher in March and April, where we have the least available data. For functions f₁₂(t) and f₂(t), the MAPE on the nowcast day ranges between 20% - 35% for May to November. This is reasonable considering that we do not have data available on the nowcast day because all cases have a delay of at least one day. Moreover, the MAPE results using LPS are similar to the ones obtained using Van de Kaasteele et al.’s method (values in parentheses in the Table). There is no clear pattern that indicates one approach has higher or lower error than the other.

View this table:

Table 1:

Mean absolute percentage error (MAPE) for the number of cases at day T− t (t = 0, …, 6) using function f₁₁(t), as well as the average over these days. Values in parentheses are results using Van de Kassteele et al.’s method.

View this table:

Table 2:

Mean absolute percentage error (MAPE) for the number of cases at day T − t (t = 0, …, 6) using function f₁₂(t), as well as the average over these days. Values in parentheses are results using Van de Kassteele et al.’s method.

View this table:

Table 3:

Mean absolute percentage error (MAPE) for the number of cases at day T − t (t = 0, …, 6) using function f₂(t), as well as the average over these days. Values in parentheses are results using Van de Kassteele et al.’s method.

Results for the prediction interval coverage can be found in Tables 4-6. For function f₁₁(t), almost all of the values are higher than the 95% nominal prediction interval. For function f₁₂(t), the prediction interval coverage is close to 95% for most cases, although the coverage of the nowcast day for June and November are somewhat too low (60.04% and 73.95%, respectively). This could be resulting from reaching the peak (curve change) and the fact that no data is available for the nowcast day, that is, there is no zero delay. Moreover, the function f₂(t) has high coverage except on the nowcast day for October and November, having higher simulated cases. There is moderate to strong undercoverage in the prediction interval with Van de Kassteele et al.’s method for 4 to 6 days before the nowcast day, while LPS is more conservative and has coverages that are closer to the nominal value. On the other hand, from 2 days before the now-cast day to the nowcast day, Van de Kaasteele et al.’s method has higher coverage than LPS, reflecting much wider prediction intervals. The results for SMAPE of the three functions are also displayed in Appendix A2 with the percentage error of functions f₁₂(t) and f₂(t) ranging between 20% - 40% on the nowcast day. Similar to MAPE results, LPS and Van de Kaasteele et al.’s method have comparable SMAPE values with no systematic difference between the two approaches.

View this table:

Table 4:

Prediction interval coverage for the number of cases at day T − t (t = 0, …, 6) using function f₁₁(t), as well as the average over these days. Values in parentheses are results using Van de Kassteele et al.’s method.

View this table:

Table 5:

Prediction interval coverage for the number of cases at day T − t (t = 0, …, 6) using function f₁₂(t), as well as the average over these days. Values in parentheses are results using Van de Kassteele et al.’s method.

View this table:

Table 6:

Prediction interval coverage for the number of cases at day T − t (t = 0, …, 6) using function f₂(t), as well as the average over these days. Values in parentheses are results using Van de Kassteele et al.’s method.

4 Real Data application

We apply our method to COVID-19 mortality data in Belgium, for the year 2021. The raw data is available on the website of the Sciensano research institute (see sciensano.be/covid19data). The data contains the cumulative number of mortality cases, reported up to the day of the file. The file is updated every day, and in this way, the number of cases and reporting delays are obtained. The data is structured in matrix format with the date of death as rows and number of days of reporting delay as columns. Figure 4 shows the total number of cases with (blue dashed) vertical lines corresponding to different nowcast dates used for illustration. As no cases are reported immediately on the day of death in the data, all cases have a delay of at least one day. The proportion of cases reported for each delay is shown in Table 7. Majority of the cases (66%) were reported with a delay of 2 days. Followed by a delay of 1 day and 3 days in 14% and 12% of the cases, respectively. The rest of the delays account for 8% of the cases. The reporting delay is truncated to a maximum of seven days.

View this table:

Table 7:

Proportion of cases reported for each delay for the mortality data.

Figure 4: Covid-19 death cases in Belgium for the year 2021. Blue dashed vertical lines - different nowcast dates.

In this section, we present the model results at different nowcast dates. In the analysis, we choose K_T = 40 and K_D = 10 as the number of B-splines for the time and delay dimension, respectively. Moreover, a cubic B-splines basis with a second order penalty is used. In addition, we also consider the day of the week effect as additional covariate in the model.

Figure 5 presents the nowcasting results using the mortality data at different nowcast dates (namely, at every end of the month). The blue color represents the reported cases for the past 14 days on the nowcast date, the gray color are cases that have not yet been reported, and the orange line corresponds to the nowcast prediction with the 95% prediction interval. It can be seen that the nowcast predictions are fairly close to the observed cases (gray). In addition, all of the observed cases fall within the prediction interval. The nowcasts at the day of nowcasting (i.e. last date presented in the plots) for March and April are slightly underestimated. Based on the simulation results, possible reasons for the underestimation are: (1) the number of cases in this period is higher as compared to other periods; (2) the amount of information used in this period is still limited (from January till March or April); and (3) at the nowcast date, no information is available as there is a 100% delay of cases.

Figure 5: Nowcast plot for mortality data with different nowcast dates. Blue - reported cases ; Gray - Not-yet-reported cases; Orange - nowcast with prediction interval.

The estimated delay density for different nowcast dates is shown in Figure 6. The steps to obtain the time-varying delay distribution is outlined in Section 2.5. Figure 6 shows that the delay distribution is fairly constant through time, with density being highest for a delay of 2 days. This confirms the observed reporting intensity in our data that most cases are reported with a two-day delay. Furthermore, the density is highest between April and June, which relates to a higher number of reported cases (see Figure 4).

Figure 6: Estimated delay density for mortality data with different nowcast dates.

Table 8 shows the computational time required for doing the nowcasting analysis using mortality data. We used the microbenchmark() function from the microbenchmark package in R. The analysis was implemented on a device with an Intel(R) Core(TM) i5-1135G7, CPU running at a base frequency of 2.40GHz, and having 4 cores. Twenty function calls were evaluated for each nowcast date. The average real elapsed time is around 12 to 25 seconds, except for the nowcast date on October, for which we used almost all of the 2021 data with an average of 50 seconds.

View this table:

Table 8:

Computation time (in seconds) for 20 evaluations using mortality data with the LPS methodology.

5 Conclusion

This paper presents a novel Bayesian approach of the nowcasting model proposed by Van de Kassteele et al. (2019). One advantage of using the Bayesian approach is that the prediction intervals can be naturally obtained which is the main goal of nowcasting. Our proposed methodology is based on a combination of Laplace approximations and P-splines, making it both fast and flexible. We perform a simulation study to evaluate the (predictive) performance of our method. Simulation results show that nowcasts and prediction interval coverage are fairly close to the true value and nominal (95%) prediction interval, respectively, except when the nowcast date falls on the day of the peak where it has low coverage. This could be due to the higher values and a shift in the epidemic trend. Moreover, we also apply our method to the Belgium COVID-19 mortality data (2021). The nowcasts are close to the actual observed values, and all of the 95% prediction intervals for the different nowcast dates being considered contain the observed values.

This study has some limitations. First, it assumes that the counts are Poisson distributed. One possible consequence of this is that the variability in the data tends to be underestimated because of overdispersion, which can cause undercoverage in the prediction intervals. This can be addressed by using the negative Binomial distribution. Another issue is that the priors for the B-spline coefficients are based on the Kronecker product of the penalty matrices. Lang and Brezger (2004) pointed out that this could overfit the data, and they suggest using spatial smoothness priors. Moreover, as mentioned by Van de Kassteele et al. (2019) our model specification makes it difficult to model abrupt changes in the reporting process.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This project is funded by the European Union Research and Innovation Action under the H2020 work programme EpiPose (grant number 101003688).

6 Appendix

A1

In equation (1), we have a two-dimensional B-spline model: where

B = B_D ⊗ B_T is the Kronecker product of B-splines matrices B_T and B_D with dimension T × K_T and D × K_D, given by: As a result, B has dimension TD × K_T K_D.
Z is the design matrix for additional covariates with dimension TD × (p + 1) and corresponding parameters β. This is given by:
θ = vec(Θ) has dimension K_T K_D × 1 where vec(Θ) denotes the vectorization of a matrix Θ. It is obtain by vectorizing the matrix: of B-splines coefficients θ_j,k by stacking the columns of Θ, that is,

A2. Result for symmetric mean absolute percentage error (SMAPE)

View this table:

Table 9:

Symmetric mean absolute percentage error (SMAPE) for the number of cases at day T − t (t = 0, …, 6) using function f₁₁(t), as well as the average over these days. Values in parentheses are results using Van de Kassteele et al.’s method.

View this table:

Table 10:

Symmetric mean absolute percentage error (SMAPE) for the number of cases at day T − t (t = 0, …, 6) using function f₁₂(t), as well as the average over these days. Values in parentheses are results using Van de Kassteele et al.’s method.

View this table:

Table 11:

Symmetric mean absolute percentage error (SMAPE) for the number of cases at day T − t (t = 0, …, 6) using function f₂(t), as well as the average over these days. Values in parentheses are results using Van de Kassteele et al.’s method.

References

↵
Agresti, A. (2013). Categorical Data Analysis. John Wiley & Sons.
Google Scholar
↵
Donker, T., van Boven, M., van Ballegooijen, W. M., van’t Klooster, T. M., Wielders, C. C., and Wallinga, J. (2011). Nowcasting pandemic influenza A/H1N1 2009 hospitalizations in the Netherlands. European Journal of Epidemiology, 26(3):195–201.
OpenUrl CrossRef PubMed Google Scholar
↵
Durbán, M., Currie, I., and Eilers, P. H. C. (2002). Using P-splines to smooth two-dimensional Poisson data. In Proceedings of 17th International Workshop on Statistical Modelling, Chania, Crete, pages 207–214.
Google Scholar
↵
Eilers, P. H. C. and Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2):89–121.
OpenUrl CrossRef Web of Science Google Scholar
↵
Eilers, P. H. C., Marx, B. D., and Durbán, M. (2015). Twenty years of P-splines. SORT: Statistics and Operations Research Transactions, 39(2):149–186.
OpenUrl Google Scholar
↵
Fahrmeir, L., Kneib, T., Lang, S., and Marx, B. (2013).Regression: Models, Methods and Applications. Springer Science & Business Media.
Google Scholar
↵
Glöckner, S., Krause, G., and Höhle, M. (2020). Now-casting the COVID-19 epidemic: The use case of Japan, March 2020. medRxiv.
Google Scholar
↵
Gressani, O., Faes, C., and Hens, N. (2022). Laplacian-P-splines for Bayesian inference in the mixture cure model. Statistics in Medicine, 41(14):2602–2626.
OpenUrl Google Scholar
↵
Gressani, O. and Lambert, P. (2018). Fast Bayesian inference using Laplace approximations in a flexible promotion time cure model based on P-splines. Computational Statistics & Data Analysis, 124:151–167.
OpenUrl Google Scholar
↵
Gressani, O. and Lambert, P. (2021). Laplace approximations for fast Bayesian inference in generalized additive models based on P-splines. Computational Statistics & Data Analysis, 154:107088.
OpenUrl Google Scholar
↵
Gressani, O., Wallinga, J., Althaus, C., Hens, N., and Faes, C. (2021). EpiLPS: a fast and flexible Bayesian tool for near real-time estimation of the time-varying reproduction number. medRxiv.
Google Scholar
↵
Günther, F., Bender, A., Katz, K., Küchenhoff, H., and Höhle, M. (2021). Nowcasting the COVID-19 pandemic in Bavaria. Biometrical Journal, 63(3):490–502.
OpenUrl Google Scholar
↵
Gutierrez, E., Rubli, A., and Tavares, T. (2020). Delays in death reports and their implications for tracking the evolution of COVID-19. Available at SSRN 3645304.
Google Scholar
↵
Höhle, M. and an der Heiden, M. (2014). Bayesian nowcasting during the STEC O104: H4 outbreak in Germany, 2011. Biometrics, 70(4):993–1002.
OpenUrl CrossRef Google Scholar
↵
Jullion, A. and Lambert, P. (2007). Robust specification of the roughness penalty prior distribution in spatially adaptive Bayesian P-splines models. Computational Statistics & Data Analysis, 51(5):2542–2558.
OpenUrl Google Scholar
↵
Lang, S. and Brezger, A. (2004). Bayesian P-splines. Journal of Computational and Graphical Statistics, 13(1):183–212.
OpenUrl Google Scholar
↵
Lawless, J. (1994). Adjustments for reporting delays and the prediction of occurred but not reported events. Canadian Journal of Statistics, 22(1):15–31.
OpenUrl Google Scholar
↵
Noufaily, A., Farrington, P., Garthwaite, P., Enki, D. G., Andrews, N., and Charlett, A. (2016). Detection of infectious disease outbreaks from laboratory data with reporting delays. Journal of the American Statistical Association, 111(514):488–499.
OpenUrl Google Scholar
↵
Rue, H., Martino, S., and Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2):319–392.
OpenUrl CrossRef Web of Science Google Scholar
↵
Van de Kassteele, J., Eilers, P. H. C., and Wallinga, J. (2019). Nowcasting the number of new symptomatic cases during infectious disease outbreaks using constrained P-spline smoothing. Epidemiology, 30(5):737
OpenUrl Google Scholar

Comments

medRxiv aims to provide a venue for anyone to comment on a medRxiv preprint. Comments are moderated for offensive or irrelevant content (this can take ~24 h). Please avoid duplicate submissions and read our Comment Policy before commenting. The content of a comment is not endorsed by medRxiv.

Community Reviews

medRxiv aims to inform readers about online discussion of this preprint occurring elsewhere. The content at the links below is not endorsed by either medRxiv or the preprint's authors.

Community reviews for this article:

There are no community reviews for this paper.

Automated Evaluations

Certain services provide automated analysis of preprints. Analyses invited by the authors are displayed at the top of this tab. Those done independently of authors are shown underneath . None of these analyses is endorsed by medRxiv.

Automated Evaluations:

There are no automated evaluations for this paper.

[1] ↵
Agresti, A. (2013). Categorical Data Analysis. John Wiley & Sons.
Google Scholar

[2] ↵
Donker, T., van Boven, M., van Ballegooijen, W. M., van’t Klooster, T. M., Wielders, C. C., and Wallinga, J. (2011). Nowcasting pandemic influenza A/H1N1 2009 hospitalizations in the Netherlands. European Journal of Epidemiology, 26(3):195–201.
OpenUrl CrossRef PubMed Google Scholar

[3] ↵
Durbán, M., Currie, I., and Eilers, P. H. C. (2002). Using P-splines to smooth two-dimensional Poisson data. In Proceedings of 17th International Workshop on Statistical Modelling, Chania, Crete, pages 207–214.
Google Scholar

[4] ↵
Eilers, P. H. C. and Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2):89–121.
OpenUrl CrossRef Web of Science Google Scholar

[5] ↵
Eilers, P. H. C., Marx, B. D., and Durbán, M. (2015). Twenty years of P-splines. SORT: Statistics and Operations Research Transactions, 39(2):149–186.
OpenUrl Google Scholar

[6] ↵
Fahrmeir, L., Kneib, T., Lang, S., and Marx, B. (2013).Regression: Models, Methods and Applications. Springer Science & Business Media.
Google Scholar

[7] ↵
Glöckner, S., Krause, G., and Höhle, M. (2020). Now-casting the COVID-19 epidemic: The use case of Japan, March 2020. medRxiv.
Google Scholar

[8] ↵
Gressani, O., Faes, C., and Hens, N. (2022). Laplacian-P-splines for Bayesian inference in the mixture cure model. Statistics in Medicine, 41(14):2602–2626.
OpenUrl Google Scholar

[9] ↵
Gressani, O. and Lambert, P. (2018). Fast Bayesian inference using Laplace approximations in a flexible promotion time cure model based on P-splines. Computational Statistics & Data Analysis, 124:151–167.
OpenUrl Google Scholar

[10] ↵
Gressani, O. and Lambert, P. (2021). Laplace approximations for fast Bayesian inference in generalized additive models based on P-splines. Computational Statistics & Data Analysis, 154:107088.
OpenUrl Google Scholar

[11] ↵
Gressani, O., Wallinga, J., Althaus, C., Hens, N., and Faes, C. (2021). EpiLPS: a fast and flexible Bayesian tool for near real-time estimation of the time-varying reproduction number. medRxiv.
Google Scholar

[12] ↵
Günther, F., Bender, A., Katz, K., Küchenhoff, H., and Höhle, M. (2021). Nowcasting the COVID-19 pandemic in Bavaria. Biometrical Journal, 63(3):490–502.
OpenUrl Google Scholar

[13] ↵
Gutierrez, E., Rubli, A., and Tavares, T. (2020). Delays in death reports and their implications for tracking the evolution of COVID-19. Available at SSRN 3645304.
Google Scholar

[14] ↵
Höhle, M. and an der Heiden, M. (2014). Bayesian nowcasting during the STEC O104: H4 outbreak in Germany, 2011. Biometrics, 70(4):993–1002.
OpenUrl CrossRef Google Scholar

[15] ↵
Jullion, A. and Lambert, P. (2007). Robust specification of the roughness penalty prior distribution in spatially adaptive Bayesian P-splines models. Computational Statistics & Data Analysis, 51(5):2542–2558.
OpenUrl Google Scholar

[16] ↵
Lang, S. and Brezger, A. (2004). Bayesian P-splines. Journal of Computational and Graphical Statistics, 13(1):183–212.
OpenUrl Google Scholar

[17] ↵
Lawless, J. (1994). Adjustments for reporting delays and the prediction of occurred but not reported events. Canadian Journal of Statistics, 22(1):15–31.
OpenUrl Google Scholar

[18] ↵
Noufaily, A., Farrington, P., Garthwaite, P., Enki, D. G., Andrews, N., and Charlett, A. (2016). Detection of infectious disease outbreaks from laboratory data with reporting delays. Journal of the American Statistical Association, 111(514):488–499.
OpenUrl Google Scholar

[19] ↵
Rue, H., Martino, S., and Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2):319–392.
OpenUrl CrossRef Web of Science Google Scholar

[20] ↵
Van de Kassteele, J., Eilers, P. H. C., and Wallinga, J. (2019). Nowcasting the number of new symptomatic cases during infectious disease outbreaks using constrained P-spline smoothing. Epidemiology, 30(5):737
OpenUrl Google Scholar

Bayesian nowcasting with Laplacian-P-splines

Abstract

1 Introduction

2 Methodology

2.1 Bayesian model formulation

2.2 Laplace approximation to the conditional posterior of ξ

2.3 Hyperparameter optimization

2.4 Nowcasting with prediction interval

2.5 Delay distribution

3 Simulations

4 Real Data application

5 Conclusion

Data Availability

Competing Interest Statement

Funding Statement

6 Appendix

A1

A2. Result for symmetric mean absolute percentage error (SMAPE)

References

Subject Area

Citation Manager Formats

Bayesian nowcasting with Laplacian-P-splines

Abstract

1 Introduction

2 Methodology

2.1 Bayesian model formulation

2.2 Laplace approximation to the conditional posterior of ξ

2.3 Hyperparameter optimization

2.4 Nowcasting with prediction interval

2.5 Delay distribution

3 Simulations

4 Real Data application

5 Conclusion

Data Availability

Competing Interest Statement

Funding Statement

6 Appendix

A1

A2. Result for symmetric mean absolute percentage error (SMAPE)

References

Subject Area

Follow this preprint