Using newspapers obituaries to nowcast daily mortality: evidence from the Italian COVID-19 hot-spots
====================================================================================================

* Paolo Buonanno
* Marcello Puca

## Abstract

Real-time tracking of infectious disease outbreaks helps policymakers make timely, data-driven decisions. Official mortality data, whenever available, may be incomplete and published with a substantial delay. We report the results of using newspaper obituaries to nowcast the mortality levels observed in Italy during the COVID-19 outbreak between February 24, 2020 and April 15, 2020. We find that the mortality levels predicted using newspaper obituaries outperform forecasts based on past mortality according to several performance metrics, making obituaries a potentially powerful alternative source of information for real-time tracking of infectious disease outbreaks.

Keywords

* COVID-19
* Nowcasting
* Big data
* Excess mortality

## 1. Introduction

Since the first suspected pneumonia cases were observed in December 2019 in Wuhan (China), the novel coronavirus (COVID-19), which causes a severe acute respiratory syndrome, has turned into a global pandemic.<sup>1</sup> A timely reaction to the outbreak of an infectious disease is fundamental to the success of containment measures [1, 2, 3]. While the number of reported cases and infections suffers from several measurement biases, comparing total mortality rates to those of previous years offers reliable information on the severity of an epidemic [4, 5]. Mortality data in the middle of a pandemic, however, are imperfect and difficult to estimate [6, 7].<sup>2</sup> Mortality records, moreover, are published with substantial delay.
For example, Britain’s National Statistical Office has recently started to release weekly mortality data after death certificates have been processed.<sup>3</sup> In Italy, the National Statistical Institute released official mortality data for the period from January 1 to February 21, 2020 only on March 31, 2020, and it usually releases mortality data with a one-year lag.<sup>4</sup>

In this paper we propose to use newspaper obituaries as an alternative source of information to ‘now-cast’ daily mortality levels. Specifically, we use obituaries published in the local newspapers of the Bergamo and Brescia municipalities, both in the region of Lombardy (Italy), during the Italian COVID-19 outbreak peak, that is from February 24 to May 14, 2020. The Italian region of Lombardy is considered the European hot-spot, with 88,183 reported cases and 15,974 deaths as of May 25, 2020, over a total population of approximately 10 million inhabitants [8, 9].<sup>5</sup>

Figure 1 displays the daily evolution of the raw mortality level (solid line) and the number of published obituaries (dashed line). While obituaries represent only a subset of the officially registered deaths, with a gap that widens at the peak of the outbreak, the correlation between the two measures is striking.

![Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/06/03/2020.05.31.20117168/F1.medium.gif)

[Figure 1:](http://medrxiv.org/content/early/2020/06/03/2020.05.31.20117168/F1) Deaths vs Obituaries

*Our contribution*. Building on standard forecasting techniques, we show the predictive power of newspaper obituaries as an alternative measure of mortality levels. We also compare different forecasting models and report that obituary-based forecasts outperform all other models considered according to several accuracy criteria.

## 2. Results

Table 1 reports retrospective estimates of daily mortality from February 24, 2020 to May 15, 2020, using several forecasting models, with *Panel A* (resp.
*Panel B*) reporting observations for the municipality of Bergamo (resp. Brescia). We compare the estimated mortality level to the true mortality published by ISTAT on May 4, 2020 and compute the accuracy metrics described in Section 3. These measures include the root mean squared error (RMSE), the mean absolute error (MAE), the mean absolute percentage error (MAPE), Theil’s U, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). We compare these measures for (i) ordinary least squares (OLS) estimates; (ii) “augmented” autoregressive-moving-average (AARMA(1,2)) estimates with obituaries as exogenous variables; (iii) one-lag autoregressive estimates (AR(1)); and (iv) three-lag autoregressive estimates (AR(3)). On these metrics, the AARMA(1,2) model outperforms all other models according to every performance metric, for both municipalities in our sample.

[Table 1:](http://medrxiv.org/content/early/2020/06/03/2020.05.31.20117168/T1) Comparison of different forecasting models of mortality

Figure 2 displays the forecasted mortality against the observed mortality level. A close inspection of the estimates shows that both the AARMA(1,2) and the OLS estimates outperform models based only on previously observed mortality data (i.e. AR(1) and AR(3)) over the entire period in our sample. Figure 3 displays the daily evolution of the estimated standard errors for each model. In this case too, the OLS estimate outperforms the other models during the entire period in our sample.
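As an illustration of the accuracy metrics listed above, the following is a minimal sketch in plain Python. It is not the authors' code: the function names and the toy series are hypothetical, MAPE is expressed as a fraction rather than a percentage, and the AIC and BIC use the standard forms 2k − 2 ln L and k ln T − 2 ln L, which we assume match the definitions used in the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction (actual values must be nonzero)."""
    return sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / len(actual)


def theil_u(actual, predicted):
    """RMSE of the model relative to the RMSE of the naive forecast y_hat_t = y_{t-1}.

    The first observation is dropped because it has no naive forecast.
    Values below 1 mean the model beats the naive forecast.
    """
    naive = actual[:-1]
    return rmse(actual[1:], predicted[1:]) / rmse(actual[1:], naive)


def aic(log_likelihood, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, t):
    """Bayesian information criterion for k parameters and sample size t."""
    return k * math.log(t) - 2 * log_likelihood


# Hypothetical toy series, for illustration only (not the paper's data).
deaths = [10.0, 12.0, 15.0, 20.0]
nowcast = [11.0, 12.0, 14.0, 18.0]

print("RMSE:", rmse(deaths, nowcast))
print("MAE:", mae(deaths, nowcast))
print("MAPE:", mape(deaths, nowcast))
print("Theil's U:", theil_u(deaths, nowcast))
```

Lower values indicate a better model on every one of these criteria, which is the sense in which the models in Table 1 are ranked.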
![Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/06/03/2020.05.31.20117168/F2.medium.gif)

[Figure 2:](http://medrxiv.org/content/early/2020/06/03/2020.05.31.20117168/F2) Forecasts

![Figure 3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/06/03/2020.05.31.20117168/F3.medium.gif)

[Figure 3:](http://medrxiv.org/content/early/2020/06/03/2020.05.31.20117168/F3) Deaths vs Obituaries

## 3. Data and Methods

The basic principle of now-casting is to exploit information that is published at a higher frequency than the variable of interest [11]. We explore the accuracy of obituaries published in local newspapers in predicting actual daily mortality in almost real time. Newspaper obituaries contain information on individual characteristics such as name, surname, gender, age, date of death, and the municipality of death. This information allows us to enlarge the information set available to external observers and to estimate a real-time mortality rate.

*Newspaper obituaries*. We digitized the obituaries published by *L’Eco di Bergamo* and *Il Giornale di Brescia*, the most widely read and circulated newspapers in the province of Bergamo and in the province of Brescia, respectively.<sup>6</sup> Our final dataset contains 4,054 unique individuals from February 24 to May 14, 2020 for the province of Bergamo and 3,784 unique individuals for the province of Brescia over the same period. We combine the obituaries data with mortality data at the municipality level released by the Italian National Statistical Institute (ISTAT) on May 9, 2020.<sup>7</sup> The ISTAT dataset contains daily deaths at the municipality level from January 1 to April 15, 2020 for a sample of 4,433 Italian municipalities. The ISTAT sample covers the universe of municipalities belonging to the two provinces of our analysis (243 municipalities in the province of Bergamo and 205 municipalities in the province of Brescia).
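The data preparation described above (unique individuals per day, combined with official daily deaths) could be sketched as follows. This is a hedged illustration, not the authors' pipeline: the record fields, the de-duplication key, the toy values, and the choice to drop days with a zero count before taking logs are all assumptions made for the example.

```python
import math
from collections import Counter
from datetime import date

# Hypothetical obituary records; field names are assumptions for illustration.
obituaries = [
    {"name": "Mario", "surname": "Rossi", "age": 81,
     "date_of_death": date(2020, 3, 10), "municipality": "Bergamo"},
    # Same person published twice: should be counted once.
    {"name": "Mario", "surname": "Rossi", "age": 81,
     "date_of_death": date(2020, 3, 10), "municipality": "Bergamo"},
    {"name": "Anna", "surname": "Bianchi", "age": 77,
     "date_of_death": date(2020, 3, 10), "municipality": "Bergamo"},
    {"name": "Luigi", "surname": "Verdi", "age": 90,
     "date_of_death": date(2020, 3, 11), "municipality": "Bergamo"},
]

# Hypothetical official daily deaths keyed by (municipality, day).
official_deaths = {
    ("Bergamo", date(2020, 3, 10)): 5,
    ("Bergamo", date(2020, 3, 11)): 4,
    ("Bergamo", date(2020, 3, 12)): 3,
}


def daily_obituary_counts(records):
    """Count unique individuals per (municipality, date of death)."""
    seen = set()
    counts = Counter()
    for r in records:
        # Assumed identity key; the paper does not state how duplicates were detected.
        key = (r["name"], r["surname"], r["age"], r["date_of_death"], r["municipality"])
        if key in seen:
            continue
        seen.add(key)
        counts[(r["municipality"], r["date_of_death"])] += 1
    return counts


def align(counts, deaths):
    """Join obituaries and official deaths on (municipality, day).

    Returns (key, ln(obituaries), ln(deaths)) for days where both counts are
    positive, since the logarithm is undefined at zero (an assumption here).
    """
    rows = []
    for key in sorted(deaths):
        d = deaths[key]
        o = counts.get(key, 0)
        if d > 0 and o > 0:
            rows.append((key, math.log(o), math.log(d)))
    return rows


counts = daily_obituary_counts(obituaries)
rows = align(counts, official_deaths)
for key, x_t, y_t in rows:
    print(key[0], key[1].isoformat(), round(x_t, 3), round(y_t, 3))
```

Keying both series on the pair (municipality, day) keeps the join explicit, so days covered by only one of the two sources are easy to spot and handle separately.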
*Formulation of the AARMA(1,2) model*. Our AARMA(1,2) model is motivated by inspection of the autocorrelation and partial autocorrelation plots, which display a significant autocorrelation coefficient at one lag and significant partial autocorrelation coefficients at two lags. This leads us to estimate the following model

![Formula][1]

where $y_t = \ln(\text{mortality}_t)$ is the log-transformed mortality observed at time $t$, and $x_t = \ln(\text{obituaries}_t)$ is the log-transformed number of newspaper obituaries published at time $t$, which is assumed to be exogenous with respect to the time series $\{y_t\}$ (i.e. $\mathbb{E}[\varepsilon_t \mid x_t] = 0$).

*Accuracy metrics*. The RMSE, MAE, MAPE, and Theil’s U of the estimator ![Graphic][2] with respect to the target mortality level $y_t$ are defined, respectively, as ![Graphic][3], ![Graphic][4], ![Graphic][5], ![Graphic][6], where $\mathrm{RMSE}_{naive}$ refers to the RMSE of a naive forecast, i.e. $\hat{y}_t = y_{t-1}$. The AIC and BIC are defined, respectively, as ![Graphic][7] and ![Graphic][8], where ![Graphic][9] maximizes the likelihood function of the estimated model, $k$ is the number of estimated parameters, and $T$ is the sample size.

## 4. Discussion and concluding remarks

We use newspaper obituaries to nowcast the mortality levels observed in Italy during the COVID-19 outbreak peak. We find that forecasting models using newspaper obituaries outperform other models based on previously observed mortality. Our approach, although powerful, is not free from limitations. First, newspaper obituaries may underrepresent the actual mortality level, an issue that becomes more severe during the epidemic peak (see Figure 1). Such underrepresentation, however, works against our results, since it should reduce the precision of our estimates. Second, although it is concentrated in the most affected Italian region, our sample covers only two municipalities.
We are agnostic about the existence of heterogeneous individual behavioral attitudes towards publishing newspaper obituaries in other locations.<sup>8</sup> Understanding how such heterogeneity may affect our estimates constitutes a valuable path for future research.

## Data Availability

The data used in this paper are the authors' elaboration of publicly available data.

## Acknowledgements

We thank Nunzia Vallini (Director of *Il Giornale di Brescia*) and Mauro Torri (CEO of *Editoriale Bresciana*) for their help. We thank Sergio Galletta for useful comments and discussions. We also thank Endri Avduli and Oumar Ben Salha for research assistance.

## Footnotes

* 1 World Health Organization rolling updates available at [https://www.who.int/emergencies/diseases/novel-coronavirus-2019](https://www.who.int/emergencies/diseases/novel-coronavirus-2019).
* 2 There is substantial evidence that the reported number of deaths underestimates the actual mortality value, cf. [https://www.nytimes.com/interactive/2020/04/21/world/coronavirus-missing-deaths.html](https://www.nytimes.com/interactive/2020/04/21/world/coronavirus-missing-deaths.html), [https://www.nationalgeographic.com/science/2020/05/what-we-need-to-find-true-coronavirus-death-toll/](https://www.nationalgeographic.com/science/2020/05/what-we-need-to-find-true-coronavirus-death-toll/).
* 3 Cf. [https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/bulletins/deathsregisteredweeklyinenglandandwalesprovisional/weekending20march2020](https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/bulletins/deathsregisteredweeklyinenglandandwalesprovisional/weekending20march2020).
* 4 See Section 3 for further details.
* 5 Data on cumulative cases are available at [http://www.protezionecivile.gov.it/media-communication/press-release/detail/-/asset\_publisher/default/content/coronavirus-la-situazione-dei-contagi-in-ita-37](http://www.protezionecivile.gov.it/media-communication/press-release/detail/-/asset_publisher/default/content/coronavirus-la-situazione-dei-contagi-in-ita-37).
* 6 In 2019, the daily number of readers of *L’Eco di Bergamo* was 402,000, while the daily number of readers of *Il Giornale di Brescia* was 427,000. Source: [http://audipress.it/quotidiani/](http://audipress.it/quotidiani/)
* 7 Data are available at the ISTAT website: [https://www.istat.it/it/archivio/240401](https://www.istat.it/it/archivio/240401).
* 8 The large heterogeneity observed in civic attitude and prosocial behavior across Italian municipalities may play a role in determining such propensity to publish obituaries [12].

* Received May 31, 2020.
* Revision received May 31, 2020.
* Accepted June 3, 2020.
* © 2020, Posted by Cold Spring Harbor Laboratory. This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1. Richard J Hatchett, Carter E Mecher, and Marc Lipsitch. Public health interventions and epidemic intensity during the 1918 influenza pandemic. Proceedings of the National Academy of Sciences, 104(18):7582–7587, 2007.
2. Shihao Yang, Mauricio Santillana, and Samuel C Kou.
Accurate estimation of influenza epidemics using google search data via argo. Proceedings of the National Academy of Sciences, 112(47):14473–14478, 2015.
3. Shunqing Xu and Yuanyuan Li. Beware of the second wave of covid-19. The Lancet, 395(10233):1321–1322, 2020.
4. Paolo Buonanno, Sergio Galletta, and Marcello Puca. Estimating the severity of covid-19: evidence from the italian epicenter. Center for Law & Economics Working Paper Series, 3, 2020.
5. Chirag Modi, Vanessa Boehm, Simone Ferraro, George Stein, and Uros Seljak. How deadly is covid-19? a rigorous analysis of excess mortality and age-dependent fatality rates in italy. medRxiv, 2020.
6. Andrew Atkeson. How deadly is covid-19? understanding the difficulties with estimation of its fatality rate. Technical report, National Bureau of Economic Research, 2020.
7. James H Stock. Data gaps and the policy response to the novel coronavirus. Technical report, National Bureau of Economic Research, 2020.
8. Marino Gatto, Enrico Bertuzzo, Lorenzo Mari, Stefano Miccoli, Luca Carraro, Renato Casagrandi, and Andrea Rinaldo. Spread and dynamics of the covid-19 epidemic in italy: Effects of emergency containment measures. Proceedings of the National Academy of Sciences, 117(19):10484–10491, 2020.
9. Dino Gibertoni, Kadjo Yves Cedric Adja, Davide Golinelli, Chiara Reno, Luca Regazzi, and Maria Pia Fantini. Patterns of covid-19 related excess mortality in the municipalities of northern italy. medRxiv, 2020.
10. H Theil. Applied economic forecasting, 1966.
11. Marta Bańbura, Domenico Giannone, Michele Modugno, and Lucrezia Reichlin. Now-casting and the real-time data flow. In Handbook of economic forecasting, volume 2, pages 195–237. Elsevier, 2013.
12. Robert Putnam. The prosperous community: Social capital and public life. The american prospect, 13(Spring), Vol. 4. Available online: http://www.prospect.org/print/vol/13 (accessed 7 April 2003), 1993.

[1]: /embed/graphic-5.gif
[2]: /embed/inline-graphic-1.gif
[3]: /embed/inline-graphic-2.gif
[4]: /embed/inline-graphic-3.gif
[5]: /embed/inline-graphic-4.gif
[6]: /embed/inline-graphic-5.gif
[7]: /embed/inline-graphic-6.gif
[8]: /embed/inline-graphic-7.gif
[9]: /embed/inline-graphic-8.gif