Abstract
Introduction Spain has been disproportionately affected by the COVID-19 pandemic, ranking fifth in the world in terms of both total cases and total deaths due to COVID-19 as of May 20, 2020. Here we derived estimates of pandemic severity and assessed its relationship with socio-demographic and healthcare factors.
Methods We retrieved the daily cumulative numbers of laboratory-confirmed COVID-19 cases and deaths in Spain from February 20, 2020 to May 20, 2020. We used statistical methods to estimate the time-delay adjusted case fatality risk (aCFR) for 17 autonomous communities and 2 autonomous cities of Spain. We then assessed how transmission and sociodemographic variables were associated with the aCFR across areas using multivariate regression analysis.
Results We estimated the highest aCFR for Madrid (25.9%) and the average aCFR in Spain (18.2%). Our multivariate regression analysis revealed three statistically significant predictor variables: population size, population density, and the unemployment rate.
Conclusions The estimated aCFR for 10 autonomous communities/cities in Spain are significantly higher than those previously estimated for other geographic regions including China and Korea. Our results suggest that public health interventions focused on densely populated areas and low socioeconomic groups can ameliorate the mortality burden of the COVID-19 pandemic in Spain.
Introduction
Since the emergence of COVID-19 in Wuhan, China in December 2019, the novel coronavirus (SARS-CoV-2) continues to spread throughout the world, straining and overloading healthcare systems and causing substantial morbidity and mortality burden during a short time period. As of May 20 2020, 4,789,205 confirmed cases including 318,789 deaths attributed to COVID-19 have been reported from 215 countries/territories/areas [1]. Thus far, the US has reported the highest number of cases (30.8%) and deaths (28.0%) globally. Other countries that follow the US in terms of COVID-19 death toll include the United Kingdom (11.1%), Italy (10.1%), France (8.8%) and Spain (8.7%) [1].
The case fatality risk is a useful metric to assess pandemic severity, which is typically estimated as the proportion of deaths among the total number of cases attributed to the disease [2, 3]. However, during the course of outbreak of an infectious disease such as COVID-19, real-time estimates of CFR need to be derived carefully since it is prone to ascertainment bias and right censoring [4, 5]. In particular, the disease spectrum for COVID-19 ranges from asymptomatic and mild infections to severe cases that require hospitalization and specialized supportive care. This may lead to overestimation of the CFR among ascertained cases. On the other hand, there is a delay from illness onset to death for severe cases [6], which could lead to an underestimation of the CFR [5, 7]. Therefore, statistical methods that help mitigate inherent biases in estimates of the CFR should be employed to accurately plan for medical resources such as ICU units and ventilators, which are essential resources to save the lives of critically ill patients [8–10].
Several studies have reported CFR estimates for COVID-19 [11–13]. Overall, these estimates have varied substantially across geographic regions even within the same country. For example, a recent study estimated the time-delay-adjusted CFR at 12.2% for the ground zero of the COVID-19 pandemic: the city of Wuhan [6], whereas for the most affected region in Italy (Lombardia), the delay-adjusted CFR reached 24.7% [14]. The drivers behind the geographical variations in the severity of the COVID-19 pandemic are yet to be investigated but could provide critical information to mitigate the morbidity and mortality impact of this and future pandemics [15].
In this study we aim to estimate the severity of COVID-19 pandemic across 19 geographic areas in Spain and aim to explain how these estimates varied geographically as a function of underlying factors. For the real-time estimation of severity, we adjust for right censoring using established methods [16, 17] and report the estimates of the time-delay adjusted CFR of COVID-19 for 17 autonomous communities (Comunidades Autonomas, CCAA), and 2 autonomous cities of Spain as well as for the entire Spain. We then assessed the association between different transmission and socio-demographic factors and the estimated CFRs across areas using multivariate regression analyses.
Methods
Study setting
Spain is situated on the Iberian Peninsula and is divided into 17 autonomous communities (CCAA) and 2 African autonomous cities [18]. Ceuta and Melilla are the 2 African autonomous cities whereas the 17 CCAA include: Andalucía, Aragon, Asturias, Balears, Canarias, Cantabria, Castilla-La Mancha, Castilla y Leon, Catalunya, C. Valenciana, Extremadura, Galicia, Madrid, Murcia, Navarra, Pais Vasco, La Rioja [18].
Initial cases of COVID-19 in Spain
The first case of COVID-19 in Spain was confirmed on 31st January, 2020 in La Gomera, Canary Islands in a person who was in contact with an infected person while in Germany [19]. By February 27, there were total 12 cases which increased to 45 cases in March 1 and then continued to rise rapidly throughout the country [20]. The incidence trajectory continued to rise for about 4 weeks before a gradual decline. The daily reported number of new cases has not exceeded 1000 for 10 consecutive days since May 11, 2020 [21]
Data Sources
The Ministry of Health of Spain releases daily reports on COVID-19 cases and deaths [21]. From these reports we retrieved the daily cumulative numbers of reported laboratory-confirmed COVID-19 cases and deaths from February 20, 2020 to May 20, 2020. We then stratified the data into 20 groups that included 17 CCAA, 2 African autonomous cities and for the entire Spain.
For each CCAA we obtained data on demographic, socio-demographic, and healthcare factors from the Statistics National Institute (Instituto Nacional de Estadística) [22] and the annual report of the national health system [23]. The demographic factors we included were: total population size, population density, proportion of population size by region, proportion of the population older than 60 years, proportion of the population older than 70 years, and life expectancy at birth. We included four socio-demographic factors: poverty risk rate, proportion of population of low social class, unemployment rate, and infant mortality rate. To assess the effect of healthcare factors, we obtained the data on percentage of total consolidated expenditure on hospital and specialized services, and public health expenditure per GDP by CCAA from the annual report issued by the national health system, 2018 [23]. Finally, we also included three factors related to transmission dynamics: The number of days from the date the first case was reported to the day the cumulative number cases surpassed one hundred cases, the cumulative morbidity (total number of reported cases) and the cumulative morbidity rate calculated by the cumulative number cases divided by the local population size. Table S1 summarizes the range and median of above variables for 17 CCAA regions in Spain. Additionally, we obtained the shapefiles of the autonomous communities of Spain from the national geographic information system of Spain [24].
Time-delay adjusted CFR estimation
The crude CFR is defined as the number of cumulative deaths divided by the number of cumulative cases at a specific point in time. For the estimation of CFR in real time during the course of outbreak, we employed the delay from hospitalization to death, hs, which is assumed to be given by hs = H(s) – H(s-1) for s>0 where H(s) is a cumulative density function of the delay from hospitalization to death and follows a gamma distribution with mean 10.1 days and SD 5.4 days, obtained from the previously published paper [6]. Let πa,ti be the time-delay adjusted case fatality risk on reported day ti in area a, the likelihood function of the estimate πa,ti is where ca,t represents the number of new cases with reported day t in area a, and Da,ti is the cumulative number of deaths until reported day ti in area a [16, 17]. Second parenthesis corresponds to the estimated crude case fatality risk at day ti in area a. Among the cumulative cases with reported day t in area a, Da,ti have died and the remainder have survived the infection. The contribution of those who have died with biased death risk is shown in the middle parenthetical term and the contribution of survivors is presented in the right parenthetical term. We assume that Da,ti is the result of the binomial sampling process with probability πa,ti.
The priors specified in our Bayesian model were non-informative priors. We estimated model parameters using a Monte Carlo Markov Chain (MCMC) method in a Bayesian framework. Posterior distributions of the model parameters were estimated by sampling from the three Markov chains. For each chain, we drew 100,000 samples from the posterior distribution after a burn-in of 20,000 iterations. Convergence of MCMC chains were evaluated using the potential scale reduction statistic [25, 26]. Estimates and 95% credibility intervals for these estimates are based on the posterior probability distribution of each parameter and based on the samples drawn from the posterior distributions.
Multivariate regression analysis
After quantifying the extent of spatial auto-correlation of CCAA-level time-delay-adjusted CFR median estimates, with the use of Moran’s I statistic with a Delauney triangulation neighbours spatial mixing matrix [27, 28], we explored the association between time-delay-adjusted CFR and predictor variables to identify simplified models with significant factors linked to the variation in CFR estimates across geographic areas in Spain. Multicollinearity tests were examined using variance inflation factors (VIF). Predictors exceeding VIF value of 4.0 were excluded one by one from the multivariate analyses. By fitting all combinations of models given a set of predictors, we derived best minimal models and indicated in Table 2. By taking 1000 draws from the posterior estimates of time-delay-adjusted CFR and running the linear regression 1000 times, the combination of variables with the largest count was selected as the final model. With this model, pooled estimates accounting for both within-model uncertainty and between models uncertainty were derived.
All statistical analyses were conducted in R version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria).
Results
As of May 20, a total of 233,037 cases and 27,940 deaths due to COVID-19 have been reported in Spain. Moreover, the Madrid region has reported the highest number of cases at 67,049 (28.8%) and deaths at 8,931 (32.0%) followed by Catalunya with 55,888 cases (24.0%) and 6,021 deaths (21.5%).
Figure 1 displays the observed and posterior estimates of crude case fatality risk in Spain, 17 CCAAs and 2 autonomous cities (A through T). Day 1 corresponds to March 1st in 2020. Black dots show the crude case fatality risks, and light and dark indicate 95% and 50% credible intervals for posterior estimates, respectively. Our model-based crude CFR fitted the observed data well in all the regions except Aragon where the model did not fit well for first 2 weeks. There was a rapid rise in crude CFR in the Aragon, Asturias, Castilla-La Mancha, C. Valenciana, Extremadura, and in Madrid.
Figure 2 illustrates observed and model based posterior estimates of time-delay-adjusted CFR in the 20 areas. Black dots show crude case fatality risks, and light and dark indicate 95% and 50% credible intervals for posterior estimates, respectively. In most of the regions and at the national level, we saw a greater difference between the time delay adjusted CFR and crude CFR in the first 4 weeks of the epidemic followed by a slowly declining difference in the later stage of the epidemic. The graph of the time-delay adjusted CFR varies considerably for different areas. For instance, as the epidemic progressed, the adjusted CFR increased slightly in CN and NC. In contrast, it showed a rapidly increasing trend for first three weeks and then gradually declined in CL. For AR, it was at a high level for the first two weeks after which it started to decline. For the most affected MD region, the time-delay adjusted CFR fluctuated for first 25 days followed by a decline, a leveling off, and then moved upwards.
A summary of the time delay adjusted case fatality risk, range of median estimates and crude CFR of COVID-19 across different areas of Spain are presented in Table 1. Our model based posterior estimates of time-delay adjusted CFR are higher than the observed crude CFR. The Madrid autonomous community had the highest time delay adjusted CFR of 25.9% [95% credible interval: 25.3-26.3%] followed by Extremadura (EX) (21.7%) [95%CrI: 20.0-23.4%], Aragon (AR) (21.3%) [95%CrI: 19.9-22.5%], and Asturias(AS) (20.3%) (95%CrI: 18.3-22.4%). The national estimate for Spain was 18.2% (95%CrI: 18.0-18.4%]. Of the 19 autonomous areas of Spain, 13 had the time-delay adjusted CFR greater than 10% (Table 1, figure 3).
Our analysis of the statistical geospatial heterogeneities in the time-delay adjusted using Moran’s I statistic, which represents the extent of the degree of spatial autocorrelation, was not statistically significant (Moran’s Index = 0.08, P = 0.15). After inspecting VIF, the proportion of population size by region, proportion of seniors aged 60 and above, life expectancy at birth, public health expenditure per GDP, proportion of low social class, poverty risk rate, cumulative morbidity, and the cumulative morbidity rate were excluded from the multivariate analysis.
Autonomous communities with larger population size (p value= 0.10 (95%CrI: 0.05 – 0.20)), higher population density (p value= 0.08 (95%CrI:0.05 – 0.12)) and with a higher unemployment rate (p value=0.07 (95%CrI: 0.04 – 0.14)) experienced higher time-delay adjusted CFRs. These three significant factors explained 16% (95%CrI: 7 – 24)) (adjusted R-squared) of the variation of the pandemic severity across CCAAs (TablelJ2). The correlation matrix of predictors with correlation coefficient (r) is presented in Figure S3. We found that the population size is strongly correlated (r >= 0.7 or r <= −0.7) with the proportion of population size by region (r = 1.00) and the cumulative morbidity while population density is strongly correlated with the proportion of low social class (r = −0.77) and the cumulative morbidity (r = 0.70). Meanwhile the unemployment rate is strongly correlated with the poverty risk rate (r=0.91) and the life expectancy at birth (r=−0.76).
The range of variables included in the linear regression analysis are provided in supplementary Table S1.
Discussion
In this paper, we have estimated the time delay adjusted case fatality risk of COVID-19 for 19 autonomous areas/cities of Spain. Our estimate of time-delay adjusted CFR in Spain was at 18.2%, but it varied widely across the 19 Spanish areas, with some areas exhibiting higher CFR values such as in Madrid (25.9%), and Extremadura (21.7%) while other areas such as Melilla (2.9%), Ceuta (4.6%), and Galicia (6.9%) experiencing relatively lower CFR values. Of the 19 areas, 13 experienced an adjusted CFR greater than 10%. We also observed a significant positive association of the time-delay adjusted CFR with population size, population density, and unemployment rate across 17 Spanish areas. Our findings suggest the need for additional control efforts as well as generating real time regional severity estimates particularly for areas with high population density and unemployment rate, which have been hit hard by the COVID-19 pandemic.
The adjusted CFR estimates in Spain (18.2%) is higher than the estimates for Wuhan (12.2%)[6], and Korea (1.4%) [29] and Italy (17.4%) [14]. All of these studies employed same methods and hence are comparable. When we compare the estimates for the most affected areas across different countries, the risk in Madrid in Spain (25.9%) is higher than that estimated for Wuhan in China (12.2%) [6], Daegu in Korea (2.4%) [29] and Lombardia in Italy (24.7%) [14]. This difference across countries and regions may be partly explained by differences in population age structure, density, other socio-demographic factors, and the scale of the pandemic in different areas. The median age in Spain (44.9 years) [30] is comparable to that of Italy (45.4 years) but higher than that for China (36.7 years) [31]. Indeed, the elderly population is at risk of severe outcomes from COVID-19 [32–34], which could partly explain the higher severity observed for Italy and Spain. Other factors that could have played a role in these differences may be related to differences in testing strategies. For example, in Korea extensive testing and rigorous contact tracing strategy were implemented [35] while testing prioritized more severe cases in Italy [34] and Spain [36].
Our results from the multivariate analysis found a significant positive association of adjusted CFR with population size and population density, emphasizing the importance of social distancing. We also found a significant positive association between adjusted CFR and unemployment rate across areas in Spain. Unemployment rate is not only an indicator of performance of labor market but also an indicator of socioeconomic status for an area, and this is supported by the strong positive correlation with poverty risk and negative correlation with the life expectancy at birth. Unemployment rate can lead to decreased investment in health, education, nutrition, and good housing condition [37]. In any pandemic situation like COVID-19, the poorer tend to exhibit the highest morbidity and mortality rates. For instance, lower socioeconomic groups were also disproportionately affected by the 1918 influenza pandemic [38, 39]. Data from the US and the UK have shown marked ethnic and racial disparities in COVID-19 death rates [40]. Moreover, preliminary COVID-19 mortality data from the US also indicates a 2-fold age-adjusted death rate among Hispanic/Latino and 1.9-fold among Black/African American compared to Whites [41]. However, we do not know whether the association between unemployment rate and adjusted CFR is inflated by a limited access to COVID-19 testing among low socio-economic groups.
In our study we saw considerable variations in CFR trend across areas. For instance, as the epidemic progressed, the adjusted CFR showed a slightly upward trend in CN and NC, a rapid upward trend followed by the slow decline in CL, and a downward trend in AR. Likewise, for CM the graph showed an upward trend followed by a relative decline and then again an upward trend before declining gradually. The CFR trend for the 19 autonomous areas can be helpful in the planning and implementation of health care services and prevention measures separately for each of them. The downward trend in CFR as seen in some of the areas in our study suggest the improvement in epidemiologic surveillance leading to the increased capture of mild or asymptomatic cases. A higher number of mild and asymptomatic cases also indicate an increase in human-to-human transmission leading to a prolonged epidemic which can be controlled through effective social distancing measures until an effective vaccine or treatment becomes available [6].
The upward trend in CFR indicates that the temporal disease burden exceeded the capacity of healthcare facilities and the surveillance system probably missed many cases during the early phase of the epidemic [6], particularly due to a significant presence of mild and asymptomatic cases. It has been found that about 18% of the COVID-19 infections in Diamond Princess Cruise ship were asymptomatic [42]. The increasing trend in CFR could further be explained the nosocomial transmission affecting the health care workers, inpatients and their visitors [6]. In China, of 44672 confirmed COVID-19 cases, 3.8% was among the health care personnel [43]. Similarly, Wang et al. in their study suspected 41% of the patients to have human-to-human hospital associated transmission of COVID-19 [44].
Our study has some limitations. The preferential ascertainment of severe cases bias in COVID-19 may have spuriously increased our estimate of CFR [5], which is a frequent caveat in this type of studies [45, 46]. Similarly, given the long infection-death time for COVID-19 which ranges between 2 to 8 weeks [30], our estimate may have been affected by delayed reporting bias [5, 7]. Similarly, in our data, the date of report reflects the date of reporting and not the date of onset of illness. Finally, we assumed infant mortality and poverty risk rate as a proxy for areas with low socio-economic groups.
Conclusion
The risk of death due to COVID-19 in Spain was estimated at 18.2%, but estimates varied substantially across 19 geographic areas. The CFR was as high as 25.9% in Madrid and as low as 2.9% in Melilla. Of the 19 autonomous areas/cities, 13 had a time-delay-adjusted CFR greater than 10% reflecting a disproportionate severity burden of COVID-19 in Spain. Importantly, our estimate of CFR for the most affected Madrid region is higher than previous estimates for the most affected areas within China, Korea, and Italy. Our findings suggest a significant association between unemployment rate, population density, and population size with an increased risk of death due to COVID-19. Further studies with patient level data on mortality, and risk factors could provide a more detailed understanding of the factors shaping the risk of death related to COVID-19.
Data Availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Note
Financial support
KM acknowledges support from the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number 20H03940, and from the Leading Initiative for Excellent Young Researchers from the Ministry of Education, Culture, Sport, Science & Technology of Japan. GC acknowledges support from NSF grant 1414374 as part of the joint NSF-NIH-USDA Ecology and Evolution of Infectious Diseases program; UK Biotechnology and Biological Sciences Research Council grant BB/M008894/1.
Potential conflicts of interest
No conflict.
Additional files
Additional file 1: Table S1. The range of major socio-demographic, and healthcare variables for 17 regions in Spain used in our analyses (linear regression).
Additional file 2: Figure S1. Daily number of reported COVID-9 cases by region in Spain. Day 1 corresponds to March 1st in 2020.
Additional file 3: Figure S2. Daily number of reported COVID-9 deaths by region in Spain. Day 1 corresponds to March 1st in 2020.
Additional file 4: Figure S3. Correlation matrix of predictors.
Footnotes
↵* Joint first authors