Abstract
Detailed reconstruction of the SARS-CoV-2 transmission dynamics and assessment of its burden in several parts of the world has still remained largely unknown due to the scarcity of epidemiological analyses and limited testing capacities of different countries to identify cases and deaths attributable to COVID-19 [1-4]. Understanding the true burden of the Iranian COVID-19 epidemic is subject to similar challenges with limited clinical and epidemiological studies at the subnational level [5-9]. To address this, we develop a new quantitative framework that enables us to fully reconstruct the transmission dynamics across the country and assess the level of under-reporting in infections and deaths using province-level, age-stratified all-cause mortality data. We show that excess mortality aligns with seroprevalence estimates in each province and subsequently estimate that as of 2021-10-22, only 48% (95% confidence interval: 43-55%) of COVID-19 deaths in Iran have been reported. We find that in the most affected provinces such as East Azerbaijan, Qazvin, and Qom approximately 0.4% of the population have died of COVID-19 so far. We also find significant heterogeneity in the estimated attack rates across the country with 11 provinces reaching close to or higher than 100% attack rates. Despite a relatively young age structure in Iran, our analysis reveals that the infection fatality rate in most provinces is comparable to high-income countries with a larger percentage of older adults, suggesting that limited access to medical services, coupled with undercounting of COVID-19-related deaths, can have a significant impact on accurate estimation of COVID-19 fatalities. Our estimation of high attack rates in provinces with largely unmitigated epidemics whereby, on average, between 10% to 25% individuals have been infected with COVID-19 at least twice over the course of 20 months also suggests that, despite several waves of infection, herd immunity through natural infection has not been achieved in the population.
Assessing the true burden of COVID-19 requires a comprehensive surveillance system to monitor outbreaks and a large capacity for diagnostic testing. However, this is particularly challenging for many low- and middle-income countries due to limited diagnostic capacities and access to healthcare [10, 1]. An alternative approach to measuring the burden of COVID-19 is to monitor changes in all-cause mortality trends with respect to baseline levels based on historic trends (i.e., measuring excess mortality). While excess mortality during the current pandemic may not exactly match the true number of COVID-19-related deaths due to various reasons such as disruptions in treatment for other fatal diseases and socio-economic disparities in health care in different regions of a country, several works have shown that a significant portion of excess deaths in many countries are directly attributable to COVID-19 [1, 5, 12-15]. This enables us to use excess mortality as a proxy for estimating the number of infections and fatalities from COVID-19 and correct for under-reporting of cases and deaths that may arise due to limited testing from suspected cases and uncertainty in the number of fatalities attributable to COVID-19 [16, 7].
Iran was among the first countries outside mainland China to report a large outbreak of SARS-CoV-2 in February and March 2020 and also acted as a major source for its spread in several countries in the Middle East and elsewhere [18]. It was also one of the first countries to experience a second wave of infection after the relaxation of non-pharmaceutical interventions (NPIs) in July 2020 [6, 9]. Since then, the country experienced three additional waves, two of which were driven by the Alpha and Delta variants of SARS-CoV-2, in May and September 2021. Despite a recent increase in the number of daily administered vaccine doses, Iran had a very slow start on its immunisation programme with only ∼3% of the population being fully vaccinated by the end of July 2021, during the surge in cases with the Delta variant [20]. Because the Ministry of Health and Medical Education (MoHME) of Iran stopped releasing province-level data on the number of confirmed COVID-19 cases and deaths from 2020-03-22, the transmission dynamics across the country have remained largely unknown. One way to estimate the number of infections and account for under-reporting of cases is by converting deaths attributable to COVID-19 to infections using a population-weighted infection fatality ratio (IFR) estimate (i.e., infections = deaths / IFR), taking into account the demographics of the population of interest [21]. However, such estimation of attack rates can be inaccurate or misleading if it does not take into account the impact of limited access to health care in municipalities with low socioeconomic status, uneven adoption of pharmaceutical and non-pharmaceutical intervention in different areas, and the increased chance of re-infection over time. Neglecting to account for these factors can result in an overestimate of SARS-CoV-2 attack rates and underestimation its potential for continued transmission, with immediate implications for assessments of the potential burden of SARS-CoV-2 variants of concern such as Omicron in a population. In this work, we use the newly updated province-level age-stratified weekly all-cause mortality data from the National Organization for Civil Registration (NOCR) of Iran and develop a mathematical framework to fully reconstruct the Iranian epidemic with the aim to estimate the attack rate, number of daily hospital admissions, deaths, and re-infection rates of SARS-CoV-2 across the country.
Results
We obtain age-stratified weekly all-cause mortality data from NOCR and apply a statistical model to calculate the weekly excess mortality for all 31 provinces and across 17 age-groups during the Iranian epidemic (see Methods section). Figure 1A shows 5 distinct peaks in excess mortality with a temporal trend that is strongly associated with the nationwide reported COVID-19 deaths over time. Several provinces show significant levels of excess mortality from the first week of February 2020 suggesting that the Iranian epidemic likely started at least a month before that (Figure 1B). This is in agreement with previous phylogenetic and epidemiological studies suggesting the epidemic started in late December 2019 to early January 2020 [6, 22]. Furthermore, during the first two waves of the Iranian epidemic, the excess mortality was roughly 2.5 times higher than reported COVID-19 fatalities. However, this dropped to 2.1 at the later stages suggesting that the testing capacity of the country to record infections and deaths gradually improved over time, but not to the extent to fully capture the majority of COVID-19-related deaths. This is also in agreement with statements from the Iranian health officials suggesting that Iran’s COVID-19-related deaths could be twice the official numbers and also previous studies showing elevated levels of under-reporting of COVID-19 fatalities in Iran, particularly in the early stages of the pandemic [6, 7]. We also find that the cumulative nationwide excess deaths by 2021-10-22 is 257,950 (95% CI: 221,530 - 278,440) which is roughly twice the 125,000 reported COVID-19 deaths at the time.
Figure 1B shows the intensity of excess mortality as quantified by the number of excess deaths per 100,000 persons in each province over time. During the first wave in spring 2020, only a few provinces showed significant levels of excess mortality with Gilan, Qom, Mazandaran, and Golestan among the hardest-hit provinces. This is likely because these provinces usually attract a large number of tourists and pilgrims around the New Year’s holiday in February/March which led to large superspreading events and uncontrolled outbreaks due to limited NPIs before the national lockdown on 2020-03-05 [6]. As the epidemic progressed, the timing of the peak excess mortality between different provinces were more simultaneous (Figure 1B; Figure S1). Most notably, during the third wave in October/November 2020 and the fifth wave in August/September 2021 with the Delta variant of SARS-CoV-2, almost all provinces experienced significant number of fatalities around the same time.
We also carry out a similar analysis to estimate the temporal pattern of COVID-19 fatalities for different age-groups (Figure 1C). We find that the majority of excess deaths are concentrated in the older age categories (>55) and that the third and fifth epidemic waves had the largest impact on younger age groups. Figure 1D shows the per capita ratio of excess mortality per age group with respect to the 40–49 age group. While we see a gradual increase in the ratio of deaths in the older age-groups during the first three waves, there is a two-fold drop in the 60-74 and 75+ age-groups before the start of the fifth wave. This is likely due to a higher vaccination coverage in these age-groups. While the exact number of vaccinated individuals per age-group in Iran is not known, given that the immunisation programme for individuals aged >75 started in May 2021 (first phase priority groups for vaccination in Iran are frontline healthcare workers as well as those aged above 80; see [20] for more information), and that Iran fully vaccinated roughly 5 times the population size of individuals aged 75 and above by 2021-09-07, it is likely that the drop in excess mortality in the older age-groups before and during the fifth wave is due to their high vaccination coverage.
Assuming that lack of testing is the primary reason behind the discrepancy between excess mortality and official COVID-19 deaths [1, 23, 24], we attribute the mean excess mortality during the pandemic period to COVID-19. This enables us to reconstruct the SARS-CoV-2 transmission dynamics across the country and infer the population fatality rate (PFR), IFR, and attack rate per province (see Methods section). We first estimate the national and provincial attack rates of SARS-CoV-2 under two different set of assumptions of the IFR by age relationship and select the better-fitting model by comparing the modelled seroprevalence estimates to data (Figure S2; Data S1-S6). We find that the model with a higher IFR per age-group (and an overall lower estimated attack rate) that explicitly accounts for an increase in IFR when hospital capacity is reached (see Methods section) provides a better fit to seroprevalence data. With a median value of approximately 0.50% (0.32-0.64%), our inferred province-level IFR is roughly twice as high as the population-weighted IFR, which is calculated using age-specific IFR estimates derived from serological studies in predominantly high-income countries (Table 1) [21]. This discrepancy is expected because disparities in healthcare access can be offset by the younger population age structure of many developing countries resulting in population IFRs that are similar to high-income countries [12, 25, 26]. Figure 2A and Figure 2B show a nearly 75% attack rate of SARS-CoV-2 across the country with nearly 15% of all infections due to reinfections by 2021-10-22. We also find that the age-distribution of the modelled COVID-19 deaths are well-aligned with the pattern of excess mortality per age-group and both follow a clear log-linear relationship with respect to age, a signature characteristic of COVID-19 fatalities [27], suggesting that the model successfully reconstructs age-dependent contact patterns for disease spread in Iran (Figure 2C). There is also a strong association between the modelled and observed daily hospital admissions over time and across provinces, further indicating that the majority of excess deaths during the pandemic period are directly attributable to COVID-19 (Figure S3). There is, however, a notably lower number of modelled daily hospital admissions compared to observed hospitalisations in some provinces (e.g., South Khorasan, Lorestan, and Yazd) during the last wave in comparison to earlier waves. To explore these patterns, we use a mixed-effects model to examine the relationship between the number of hospitalisations inferred from our model fits and the observed confirmed COVID-19 hospital admissions over time and across epidemic waves while controlling for province-specific variation (see Methods section). Using this framework, we show that there was no significant decrease in the ratio of model inferred hospitalisations to observed hospitalisations during the last wave compared to previous waves, and that the majority of variation in hospitalisation patterns observed is explained by province-level random effects, most likely due to specific changes in treatment seeking or the age demographics being most affected in later waves (Figure S4). However, a significant decrease in the ratio of model inferred hospitalisations to observed hospitalisations was observed since the start of the pandemic, which suggests that case fatality rates from hospitalised infections have decreased during this period, likely due to a combination of improved case management, increased number of infections resulting from reinfections, and higher vaccination coverage.
Using our transmission model, we also calculate the cumulative attack rate by 2021-10-22 for each province (Figure 2D). Our results show a high degree of heterogeneity in the estimated attack rates across the country with 11 provinces reaching close to and higher than 100% attack rates after the fifth wave (Table S1). While in most regions, the majority of infections took place during the third and fifth waves, the Delta variant of SARS-CoV-2 was responsible for the highest re-infection rates during the fifth wave (Figure S5). Qazvin, Qom, and East Azerbaijan had the highest per capita mortality rate with cumulative PFRs of nearly 0.4%, while Golestan, Khuzestan, and Qom had the highest re-infection rate with nearly 20% of infections being due to re-infection (Table 1; Figure S5). Figure S1 and Figure S3 also show that while some provinces such as Ardabil, Alborz, Semnan, Golestan, Hamedan, and Mazandaran experienced five distinct epidemic peaks, others like Bushehr, Qazvin, Gilan, Kohgiluyeh and Boyer Ahmad, and Hormozgan experienced sustained transmission of SARS-CoV-2 for an extended period of time throughout the pandemic. In late September 2021, official reports from MoHME indicate that COVID-19 deaths in Sistan and Baluchistan, Ardabil, Ilam, Bushehr, Chahaar Mahal and Bakhtiari, South Khorasan, North Khorasan, Zanjan, Kohgiluye and Boyer Ahmad, Lorestan, Hormozgan, and Hamedan have all dropped to below 5 individuals per day in recent weeks, an almost record-low since the first wave in March 2020 [28-30]. This is congruent with our estimates of high attack rates in these provinces which may have resulted in a (temporary) herd-immunity to be reached after the fifth wave.
Discussion
Since the start of the pandemic, the global heterogeneity in COVID-19 reporting systems globally has resulted in substantial debate about the true extent to which COVID-19 has spread in each country [4, 31, 32]. International comparisons of COVID-19 death tolls have subsequently relied on excess mortality due to under-reporting of COVID-19 deaths in many parts of the world [2]. This study is the first to formally explore the suitability of excess mortality as a proxy for estimating under-reporting of COVID-19 infections and deaths, confirming the information contained within reliable excess mortality statistics through the broad agreement with representative seroprevalence estimates. Additionally, this analysis lends further support to evidence that IFR patterns estimated in high income countries may underestimate the IFR in settings with stretched healthcare capacity [26], as experienced in many provinces in Iran at periods during the pandemic [33, 34]. Even though in some provinces we find that the transmission model with a lower estimated IFR provides a better fit to seroprevalence estimates, such a model offers a poor fit across most provinces and often results in epidemiologically unrealistic outcomes such that the modelled transmission dynamics outruns the pool of susceptible individuals. We also note that while some factors such as misidentification of baseline all-cause mortality and deaths attributable to COVID-19, particularly in the younger age-groups with fewer COVID-19-related deaths, can contribute to varied attack rates in some provinces, other factors such as nonrepresentative serosurveys, varied time to seroreversion using different assays, and uncertainty in inferring the true contact matrix between different age-groups within the population can impact the inferred attack rates [35, 36].
Prior to the emergence of the variants of concern, the basic reproduction number of the SARS-CoV-2 virus was estimated to be around 2.8 which corresponds to a herd immunity threshold of about 64% [37]. This threshold does not represent the point at which the outbreak immediately stops, but rather the point where, on average, fewer secondary infections are produced. During unmitigated outbreaks with such reproduction numbers, the attack rate can reach as high as 90% [38, 39]. Similarly, during unmitigated outbreaks with variants of concerns such as the Delta variant with increased reproduction numbers compared to the ancestral SARS-CoV-2 virus, the attack rate can reach even higher levels [40]. While the duration of antibody responses after coronavirus infection over longer timescales remains largely unknown, recent surveillance data and comparative studies have shown that re-infection with SARS-CoV-2 is possible after 3-6 months, and that older age-groups have a much lower protection against reinfection [41, 42]. Previous studies on human coronaviruses and SARS-CoV have also shown that protective immunity typically starts declining after 5-8 months post infection [43-45]. Our findings of high attack rates in several provinces show that herd immunity through natural infection has not been achieved in the population even after nearly 20 months since the start of the Iranian epidemic. This is likely due to substantial reduction in protection against repeat infection over time either due to waning immunity, increased chance of re-infection with variants of concern, or a combination of both. We note that while systematic biases such as identifying the baseline level of all-cause mortality and overestimation of deaths attributable to COVID-19, particularly in the younger age-groups, can contribute to increased attack rates in some provinces, other factors such as heterogeneities in immune protection, limited access to health care in municipalities with low socioeconomic status, and uneven adoption of public health intervention in different regions can also contribute to varied attack rates across the country.
Methods
Excess mortality
We collect the weekly time-series data on all-cause mortality per province per age group from NOCR. We use data from the beginning of year 1394 to the end of summer 1398 in Solar Hijri calendar (SH) (from 2015-03-15 to 2019-09-22 in Common Era) to calculate background mortality. We exclude mortality data during autumn 2019 (from 2019-09-23 to 2019-12-21) from the calculation of background mortality as we have previously reported elevated mortality rates across several provinces, unrelated to the COVID-19 pandemic, which can bias our estimates for the expected mortality [7]. We calculate the expected mortality using a linear regression model that accounts for seasonality and trend, previously developed to track excess mortality across more than 100 countries and territories around the world [1].
Transmission model
We use a previously published age-structured COVID-19 transmission model [46] and fitting framework [24] to fit the weekly estimated excess deaths in each province in Iran. In overview, the model is a population-based age-structured Susceptible-Exposed-Infected-Recovered model, which explicitly represents disease severity, passage through different healthcare levels and the roll out of vaccination. Model fitting was carried out within a Bayesian framework using a Metropolis-Hastings Markov Chain Monte based sampling scheme, which estimates the epidemic start date, R0 and the time varying reproduction number, Rt, using a series of pseudo-random walk parameters, which alter transmission every 2-weeks, given by: where f(x) = 2.exp(x)/(1+exp(x)), i.e., twice the inverse logit function. Each parameter is introduced two weeks after the previous parameter, serving to capture changes in transmission every two weeks. The last change in transmission, ρn, is maintained for the last 4 weeks prior to the current day to reflect our inability to estimate the effect size of this parameter due to the approximate 21-day delay between infection and death [47].
Each model fit used province-specific demography, with the population size in 5-year age bands sourced from the Statistical Center of Iran [48]. The effective number of general hospital beds and intensive care beds were sourced from the Iran statistical yearbook published in 1396 in Solar Hijri calendar [49]. At the time of writing, MoHME has not published subnational daily vaccination data prior to 2021-10-26, nor the proportion of which vaccines are administered. As a result, we used multiple data sources to approximate the roll of vaccines subnationally. From 2021-10-26, MoHME started to release the total number of first and second vaccine doses subnationally, which we use in combination with national vaccination data [50] to infer the daily vaccination rate per province. A mix of vaccine types have been administered in Iran, with 80% of the population vaccinated with Sinopharm, 10% with AZ and the remaining 10% either Sputkink-V (1.5%) and CoViran Barekat (8.5%). Due to the mix of vaccine types with uncertain efficacy estimates, particularly related to Delta, we use vaccine efficacy estimates for first and second dose efficacy sourced from estimates for the Sinopharm vaccine and other inactivated whole virus vaccine platforms (Table 2). Efficacy estimates are weighted to produce an overall population efficacy by the proportion of vaccinated individuals who have received either a first dose or second dose, and by the assumed proportion of the Delta variant in the population, sourced from the sampling dates of Delta sequences in Iran from GISAID (gisaid.org), lagged by 14 days to reflect the delay from administering vaccine to protection being conferred.
To estimate the attack rate over time, we explore three sets of model assumptions, which scan across the likely range of parameter estimates for key model parameters that impact the attack rate inferred. These parameters relate to the assumed number of general and ICU hospital beds available, the change in hospitalisation related to the Delta variant and the level of immune escape conferred by the Delta variant relative to previous infection by the Alpha variant and the original wild type strains. These ranges are described in Table 2. Given uncertainty in the IFR and its impact of the inferred attack rate, we explore two alternative estimates of IFR by age [21, 51] and conduct model comparison between the two references through comparison against province level estimated seroprevalence data [8], with seroprevalence assumed to be described by a Binomial distribution. Lastly, to explore if IFR is changing over time, we explore the ratio of model inferred hospitalisations and observed daily hospitalisations against time using a mixed-effects model, controlling for variation between provinces using a random intercept and slope with respect to time.
Data Availability
All the data used for the analysis are available online on our GitHub repository (https://github.com/mg878/Iran_WeeklyMortality), with all model fitting code and analysis available at (https://github.com/OJWatson/iran-ascertainment). For regular updates on excess mortality in Iran, you can visit: https://github.com/akarlinsky/world_mortality.
https://github.com/mg878/Iran_WeeklyMortality
Funding
MG is funded by the Biotechnology and Biological Science Research Council (BBSRC), grant number BB/M011224/1. OJW is supported by a Schmidt Science Fellowship in partnership with the Rhodes Trust.
Authors contribution
Conceptualization: MG, AK, OJW, LF; Methodology: MG, OJW, AKar; Investigation: MG, OJW; Visualization: MG, OJW; Supervision: AK; Writing – original draft: MG; Writing – review & editing: All authors.
Conflict of interest
None.
Data availability
All the data used for the analysis are available online on our GitHub repository (https://github.com/mg878/Iran_WeeklyMortality), with all model fitting code and analysis available at (https://github.com/OJWatson/iran-ascertainment). For regular updates on excess mortality in Iran, you can visit: https://github.com/akarlinsky/world_mortality.
Extended data
Supplementary Tables
Supplementary Figures
Supplementary Data
Data S1-S6: Transmission model fits per province using weekly excess mortality data (as at week ending on 2021-10-21) under the (S1) central, (S2) optimistic, and (S3) worst scenarios using the Brazeau et al. IFR [51] and the (S4) central, (S5) optimistic, and (S6) worst scenarios using the O’Driscoll et al. IFR [21] estimates. For each province, there are 5 panels showing the model fits for: (i) The time-varying effective reproduction number, Reff, in green. The curve in blue shows the predicted decrease in Reff due to increasing immunity in the population as a result of people being infected by COVID-19. Dark green and blue show the 50% confidence intervals and light green and blue show the 95% confidence intervals. A horizontal dashed line is shown at Reff < 1 indicates a slowing epidemic in which new infections are not increasing. Reff > 1 indicates a growing epidemic in which new infections are increasing over time. (ii) Observed (black circles) and modelled (red line) weekly deaths, with the thick red line depicting the median, the faint depicting individual draws from the posterior and the 95% credible interval shown with dashed black lines. (iii) Modelled cumulative deaths over time. (iv) Modelled seroprevalence (black line) and point estimates by Khalagi et al. [8] (red) and Poustchi et al. [9] (green). Shaded area shows the 95% confidence in modelled seroprevalence. Vertical and horizontal lines around each point estimate represents uncertainty in the reported measurement. (v) Modelled proportion of people exposed to SARS-CoV-2 over time. In both (iv) and (v), the central line depicts the median and the lighter shaded band shows the 95% credible interval.
Acknowledgements
Footnotes
We have now updated our results to include a compartmental SEIR model to fit the excess mortality data rather than using excess mortality directly to estimate national and subnational attack rates. In this version, we have removed some of the old analysis on using age-stratified IFR to calculate attack rates per age-group. Therefore, we have modified attack rate estimates (now included in the revised figure 2, table 1, supplementary table 1, and supplementary figures) as well as reinfection rate estimates (now included in figure 2). The supplementary files have also been updated.