Abstract
Autocratic and democratic leaders have an incentive to misreport data that may reveal policy failure. However, it is easier for autocratic leaders to fabricate data because they are not subject to scrutiny from media, opposition parties, and civil society. This suggests that autocratic governments are more likely to manipulate policy-relevant statistics than democratic governments. It is inherently difficult to test that claim because researchers typically do not have access to data from sources other than the government. The COVID-19 pandemic represents a unique opportunity to examine the relationship between regime type and data manipulation because of its widespread impact, as well as the ability to compare reported with excess deaths and test for statistical anomalies in reported data. Based on regressions for undercounting and statistical irregularities that take into account unintentional mismeasurement, I find that autocratic governments are more likely to deliberately under-report the impact of COVID-19 than their democratic counterparts.
Introduction
Are autocratic governments more likely to manipulate policy-relevant data than democratic governments? According to one view, autocratic and democratic leaders both have an incentive to misreport data when the truth may reveal incompetence, leading to protest or electoral losses. However, data manipulation is harder to achieve in a democracy because the government is subject to greater scrutiny from opposition parties, independent media, and civil society organizations (Carlitz and McLellan 2021; Hollyer et al. 2011). According to another view, autocratic governments are primarily concerned about the interpretation of information, rather than access to information (Rozenas and Stukal 2019). Autocratic leaders may calculate that bad news will not lead to collective action if they can persuade citizens that the government was not responsible. Autocrats can, for example, use their control over media and the internet to convince citizens that a bad outcome is due to external forces (e.g., global macroeconomic factors) or natural phenomena (e.g., pandemic) that are beyond the government’s control.
It is not immediately obvious, therefore, that autocratic leaders are more likely to provide inaccurate data than their democratic counterparts. In this study I argue that autocrats retain an incentive to manipulate data even as they seek to shift the blame to external forces or natural phenomena. This is because they cannot be sure whether their attempts to persuade citizens that they are not responsible will succeed. They have a greater capacity to shape perceptions than democratic leaders, but there is a risk that a critical number of citizens remain unconvinced, leading to criticism and protest. Democratic leaders, by contrast, are less able to successfully deploy either strategy - manipulate data or shift the blame - because they lack control over traditional and online media, opposing political parties, and civil society groups.
It is inherently difficult to prove whether autocrats are more likely to manipulate data because such data are typically not available from a source that is independent of the government. Historically, national statistical agencies emerged in order to serve the aims of government (Brambor et al. 2020; Brewer 1990, Chapter 8), and they often remain subject to political pressure or direct control (Herrera and Kapur 2017, pp. 375–376). Various methods have been developed to enable researchers to detect manipulation when government-independent data is not available. Firstly, they may use a substitute indicator to estimate the real values (e.g., night time lights as a proxy for GDP) (Magee and Doces 2015; Martínez 2022). However, that strategy can only be used in those cases where a suitable proxy is available. Secondly, they may check the reported data for statistical anomalies. For example, they may examine whether the digits are consistent with Benford’s Law (Michalski and Stoltz 2013), or whether a count sequence is unexpectedly smooth (Kobak 2022). However, attentive autocrats can manipulate the data such that they accord with the expected pattern. Thirdly, researchers may draw inferences from aggregate-level data that has been provided by the government. For example, they may estimate excess deaths due to a shock or policy intervention by comparing total mortality with past trends (Ashton et al. 1984; Spagat and van Weezel 2017). However, that method is dependent on the availability of complete and reliable data at the aggregate level. Fourthly, they may estimate the real data values for a particular country based on statistical models that reliably predict the outcome of interest in those countries where data is known to be complete and accurate (Viboud et al. 2016). This method is reliant on the availability of veridical data for the variables that are included in the predictive model. Governments may be willing to provide accurate measures of those variables if, by themselves, they do not reveal incompetence. Alternatively, data for those predictor variables may be available from sources independent of the government.
While all of these methods have their shortcomings, each enables the researcher to get closer to the truth, especially in those cases where tampering is suspected. Each provides a way to reduce the likelihood that they are drawing incorrect conclusions about the phenomena under study. In the analysis below, I rely on data that have been constructed based on a combination of the last two methods (aggregate trends and predictive model), as well as the second method (testing for statistical anomalies).
The devastating pandemic produced by the spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) represents a unique opportunity to examine whether autocratic governments are more susceptible to data manipulation than their democratic counterparts. Firstly, the sheer scale of the shock and, therefore, the potential threat posed to the legitimacy of each government in the eyes of their citizens, magnifies the incentive for political leaders to hide the truth. Secondly, excess mortality can be estimated for most sovereign states based on a combination of predictive models and a comparison between total mortality trends before and after the onset of the pandemic. The deviation of reported COVID-19 deaths from excess deaths provides an estimate of the extent to which governments are, deliberately or unintentionally, undercounting. If we control for the factors that have led to unintentional undercounting, then we can estimate the degree of manipulation and whether it varies by regime type. Furthermore, we can directly test for manipulation by checking whether there are statistical anomalies in the daily reported cases and deaths. Deviation of those reported numbers from Benford’s Law or expected variation across time allows us to detect manipulation and assess whether it varies by regime type. Thirdly, the pandemic has impacted virtually every country in the world and so we can examine the phenomenon of data manipulation for a large sample of countries. In the following analysis I utilize excess mortality data for as many as 197 states and statistical anomaly estimates for as many as 201 states.
Previous studies on the impact of regime type on data manipulation have mostly focused on the reporting of data in autocratic contexts (e.g. Carlitz and McLellan 2021; Chen et al. 2019; Lamberova and Sonin 2022; Wallace 2022, Chapter 6). They do not, therefore, provide a systematic comparison between data manipulation in autocratic and democratic countries. Hollyer et al. (2011) represents a partial exception, but their focus is on the withholding of policy-relevant data, rather than the misreporting of such data. As Carlitz and McLellan (2021) note, many autocracies are now more willing to provide development data due to the expectations of foreign aid donors, as well as the targets specified in the Millennium Development Goals and Sustainable Development Goals. The question is whether they accurately report such data.
Magee and Doces (2015) and Martínez (2022) investigate the impact of regime type on the manipulation of economic data for a large number of democratic and autocratic countries. Using night time lights as a proxy for GDP, both studies find that autocratic regimes tend to overstate yearly GDP growth. In the current study I shift the focus to the reporting of population health statistics.1 Three recent studies examine the relationship between regime type and the manipulation of COVID-19 data using Benford’s Law (Kilani 2021) and the discrepancy between reported and excess mortality (Knutsen and Kolvani 2022; Neumayer and Plümper 2022) for a large sample of countries. While those studies represent important contributions, each relies on one method for measuring the manipulation of COVID-19 data. This means their results may reflect a potentially biased sample of countries (e.g. exclusion of countries with insufficient observations, or a small number of cases and deaths, in order to fulfill the criteria for Benford digit analysis), measurement error (e.g. excess death estimates that include non-COVID deaths), or the particular model used to estimate excess deaths (e.g. the set of covariates used to predict total deaths in locations without sufficient all-cause mortality data). To address that issue I utilize three distinct measures of manipulation (ratio of excess to reported deaths, Benford-noncompliance of daily counts, and underdispersion of daily counts). Moreover, for the sake of further robustness, I use four different estimates of excess mortality, as well as four different measures of regime type. This multi-measurement approach helps to reduce the likelihood that the overall conclusions of this study are erroneous.
In the following, I firstly elaborate on the theory behind the claim that autocratic governments manipulate policy-relevant statistics more than democratic governments. I then describe the ways in which governments can manipulate published data. In the subsequent section, I outline those factors independent of regime type that increase the likelihood that COVID-19 deaths will go under-reported. I then describe the estimation models, results, and 11 robustness checks. In the penultimate section I discuss how the results align with the existing literature on the link between regime survival and information control, as well as the limitations of this study. I conclude by outlining the problems that data manipulation creates for citizens, researchers, and international organizations, as well as potential solutions.
Why do autocratic leaders manipulate reported data more than democratic leaders?
Autocratic leaders have an incentive not to share accurate information if it might lead to their removal via mass mobilization. That is, full and accurate disclosure may enable citizens, firstly, to become aware of any failure in government policy and, secondly, to realize that their fellow citizens are also aware of that failure. As a result, each citizen has enough information to judge whether protests will generate sufficient participation to remove the government (Hollyer et al. 2015). Incumbent democratic leaders also have an incentive to tamper with published data when the actual data might threaten their performance at the ballot box. However, it is harder for them to hide poor performance in this way because they are subject to greater scrutiny from opposition parties, civil society, and media. There is a greater likelihood that manipulation by a democratic government will be detected, simultaneously drawing further attention to the negative news it was attempting to hide and tainting its credibility in the eyes of voters.
Autocratic leaders may also attempt to use their control over traditional and online media to persuade citizens that they are not responsible for bad news (Rozenas and Stukal 2019).2 They may, for example, promote the view that the COVID-19 pandemic is an unstoppable natural phenomenon, or that its spread is due to the containment failures of other countries. Indeed, this may be the preferred approach in those cases where the disclosure of full and accurate data helps to combat the problem. Hiding the severity of an epidemic, for example, may lure citizens into a false sense of security, frustrating attempts to encourage life-saving changes in behavior (e.g., physical distancing, mask-wearing, and vaccination).
However, autocratic leaders cannot solely rely on that strategy because of the risk that it will fail. They have a greater capacity to shape the perceptions of citizens than democratic leaders, but they cannot rule out the possibility that a critical number of citizens will remain unpersuaded, thereby exposing them to criticism and protest. Ironically the lack of communication openness that is needed by autocrats to prevent collective action also makes it harder for them to gauge the proportion of citizens who do not believe their spin (Schedler 2013, pp. 37–39). Fear of reprisal means the outward behavior or expressed opinion of citizens in an autocratic context may not track whether they actually believe the government’s narrative (Jiang and Yang 2016; Wedeen 2015). Under these conditions of uncertainty, autocrats are more likely to conclude that a combination of data manipulation and shifting the blame (i.e. hiding the true extent of the bad news and spinning that news) represents the optimal way to prevent collective action.
Generally, the exact level of data manipulation (and concomitant level of spin) will depend on context. Autocrats may reduce the level of manipulation when citizens can reliably infer the real values (e.g. comparing official inflation statistics with supermarket prices),3 or when it starts to threaten the implementation of policy. Too much under-reporting of COVID-19 infections and deaths, for example, may frustrate attempts to encourage physical distancing, mask-wearing and vaccination. In such cases they are more reliant on persuading citizens that they are not responsible. By the same token, they may increase the level of manipulation when citizens find it harder to verify data (e.g. epidemic, conflict, or famine deaths), or it is less likely to hinder the implementation of policy (e.g. famine response). In such cases they are less reliant on persuading citizens that they are not responsible.
A further strategy available to political leaders is to simply not publish any data that may reveal incompetence. However, the complete withholding of data about a politically sensitive topic can be detected by citizens, whereas the misreporting of data is harder for them to detect. In this sense the withholding of data is akin to the overt censorship of content. Both are observable. As with overt censorship, the absence of data on a salient topic may rouse the suspicions of citizens, thereby encouraging them to invest more effort into finding out about the topic (e.g. circumventing the government’s control over the internet by using virtual private networks to access blocked content) (Roberts 2020, Chapter 4). Moreover, those efforts may lead them to uncover information about other politically sensitive topics (Hobbs and Roberts 2018). Because the withholding of data may backfire, the fabrication of published data typically represents a less risky way for autocratic leaders to forestall collective action.
So far I have assumed that the incentive to manipulate increases if the release of accurate data will reveal bad news, with the ensuing risk of domestic criticism and protest. However, it should be noted that the incentive to falsify data also rises when a government is prescribed macroeconomic and development targets by international and regional organizations such as the International Monetary Fund, World Bank, and European Union (Aragão and Linsi 2022; Herrera and Kapur 2017, pp. 378–379; Sandefur and Glassman 2015). Similarly, governments have an incentive to adjust published statistics in order to secure more funding (Kerner et al. 2017; Morgenstern 1965, p. 21), or to demonstrate to aid donors that their money is being well spent (Jerven 2013, pp. 75–77, 87). Autocracies may be more susceptible than democracies to both types of incentive, but in this study I will be focusing on the first. That is, controlling the flow of information so as to prevent collective action.
Because my focus is on policy-relevant data, I also set aside the incentive to manipulate electoral results. It goes without saying that electoral fraud is one of the tools that electoral autocrats can use to minimize the chances that the opposition will ever win an election. (Indeed, some of the techniques that may be used to detect anomalies in policy-relevant data have also been used to detect anomalies in election returns (Deckert et al. 2011; Mebane 2011).) However, they may prefer to use their greater control over the dissemination of information (including the manipulation of official statistics and spinning the interpretation of those statistics) in order to shore up support, thereby reducing the need to falsify vote counts in the first place.
How do governments manipulate reported data?
Manipulation may involve the blatant adjustment of data that has been accurately collected. This is perhaps best illustrated by the reporting of official government statistics during the Soviet era (von der Lippe 1999; Wheatcroft and Davies 1993). However, there is a range of cases where it is less obvious that the published data involves manipulation in the strict sense. Governments may, for instance, choose those estimates from a range of methodologically tenable estimates that favor their interests (Aragão and Linsi 2022; Kerner et al. 2017). The Rwandan government, for example, has been accused of deliberately underestimating income poverty during the period leading up to the 2015 referendum that made it possible for the incumbent president, Paul Kagame, to extend his rule for up to two more decades (“Has Rwanda been fiddling its numbers?” 2019; Wilson and Blood 2019). The World Bank’s senior advisor for the region defended the estimates, arguing that there is no single best way to estimate income poverty (O’Brien 2019). Similarly, some governments may have exploited the disagreement among global medical authorities during the pandemic over when to classify the death of someone infected with SARS-CoV-2 as a death due to COVID-19 (Wang et al. 2022, p. 1515). Governments that are keen to downplay the impact of the pandemic may have erred towards reporting another cause of death in such cases (Kobak 2021). Whether these scenarios amount to manipulation may hinge on whether the reported numbers would have been adopted by a statistical agency that is free from political pressure (Aragão and Linsi 2022; Prewitt 2010). Moreover, political pressure may take the form of direct interference by the government, or self-censorship by statisticians worried about their career prospects (Jerven 2013, p. 105). Regardless, such cases involve the systematic biasing of reported data in favor of estimates that cast the government in a favorable light in the eyes of its citizens, donors, and lenders.
It should be noted that misreporting may also take place at the sub-national level. Local officials also have an incentive to hide policy failure and demonstrate policy success to the national leaders. In China, for example, there is an incentive for local governors to overstate regional economic figures in order to advance their career prospects (Wallace 2016). The Chinese government is aware of this and adjusts its national GDP estimates downward. Nevertheless, there is some evidence that the national government has been under-correcting since 2008 (See also Angrist et al. 2021, p. 235; Chen et al. 2019).
For the purposes of this study I set aside the question of whether unelected local officials are more likely to manipulate data than elected local officials. However, it is reasonable to assume that local officials who are elected to office in an open democracy face the same disincentives as their national counterparts. That is, there is a greater risk that the misreporting of politically sensitive data will be exposed by opposition politicians, media, and civil society groups.4 That is not to say that democratically elected leaders do not attempt to manipulate official statistics - indeed there is evidence that it does occur, especially during periods leading up to elections (Alt et al. 2014; Gandrud and Hallerberg 2017). Rather the claim of this study is that they have more incentives to avoid fabricating official statistics and, as a result, it is less likely to occur.
Unintentional under-reporting
The misreporting of data may occur even when a government has no intention to mislead. Thus, any analysis of data manipulation must take into account those factors independent of the incentives of political leaders that may lead to mismeasurement. At least three such factors may explain the under-reporting of mortality during the COVID-19 pandemic.
Firstly, under-reporting may be due to the capacity of the health system to tackle a novel and rapidly spreading virus. Those countries that are insufficiently prepared for such epidemics may have struggled to keep track of the number of deaths due to COVID-19, especially if their testing capacity is limited and deaths due to the pathogen are only recorded when patients have been hospitalized (Whittaker et al. 2021).
Secondly, under-reporting may arise because of a shortfall in the government’s overall capacity to collect and process policy-relevant data. Governments that lack the infrastructure necessary to collect information from all geographic locations will struggle to gather complete and accurate mortality data when faced with a rapidly spreading contagion. Information-gathering capacity is also relevant to health system capacity because the collection of infection, testing, vaccination, and mortality data is crucial for determining the right policy response at each stage of the pandemic.
Thirdly, some countries may be more vulnerable to pandemic deaths due to their epidemiological and geographic features, such as the prevalence of co-morbidities (e.g. hypertension, cardiovascular diseases, diabetes, etc.), population density, altitude, and environmental seasonality (Bollyky et al. 2022). As a result, those countries may have been overwhelmed by the rapid increase in mortality caused by the pandemic, and therefore struggled to maintain accurate mortality figures. For example, during the first phase of the pandemic, when tests were not widely available, there was a rapid rise in deaths among the very elderly, especially in long-term care facilities. As a result, COVID-19 deaths were more likely to go unrecorded in countries with older populations (Wang et al. 2022, p. 1514).
All three of these explanations show how under-reporting can occur even when a government is not deliberately under-counting the number of deaths. Thus, it is important to control for them in order to isolate the extent to which, if at all, under-reporting is due to data manipulation.
Estimation methods
I use ordinary least squares (OLS) regressions to assess the relationship between regime type and the manipulation of COVID-19 mortality data. The estimation model takes the following form.
Here Democracy is the main independent variable of interest, Health system capacity is a set of variables capturing the ability of each country to respond to the pandemic, Information-gathering capacity is a variable capturing the ability of the government to collect and process policy-relevant data, Pandemic vulnerability is a set of variables capturing the extent to which each population is susceptible to COVID-19 mortality, Z is the set of additional control variables, and i indexes countries.
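Written out, a specification consistent with these definitions could take the form below. This is only a sketch: the dependent-variable label, the intercept, the coefficient symbols, and the error term are assumptions introduced for illustration, with bold symbols denoting coefficient vectors for the covariate sets.

```latex
\text{Under-reporting}_i = \beta_0
  + \beta_1\,\text{Democracy}_i
  + \boldsymbol{\beta}_2^{\top}\,\text{Health system capacity}_i
  + \beta_3\,\text{Information-gathering capacity}_i
  + \boldsymbol{\beta}_4^{\top}\,\text{Pandemic vulnerability}_i
  + \boldsymbol{\gamma}^{\top} Z_i
  + \varepsilon_i
```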
I use three different indicators to estimate under-reporting in each country: the undercount ratio and two indicators that gauge the extent to which daily reported cases and deaths depart from expected statistical patterns (Benford-noncompliance and underdispersion). The undercount ratio is cumulative excess deaths divided by cumulative reported deaths (logged) as of December 31, 2021. The data source is the COVID-19 projections produced by the Institute for Health Metrics and Evaluation (IHME) (Wang et al. 2022). IHME estimated excess deaths in three steps. Firstly, in those locations with sufficient total mortality data, excess deaths were estimated based on a weekly or monthly comparison between mortality due to all causes and what would have been expected based on past trends and seasonality. Secondly, a statistical model was constructed using covariates to predict the excess deaths in those locations. Thirdly, the predictive model was used to estimate excess deaths in those locations without sufficient mortality data. In the robustness section I also test whether the baseline results hold when I use three alternative estimates of excess mortality produced by the World Health Organization (WHO 2022a), The Economist (2022), and Karlinsky and Kobak (2021).
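The undercount ratio described above can be illustrated with a short calculation. The sketch below assumes the natural logarithm (the base is not stated in the text) and uses hypothetical totals rather than real country data; the function name is illustrative only.

```python
import math


def log_undercount_ratio(excess_deaths, reported_deaths):
    """Natural log of cumulative excess deaths over cumulative reported deaths.

    Returns None when the ratio is undefined on the log scale, i.e. when
    excess deaths are zero or negative, or when no deaths were reported.
    """
    if excess_deaths <= 0 or reported_deaths <= 0:
        return None
    return math.log(excess_deaths / reported_deaths)


# Hypothetical cumulative totals, for illustration only (not real country data).
examples = {
    "Country A": (30_000, 10_000),  # excess deaths three times the reported count
    "Country B": (12_000, 12_000),  # reported deaths match excess deaths
    "Country C": (-500, 40),        # negative excess deaths: log ratio undefined
}
for name, (excess, reported) in examples.items():
    print(name, log_undercount_ratio(excess, reported))
```

A value above zero indicates that excess deaths exceed reported deaths. Countries with negative cumulative excess deaths have no defined log ratio, so they need separate handling.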
One potential limitation of using excess mortality is that it may capture deaths that are not due to COVID-19. WHO guidelines stipulate that COVID-19 should be listed as the cause of death if a person dies with a probable or confirmed case of COVID-19, unless there is “… a clear alternative cause of death that cannot be related to COVID disease (e.g. trauma)..” (WHO 2020). Based on this definition, the proportion of non-COVID deaths captured by excess mortality is likely to be small, and that assumption is supported by two country case-studies (Wang et al. 2022, p. 1533). A related concern is that excess mortality might capture deaths that are due to contemporaneous shocks such as a natural disaster or conflict (Wang et al. 2022, p. 1534). I take steps to address this issue in the robustness section. A further consideration is that excess mortality can be negative if a country’s containment response to the pandemic also reduced deaths from other causes such as injuries and seasonal flu. Indeed, five countries registered negative cumulative excess deaths by the end of 2021. I also take steps to address that issue in the robustness section below.
One advantage of measuring manipulation in terms of statistical anomalies in the reported daily COVID-19 data is that it avoids the issue of non-COVID deaths and negative total deaths. Benford-noncompliance is measured in terms of the Kolmogorov-Smirnov test statistic (logged), which allows us to estimate the extent to which daily reported cases and deaths (for the period 22 January 2020 to 31 December 2021) deviate from Benford’s Law. According to that law, the first digits of non-manipulated data follow a distribution in which 1 is the most likely leading digit and each larger digit is progressively less likely (Tam Cho and Gaines 2007). I provide a complete description of the steps used to measure Benford-noncompliance in the online appendix. Underdispersion is measured using the index constructed by Dmitry Kobak (2022) (logged). That index gauges the extent to which the reported cases and deaths (for the period 3 March 2020 to 30 January 2022) deviate from the expected variation in the reported cases and deaths across time. Reported COVID-19 data should fluctuate randomly across days of the week due to the nature of the data generating process. Thus, daily counts that vary smoothly across time suggest the presence of tampering.
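To make the two anomaly measures more concrete, the sketch below computes a Kolmogorov-Smirnov-style distance between observed first digits and Benford’s Law, together with a generic variance-to-mean ratio. Both are simplified illustrations: the exact procedure used in this study is described in the online appendix, and the dispersion ratio is a rough Poisson-style check that stands in for, and is not, the Kobak (2022) index. All function names are hypothetical.

```python
import math
import statistics

# Benford's Law: probability that the first significant digit equals d.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])


def benford_ks_statistic(counts):
    """Largest gap between the empirical and Benford first-digit CDFs.

    Zero and negative counts are skipped because they have no leading
    digit in the Benford sense (an assumption of this sketch).
    """
    digits = [first_digit(c) for c in counts if c > 0]
    if not digits:
        raise ValueError("need at least one positive count")
    n = len(digits)
    empirical_cdf = benford_cdf = max_gap = 0.0
    for d in range(1, 10):
        empirical_cdf += digits.count(d) / n
        benford_cdf += BENFORD[d]
        max_gap = max(max_gap, abs(empirical_cdf - benford_cdf))
    return max_gap


def dispersion_ratio(counts):
    """Sample variance divided by the mean of a series of counts.

    Under a Poisson process with a stable mean this is close to 1;
    values well below 1 indicate an unexpectedly smooth series.
    """
    return statistics.variance(counts) / statistics.mean(counts)


# Hypothetical daily counts, for illustration only.
growing = [int(round(10 * 1.05 ** k)) for k in range(400)]  # spans many magnitudes
smooth = [100, 101, 99, 100, 100, 101, 99]                  # suspiciously flat

print(benford_ks_statistic(growing))
print(dispersion_ratio(smooth))
```

The raw variance-to-mean ratio is only meaningful over a window in which the underlying mean is roughly stable; on a trending series it would need to be applied to short windows or to detrended counts.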
The indicator for democracy is the Electoral Democracy Index from the Varieties of Democracy (V-Dem) project (Coppedge et al. 2022). That index is scaled to a continuous interval ranging from 0 (lowest level of democracy) to 1 (highest level of democracy). It combines five components of democratic rule which aim to ensure that political leaders are sufficiently responsive to citizens: suffrage, elected officials, free and fair elections, freedom of civil and political association, and freedom of expression and access to alternative information. Those components also implicitly capture the extent to which the government’s data reporting is subject to scrutiny by media, opposition political parties, and civil society groups. If those entities are sufficiently independent of the government’s control and influence, they can collect and publish their own data and publicly question the veracity of the government’s data (World Bank 2021, pp. 32–33, 69–70). Arguably, V-Dem’s index is methodologically superior to the other democracy indices that are currently available (Boese 2019; Coppedge et al. 2017). Nevertheless, in the robustness section I test whether the baseline results hold when three alternative indicators of democracy are used.
Health system capacity, information-gathering capacity, and pandemic vulnerability capture under-reporting that is not deliberate. I use two variables to capture health system capacity: GDP per capita (logged, base 2010 international dollars) and health service capacity and access (WHO 2021). The latter index measures the density of hospital beds and health care professionals, as well as the level of preparedness for public health events of international concern (WHO 2019). I constructed an indicator of information-gathering capacity based on the latent factor analysis of three input variables: Hanson and Sigman’s (2021) measure of census frequency, the World Bank’s (2022) Statistical Capacity indicator, and Brambor et al.’s (2020) information capacity index. I describe those input variables and the method for identifying the underlying factor in more detail in the online appendix. I use three variables to capture greater pandemic vulnerability due to factors such as seasonality, population density, and the presence of co-morbidities. These are the prevalence of lower respiratory diseases, the prevalence of non-communicable diseases, and median age (GBD Collaborative Network 2020a; UNDESA 2019).
A further advantage of using the Benford non-compliance and underdispersion measures is that they are less likely to be affected by unintentional mismeasurement than the undercount ratio. Nevertheless, it remains possible that observed anomalies in the data are not due to an attempt by a government to mislead. For example, a government that has reached the limits of its testing capacity may in good faith estimate the numbers in a way that conflicts with the expected patterns. Thus, I retain the full set of covariates for all versions of the dependent variable.
To address the possibility of region-specific factors, I also include World Bank regions as additional control variables in all the models. All the continuous independent variables are for the year 2019 (except for median age and information-gathering capacity which are for 2015) given the likelihood that the pandemic has impacted political institutions, economic growth, the delivery of routine health care, and the spread of other respiratory pathogens. I report robust standard errors for all model specifications. Variable descriptions, summary statistics, and correlation matrices are reported in the online appendix.
Estimation results
The results of this analysis are presented in Table 1. As we can see, the democracy indicator is negatively associated with all three indicators of under-reporting: undercounting (columns 1-2), Benford-noncompliance as measured by the Kolmogorov-Smirnov statistic (columns 3-5), and underdispersion (columns 6-7). Moreover, democracy remains statistically significant both with and without the covariates. For the complete model using undercounting (column 2), a 10% increase in the level of democracy (i.e. a 0.1 increase in the Electoral Democracy Index’s 0-1 scale) is associated with a 5.79% (95% CI 1.93, 9.49) reduction in the ratio. For the complete model using Benford-noncompliance (column 4), a 10% increase in the level of democracy is associated with a 2.27% (95% CI 0.94, 3.58) reduction in the Kolmogorov-Smirnov statistic. For the complete model using underdispersion (column 7), a 10% increase in the level of democracy is associated with a 3.48% (95% CI 1.6, 5.33) reduction in the index.
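Because the dependent variables are logged while the Electoral Democracy Index enters in levels, a regression coefficient translates into a percentage change through an exponential transformation. The sketch below shows the usual conversion for a log-linear model. The function name and the coefficient value are hypothetical and are not taken from Table 1.

```python
import math


def percent_change(beta, delta_x):
    """Percentage change in the outcome for a delta_x change in a regressor,
    when the outcome is in natural logs and the regressor is in levels."""
    return 100.0 * (math.exp(beta * delta_x) - 1.0)


# Hypothetical coefficient on the 0-1 democracy index (assumption, not a result).
hypothetical_beta = -0.6
print(round(percent_change(hypothetical_beta, 0.1), 2))  # -5.82
```

The same transformation applied to the lower and upper bounds of a coefficient’s confidence interval yields an interval for the percentage change, which is why such intervals are not symmetric around the point estimate.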
Figure 1 presents the Shapley decomposition of the R-squared for each of the dependent variables, based on the covariate groups described in the methods section above. In the case of undercounting, for example, the share of the explained variance captured by democracy is 11.55%, while for health system capacity, information-gathering capacity, and pandemic vulnerability it is 31.17%, 8.73%, and 27.82% respectively. The two measures of statistical irregularity – Benford non-compliance and underdispersion – are less likely to reflect unintentional mismeasurement. Unsurprisingly, therefore, the three covariate groups designed to control for unintended misreporting capture less of the explained variation in those two dependent variables.
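For readers unfamiliar with the technique, a Shapley decomposition assigns each covariate group its average marginal contribution to the R-squared across all orderings in which the groups could be added to the model. The sketch below is a minimal illustration on synthetic data. The group names, column assignments, and function names are hypothetical, and it is not the code used to produce Figure 1.

```python
from itertools import combinations
from math import factorial

import numpy as np


def r_squared(X, y, cols):
    """R-squared of an OLS fit of y on a constant plus the columns in `cols`."""
    selected = X[:, np.asarray(cols, dtype=int)]
    design = np.hstack([np.ones((len(y), 1)), selected])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def shapley_r2(X, y, groups):
    """Shapley share of R-squared for each named group of column indices."""
    names = list(groups)
    k = len(names)
    shares = {}
    for g in names:
        others = [h for h in names if h != g]
        total = 0.0
        for size in range(k):
            # Weight of coalitions of this size in the Shapley formula.
            weight = factorial(size) * factorial(k - size - 1) / factorial(k)
            for coalition in combinations(others, size):
                base = [c for h in coalition for c in groups[h]]
                gain = r_squared(X, y, base + groups[g]) - r_squared(X, y, base)
                total += weight * gain
        shares[g] = total
    return shares


# Synthetic data with four regressors arranged in three made-up groups.
rng = np.random.default_rng(1)
n = 150
X = rng.normal(size=(n, 4))
y = -0.6 * X[:, 0] + 0.4 * X[:, 1] + 0.3 * X[:, 2] + 0.2 * X[:, 3] + rng.normal(size=n)
groups = {"democracy": [0], "health_system": [1, 2], "vulnerability": [3]}
shares = shapley_r2(X, y, groups)
full_r2 = r_squared(X, y, [0, 1, 2, 3])
percent_shares = {g: 100.0 * v / full_r2 for g, v in shares.items()}
```

The group shares sum to the R-squared of the full model, which is what allows them to be reported as percentages of the explained variance.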
Robustness checks
I report five types of robustness check in Table 2. Firstly, I examine whether the results are affected when a dummy variable for island states is added to the set of covariates (column 2). It may be argued that island democracies such as Taiwan, Iceland, and New Zealand were blessed with a natural advantage when it came to slowing the spread of the virus and, as a result, were better placed to keep an accurate count of the number of COVID-19 deaths. Moreover, five of those island states (Australia, Iceland, New Zealand, Singapore, and Taiwan) registered negative excess deaths during the first two years of the pandemic, in part because their public health response reduced the spread of other respiratory pathogens. It is not clear whether those five countries are in some way biasing the results. Nevertheless, the addition of the dummy variable for island states provides one way to control for that possibility. I also test whether the baseline result holds when those five countries are dropped from the sample (column 3).
Secondly, I include dummy variables for contemporaneous disasters and conflicts (column 4). Other shocks that occurred during the pandemic may have increased excess deaths even though they are not due to COVID-19 and, therefore, do not reflect undercounting. In order to identify disasters that took place during the years 2020 and 2021 I used the Emergency Events Database compiled by the Center for Research on the Epidemiology of Disasters (Guha-Sapir 2022). In order to identify armed conflicts that took place during those two years I used the Battle-Related Deaths Dataset (version 22.1) constructed by the Uppsala Conflict Data Program (Pettersson et al. 2021).
Thirdly, I examine whether the results hold when three indicators for administrative capacity are included among the covariates (column 5). Those indicators are tax revenue as a percentage of GDP (Heritage Foundation n.d.) (logged), rigorous and impartial public administration (Coppedge et al. 2022), and mean war mortality during the 10 years prior to the pandemic (GBD Collaborative Network 2020b) (logged). It may be argued that the observed negative association between democracy and under-reporting is due to the fact that democracies are typically characterized by a greater capacity to implement policy. That is, administrative capacity, rather than democracy itself, may explain the association with under-reporting (Halleröd et al. 2013; Stasavage 2020). Having said that, some scholars contend that the capacity to gather and process information represents a suitable proxy for administrative capacity because it is a pre-condition for the collection of taxes, as well as the successful implementation of law and policy (Brambor et al. 2020; D’Arcy and Nistotskaya 2017; Lee and Zhang 2017). If that is correct, then the baseline model already controls for administrative capacity. Nevertheless, I include these three further controls in order to address the possibility that information-gathering capacity, by itself, does not fully capture overall administrative capacity.
Fourthly, I examine whether the results hold when three alternative indicators of democracy are used (columns 6-8). Specifically, I use the dichotomous Democracy-Dictatorship index (Bjørnskov and Rode 2020), the Lexical Index of Electoral Democracy (Skaaning et al. 2015), and the polychotomous Polity2 index (Marshall and Gurr 2020). These indicators, along with V-Dem’s Electoral Democracy Index, represent distinct ways to conceptualize and measure the level of democracy in each country. Thus, it is important to check whether the baseline results are sensitive to the selection of democracy indicator.
Fifthly, I examine whether the results hold when undercounting is calculated based on the estimates of excess deaths produced by the World Health Organization (WHO 2022a), The Economist (2022), and Karlinsky and Kobak (2021) (columns 9-11). It is important to check that the baseline results are not merely an artifact of the particular estimation method used by IHME. As before, I use cumulative excess deaths up to the end of December 2021. Like IHME, WHO and The Economist used covariate prediction models based on those countries with sufficient all-cause mortality data to generate estimates for those locations without sufficient data.5 However, it may be argued that the prediction models used by those three organizations are built on countries whose characteristics do not adequately represent the countries for which they aim to provide estimates (Adam 2022). One advantage of Karlinsky and Kobak’s estimates is that they are restricted to the 101 countries with sufficiently complete vital statistics, and so they are not model dependent. This comes at a cost, however, because the excluded countries may be self-selecting based on regime type.
For ease of comparison, the first column in Table 2 replicates the baseline results for the undercount ratio (Table 1, column 2). As we can see, each of the robustness checks is consistent with the baseline finding. Nevertheless, I cannot rule out the possibility that there are other factors that may explain the association between regime type and the under-reporting of COVID-19.
Discussion
These results indicate that democracy remains negatively associated with under-reporting after controlling for pre-existing characteristics that may affect each government’s ability to collect accurate COVID-19 data. This in turn implies that autocratic leaders are more likely to manipulate COVID-19 data than their democratic counterparts. However, even with the inclusion of a number of control variables and a range of robustness checks, it remains possible that omitted factors are driving the results. Thus, these results should be seen as providing suggestive, rather than conclusive, evidence.
Nevertheless, they are consistent with existing research on the way in which the political survival of autocrats is dependent on their ability to manage the information available to citizens (Carlitz and McLellan 2021; Hollyer et al. 2015; King et al. 2013; Little 2017; Lorentzen 2014; Stockmann and Gallagher 2011). Indeed there is growing evidence that the balance between information control and repression in autocracies has shifted over the last two decades. The new breed of autocrat places more emphasis on the manipulation of information than the inculcation of fear in order to prolong their tenure in power (Guriev and Treisman 2019). According to that approach to regime survival, the key is to prevent disgruntled citizens from becoming aware that there are a sufficient number of them to overthrow the government. The threat of repression remains as a deterrent, but that may not suffice if a critical number of citizens manage to overcome the collective action problem.
Moreover, traditional repression is more likely to attract the attention of the international community, raising the prospect of sanctions and the withdrawal of financial aid. I have argued that the misreporting of policy-relevant statistics remains an important means for the autocrat to block access to the information necessary for collective action, even as they endeavor to persuade citizens that they are not responsible for bad outcomes. Because autocrats lack information about the degree to which citizens believe their interpretation of the bad news, they prefer to also hide the amount of bad news.
During the pandemic at least two authoritarian states adopted a different approach to information control. Rather than undercounting the number of deaths, the governments of Tanzania and Turkmenistan simply denied the presence of the virus in their countries (Human Rights Watch 2021; Mwai and Giles 2021). Denial precluded the very need to report data or to spin bad news. However, even though it was difficult for citizens to ascertain the exact death toll, it would have been increasingly obvious that a deadly contagion was spreading through their communities. Persuading citizens that the death count was lower than elsewhere would have become an easier proposition than persuading them that there were no deaths to count in the first place. Moreover, denial would have made it very difficult to implement public health policies designed to limit the number of cases and deaths. Unsurprisingly, therefore, nearly all other governments chose to regularly release mortality data during the pandemic. The Tanzanian government did eventually report some mortality data, but the data releases were few and far between, and likely severely understated the true death toll (by the end of 2021, for example, reported deaths in that country were 180 times lower than the excess mortality estimated by IHME). Generally speaking, the withholding of economic and development data is now less common than it used to be due to the expectations of international organizations, aid donors, and investors (Carlitz and McLellan 2021). Moreover, because the complete withholding of information is observable it may raise suspicions among citizens, thereby encouraging them to seek out more information about the topic the government wishes to hide (Roberts 2020, Chapter 4). However, while governments have an incentive to release policy-relevant statistics, the incentive to doctor them remains. Paying lip service to transparency while at the same time publishing falsified information represents a more nuanced and potentially more successful way for autocrats to prevent collective action.
One area that the current study does not fully explore is the extent to which deliberate under-reporting is due to the revision of data received by the national government, or conscious attempts by the government to prevent the collection of accurate data in the first place (e.g. deliberately curtailing COVID-19 testing, such that it is harder for healthcare workers to assign the cause of death in each case). Autocratic leaders are likely to prefer the first approach because accurate information is often needed in order to develop an adequate policy response and, thereby, to forestall criticism and protest. However, it remains possible that mortality numbers received by the national government are being deliberately underestimated by local officials, keen to hide their failure in handling the pandemic from their superiors. In other words, it may be the national government itself that is being misled. As noted above, this appears less likely to occur in a democratic context because local officials themselves are exposed to scrutiny from the opposition, media, and civil society. Regardless, the regression results for statistical irregularities – Benford non-compliance and underdispersion – suggest that autocratic governments, at a minimum, manipulate data that has already been collected at the national level.
It might be argued that citizens care more about economic outcomes than population health outcomes, or that they are more likely to hold governments responsible for bad economic outcomes than bad health outcomes. Indeed, there is some evidence that citizens in democracies are already pre-disposed to treat pandemics as natural phenomena that are beyond the control of policy-makers (Acharya et al. 2020; Achen and Bartels 2017, pp. 140–142). If that is correct then there is less incentive for political leaders to manipulate population health data. The results of this study suggest that, even if that is the case, there remains an incentive for autocratic leaders to misinform citizens about their health risks and status.
Finally, it should be noted that the results presented here relate to data manipulation after an epidemiological shock. These findings likely apply to the propensity for manipulation following other kinds of shock such as war, famine, natural disaster, or severe economic recession. Arguably, the threat to the government’s survival is less pronounced in the absence of severe shocks, and so the incentive to falsify data is reduced in such cases. Still, existing research indicates that autocrats fabricate economic data even in the absence of a recession (Carlitz and McLellan 2021; Magee and Doces 2015; Martínez 2022). Nevertheless, further research is needed to determine whether data relating to adverse health outcomes that are not due to a shock are more likely to be manipulated by autocrats.
Conclusion
It is fundamentally difficult to assess the relationship between regime type and data manipulation because researchers typically do not have access to statistics from sources that are not controlled by the government. The COVID-19 pandemic represents a unique opportunity to examine that relationship because of its widespread impact and the ability to compare reported deaths with excess deaths and examine reported cases and deaths for statistical irregularities. Using estimates of excess mortality for a large number of countries, I find evidence that autocratic governments are more likely to deliberately undercount deaths due to the pathogen than their democratic counterparts. Similarly, I find that the case and death counts reported by autocratic regimes are more likely to feature statistical anomalies. These results hold when controls are included for unintentional mismeasurement and after running a battery of robustness checks. Overall, these results suggest that autocratic leaders manipulate data that may trigger criticism and protest, even though they can use their control over traditional and online media to disown responsibility for bad news.
This conclusion is consistent with two previous cross-national studies on the association between regime type and the manipulation of national income statistics (Magee and Doces 2015; Martínez 2022). Taken together these three studies imply that politically-sensitive data are systematically biased in favor of autocratic regimes. If this general finding is confirmed by subsequent research, it presents a significant problem for citizens, researchers, and international organizations. Firstly, the absence of accurate information may prevent citizens from being able to make the decisions necessary to protect their own well-being (e.g. governments that deliberately understate the threat posed by a disease limit the ability of individuals to take steps to avoid preventable morbidity and premature mortality). Secondly, aggregate indices of economic and human development may overstate the performance of autocratic regimes if the input data are directly sourced from each government. Thirdly, cross-national studies that examine the association between regime type and policy outcomes such as economic growth, educational attainment, and infant mortality may be biased in favor of autocratic regimes (Knutsen 2021, p. 1509). Fourthly, the manipulation of data makes it harder for international organizations and aid donors to determine whether recommended targets have been achieved and whether they are supporting the right policies.
One solution to these problems is to rely on proxy indicators, or the covariate model approach outlined in this study, to estimate the outcome of interest in those cases where the reported data are suspect. Reported data may be deemed to be questionable in those cases where there are unexplained statistical anomalies, or the national statistical agency is subject to the direct influence of the government.6 The latter suggests a more forward-looking approach, namely the advocacy of reforms designed to ensure the institutional independence of statistical agencies (Taylor 2016, pp. 15–20). There is a growing emphasis on building the capacity of such agencies (e.g. Dang et al. 2021), but so far less weight has been placed on reforms designed to minimize their exposure to political interference. This stands in stark contrast with well-established efforts to promote the independence of central banks (Herrera and Kapur 2017, p. 375; Jolliffe et al. 2021).
Data Availability
All data produced in the present study and the replication code are available upon reasonable request to the author.
Footnotes
1 In that context it is typically very difficult to find a suitable proxy indicator to estimate the actual level of morbidity and mortality. Two studies have attempted to estimate actual pandemic deaths by using satellite imagery of grave sites in cemeteries (Koum Besson et al. 2021; Warsame et al. 2021). However, that approach cannot be used in those countries where cremation is commonly practiced and, unlike night time lights, political leaders can take steps to hide the evidence. Thus, this approach is better reserved for assisting governments that lack the capacity to collect accurate mortality data for themselves, rather than as a means to detect deliberate undercounting.
2 Crucially an autocratic government need not solely rely on state media to propagate its interpretation of events. Instead it may disguise its involvement by recruiting individuals to repeat its interpretation on social media, or via news outlets that it does not directly control. This approach enables the government to crowd out interpretations that imply policy failure, thereby reducing the need to directly censor content. The upshot of this is that narrative control can take place even when citizens have access to multiple sources of information (Roberts 2020, Chapter 6).
3 Manipulation that is clearly inconsistent with the experienced reality of citizens is both observable and ineffective (Cavallo et al. 2016). In addition, it may reduce citizens’ trust in official statistics in general, including those that have not been manipulated (e.g. citizens may interpret accurately reported epidemic deaths as underestimates if the government has a track record of understating inflation statistics). Nevertheless, autocrats may still engage in observable data manipulation because it enables them to signal strength and, thereby, deter dissent (Huang 2015).
4 Some scholars contend that the absence of information checks is the Achilles heel of autocratic rule (Egorov et al. 2009; Sen 1999, pp. 180–182; Wigley and Akkoyunlu-Wigley 2017). Autocratic leaders typically lack the means to independently verify the information provided by local officials and, therefore, cannot be sure whether they have selected the right policies, or whether those officials are correctly implementing policy. The consequences of this blind spot can be catastrophic. During the Great Leap Forward in China lower-level bureaucrats over-reported grain production leading the central government to extract too much for urban centers, leaving farmers with insufficient food (Wallace 2016). This contributed to the emergence of a famine that killed an estimated 30 million people in China between 1959 and 1961.
5 Nevertheless, there are some important differences between the three estimation projects in terms of the precise model design, covariate selection, and data sources for all-cause mortality. All three projects provide detailed descriptions of their methodologies (The Economist 2021; Wang et al. 2022; WHO 2022b). In addition, IHME and The Economist have made their replication code publicly available (The Economist 2022; Wang et al. 2022).
6 Two existing datasets provide indicators of the extent to which national statistical agencies are free from political influence. Angrist et al. (2021) have extracted three binary indicators of independence from IMF audit data for 79 countries. The Ibrahim Index of African Governance (Mo Ibrahim Foundation n.d.) includes an indicator that captures the independence of statistical agencies in that region. Ideally, a fine-grained indicator with global coverage would be developed, along the lines of the one produced for the Ibrahim Index and the one for electoral monitoring board autonomy produced by the V-Dem project (Coppedge et al. 2022).