Abstract
Background Many studies report the seroprevalence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) antibodies. We aimed to synthesize seroprevalence data to better estimate the level and distribution of SARS-CoV-2 infection, identify high-risk groups, and inform public health decision making.
Methods In this systematic review and meta-analysis, we searched publication databases, preprint servers, and grey literature sources for seroepidemiological study reports, from January 1, 2020 to December 31, 2020. We included studies that reported a sample size, study date, location, and seroprevalence estimate. We corrected estimates for imperfect test accuracy with Bayesian measurement error models, conducted meta-analysis to identify demographic differences in the prevalence of SARS-CoV-2 antibodies, and meta-regression to identify study-level factors associated with seroprevalence. We compared region-specific seroprevalence data to confirmed cumulative incidence. PROSPERO: CRD42020183634.
Results We identified 968 seroprevalence studies including 9.3 million participants in 74 countries. There were 472 studies (49%) at low or moderate risk of bias. Seroprevalence was low in the general population (median 4.5%, IQR 2.4-8.4%); however, it varied widely in specific populations from low (0.6% perinatal) to high (59% persons in assisted living and long-term care facilities). Median seroprevalence also varied by Global Burden of Disease region, from 0.6% in Southeast Asia, East Asia and Oceania to 19.5% in Sub-Saharan Africa (p<0.001). National studies had lower seroprevalence estimates than regional and local studies (p<0.001). Compared to Caucasian persons, Black persons (prevalence ratio [RR] 3.37, 95% CI 2.64-4.29), Asian persons (RR 2.47, 95% CI 1.96-3.11), Indigenous persons (RR 5.47, 95% CI 1.01-32.6), and multi-racial persons (RR 1.89, 95% CI 1.60-2.24) were more likely to be seropositive. Seroprevalence was higher among people ages 18-64 compared to 65 and over (RR 1.27, 95% CI 1.11-1.45). Health care workers in contact with infected persons had a 2.10 times (95% CI 1.28-3.44) higher risk compared to health care workers without known contact. There was no difference in seroprevalence between sex groups. Seroprevalence estimates from national studies were a median 18.1 times (IQR 5.9-38.7) higher than the corresponding SARS-CoV-2 cumulative incidence, but there was large variation between Global Burden of Disease regions from 6.7 in South Asia to 602.5 in Sub-Saharan Africa. Notable methodological limitations of serosurveys included absent reporting of test information, no statistical correction for demographics or test sensitivity and specificity, use of non-probability sampling and use of non-representative sample frames.
Discussion Most of the population remains susceptible to SARS-CoV-2 infection. Public health measures must be improved to protect disproportionately affected groups, including racial and ethnic minorities, until vaccine-derived herd immunity is achieved. Improvements in serosurvey design and reporting are needed for ongoing monitoring of infection prevalence and the pandemic response.
Funding Public Health Agency of Canada through the COVID-19 Immunity Task Force.
Introduction
Over one year has passed since the World Health Organization announced on January 30, 2020 that COVID-19 was a public health emergency of international concern, yet many questions persist about the spread and impact of the virus driving this crisis.1 As of May 15, 2021, there were over 160 million confirmed cases of SARS-CoV-2 infection and 3.3 million deaths worldwide.2 However, these case counts inevitably underestimate the true cumulative incidence of infection3 because of limited diagnostic test availability4, barriers to testing accessibility5, and asymptomatic infections.6 As a consequence, the global prevalence of SARS-CoV-2 infection remains unknown.
Serological assays identify SARS-CoV-2 antibodies, indicating previous infection in unvaccinated persons.7 Population-based serological testing provides better estimates of the cumulative incidence of infection by complementing diagnostic testing of acute infection and helping to inform the public health response to COVID-19. Furthermore, as the world moves through the vaccine and variant era, synthesizing seroepidemiology findings is increasingly important to track the spread of infection, identify disproportionately affected groups, and measure progress towards herd immunity.
SARS-CoV-2 seroprevalence estimates are reported not only in published articles and preprints, but also in government and health institute reports, and media.8 Consequently, few studies have comprehensively synthesized seroprevalence findings that include all of these sources.9, 10 Describing and evaluating the characteristics of seroprevalence studies conducted over the first year of the pandemic may provide valuable guidance for serosurvey investigators moving forward.
We conducted a systematic review and meta-analysis of SARS-CoV-2 seroprevalence studies published in 2020. We aimed to: (i) describe the global prevalence of SARS-CoV-2 antibodies based on serosurveys; (ii) detect variations in seroprevalence arising from study design and geographic factors; (iii) identify populations at high risk for SARS-CoV-2 infection; and (iv) evaluate the extent to which surveillance based on detection of acute infection underestimates the spread of the pandemic.
Methods
Data sources and searches
This systematic review and meta-analysis was registered with PROSPERO (CRD42020183634), reported per PRISMA11 guidelines (S1 File), and will be regularly updated on an open-access platform (SeroTracker.com).12
We searched MEDLINE, EMBASE, Web of Science, and Europe PMC, using a search strategy developed in consultation with a health sciences librarian (DL). The strategies for MEDLINE and EMBASE were expanded versions of the published COVID-19 search strategies created by OVID librarians for these databases.13 Search terms related to serologic testing were identified by infectious disease specialists (MC, CY, and JP)7 and expanded using Medical Subject Heading (MeSH) or Emtree thesauri. These searches were adapted for the other databases. The full search strategy can be found in S2 File.
Given that many serosurveys are not reported in these databases,8 we used four additional search approaches to identify serosurveys reported in the grey literature. First, we searched for reports from national and international health agencies using their website search functions and examining their recurring COVID-19 reports (World Health Organization, European Centres for Disease Control, Centres for Disease Control, National Institutes of Health). Second, we searched Google News for reports of seroprevalence studies. When we encountered reports of potentially eligible government, non-governmental organization (NGO), or academic studies, we conducted a targeted Google search to locate and include the full study. Updates of routinely reported NGO and government studies (e.g., Public Health England’s weekly COVID-19 serosurveillance reports) were screened after the date they first appeared in the Google News search. Third, we consulted with international experts via e-mail to identify additional literature after all other sources had been searched. Fourth, we invited submission of seroprevalence study results on our live dashboard, SeroTracker.com.
Our search dates were from January 1, 2020 to December 31, 2020. MedRxiv pre-print articles that were updated or published as peer-reviewed articles between January 1, 2021 and February 28, 2021, according to the MedRxiv website, were also included. No restrictions on language were applied.
Study selection
We included SARS-CoV-2 serosurveys in humans. We defined a single serosurvey as the serological testing of a defined population over a specified time period to estimate the prevalence of SARS-CoV-2 antibodies.14, 15 To be included, studies had to report a sample size, sampling date, geographic location of sampling, and prevalence estimate. Articles not in English or French were included if they could be fully extracted using machine translation.16 Articles that provided information on two or more distinct cohorts (different sample frames or different samples at different time points) without a pooled estimate were considered to be multiple studies.
If multiple articles provided unique information about a study, all were included. Articles reporting identical information to previously included articles were excluded as duplicates; this rule extended to pre-print articles that were subsequently published in peer-reviewed journals. In these cases, the peer-reviewed articles were considered the definitive version.
We excluded studies conducted only in people previously diagnosed with COVID-19 using PCR, antigen testing, clinical assessment, or self-assessment; dashboards that were not associated with a defined serology study; and case reports, case-control studies, randomized controlled trials, and reviews.
Data extraction and quality assessment
Two authors independently screened articles. Data were extracted by one reviewer and verified by a second. We extracted characteristics of the study, sample, antibody test, and seroprevalence. We extracted sub-group seroprevalence estimates when they were stratified by one variable (e.g., age) but not two variables (e.g., age and sex). Antibody isotype and time period were not considered as stratifying variables. We contacted study authors to request missing sub-group seroprevalence data.
A modified Joanna Briggs Institute (JBI) Critical Appraisal Checklist for Prevalence Studies was used to assess study risk of bias.17 Studies were classified by overall risk of bias: low, moderate, high, or unclear (detailed criteria in S3 File).
Data synthesis and analysis
Evaluation of seroprevalence studies and estimates
The intended geographic scope of each estimate was classified as (A) national; (B) regional (e.g., province-level); (C) local (e.g., county-level, city-level); or (D) sublocal (e.g., one hospital department). Countries were classified according to Global Burden of Disease (GBD) region, and country income status classified by distinguishing the high-income GBD region from other regions.18, 19
Seroprevalence studies were grouped as providing either population-wide or population-specific estimates. Population-wide studies included those using household or community sampling frames as well as convenience samples from blood donors or residual sera used for monitoring other conditions in the population. Population-specific studies were those sampling from well-defined population sub-groups, such as health care workers or long-term care residents.
We prioritized estimates based on more accurate laboratory-based assays (e.g. ELISA, CLIA), as opposed to rapid diagnostic tests. We also prioritized estimates based on IgG and anti-spike antibodies, as non-IgG and anti-nucleocapsid antibodies appear to decline more rapidly than anti-spike/RBD IgG antibodies.20–25
Data processing and descriptive statistics were conducted in Python. p-values less than 0.05 were considered statistically significant.
Correcting seroprevalence estimates
To account for imperfect test sensitivity and specificity, seroprevalence estimates were corrected using Bayesian measurement error models, with binomial sensitivity and specificity distributions.26 The sensitivity and specificity values for correction were derived, in order of preference, from: (i) the FindDx-McGill database of independent evaluations of serological tests27; (ii) independent test evaluations conducted by serosurvey investigators and reported alongside serosurvey findings; (iii) manufacturer-reported sensitivity and specificity (including author-evaluated in-house assays); (iv) published pooled sensitivity and specificity by immunoassay type.25 If uncorrected estimates were not available, we used author-reported corrected seroprevalence estimates. Details of these evaluations are located in S4 File.
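To illustrate what such a correction does, the sketch below applies the Rogan-Gladen formula and propagates uncertainty by Monte Carlo sampling from Beta distributions. This is a simpler stand-in chosen for illustration, not the Bayesian measurement error model used in this review; the function names, the flat Beta(1, 1) priors, and all counts are hypothetical.

```python
import random
import statistics


def rogan_gladen(apparent, sensitivity, specificity):
    """Point-correct an apparent prevalence for imperfect test accuracy."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must perform better than chance")
    corrected = (apparent + specificity - 1.0) / denom
    # Clip to the valid range; raw values can fall outside [0, 1].
    return min(1.0, max(0.0, corrected))


def corrected_interval(x, n, tp, n_pos, tn, n_neg, draws=20000, seed=1):
    """Median and 95% interval of corrected prevalence by Monte Carlo.

    x / n          : positives / total in the serosurvey
    tp / n_pos     : true positives / known positives in the test evaluation
    tn / n_neg     : true negatives / known negatives in the test evaluation
    Flat Beta(1, 1) priors are an assumption of this sketch.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        p = rng.betavariate(x + 1, n - x + 1)
        se = rng.betavariate(tp + 1, n_pos - tp + 1)
        sp = rng.betavariate(tn + 1, n_neg - tn + 1)
        if se + sp <= 1.0:
            continue  # skip draws where the test is no better than chance
        samples.append(rogan_gladen(p, se, sp))
    samples.sort()
    lower = samples[int(0.025 * len(samples))]
    upper = samples[int(0.975 * len(samples)) - 1]
    return statistics.median(samples), lower, upper


# Hypothetical serosurvey: 60 of 1000 positive; test evaluated on
# 100 known positives (90 detected) and 500 known negatives (490 negative).
point = rogan_gladen(60 / 1000, 90 / 100, 490 / 500)
median, lower, upper = corrected_interval(60, 1000, 90, 100, 490, 500)
print(f"point {point:.3f}, median {median:.3f}, 95% interval {lower:.3f}-{upper:.3f}")
```

When specificity is imperfect and true prevalence is low, a meaningful share of positive results are false positives, so the corrected estimate falls below the raw one. This is one reason correction matters most in low-prevalence settings.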
We presented corrected and uncorrected estimates for all studies. Subsequent analyses were done using corrected seroprevalence estimates. To assess the impact of correction, we calculated the absolute difference between seroprevalence estimates before and after correction. We also conducted each analysis with uncorrected data.
Global seroprevalence and associated factors
To examine study-level factors affecting population-wide seroprevalence estimates, we constructed a multivariable linear meta-regression model. The outcome variable was the natural logarithm of corrected seroprevalence. Independent predictors were defined a priori. Categorical covariates were encoded as indicator variables, and included: study risk of bias (reference: low risk of bias), GBD region (reference: high-income); geographic scope (reference: national); and population sampled (reference: household and community samples). The sole continuous covariate was the cumulative number of confirmed cases in the country of the study. We obtained data on total confirmed SARS-CoV-2 infections28, 29 and population size30 that geographically matched the study populations nine days before the study end date, to reflect the time period between COVID-19 diagnosis and seroconversion (S5 File).31–33 A quantile-quantile plot and a funnel plot were generated to visually check normality and homoscedasticity. All meta-analysis and meta-regression were done using the meta package in R.34
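As a reading aid, the model described above can be written out as follows. This is a sketch of the form implied by the text, not the authors' fitted specification; the symbols $p_i$ (corrected seroprevalence in study $i$), $x_{ij}$ (indicator or continuous covariates), $\beta_j$, and $\varepsilon_i$ (combined between-study and sampling error) are notation introduced here.

```latex
\ln(p_i) = \beta_0 + \sum_{j} \beta_j x_{ij} + \varepsilon_i
\qquad\Longrightarrow\qquad
\frac{p_i \,\big|\, x_{ij} = 1}{p_i \,\big|\, x_{ij} = 0} = e^{\beta_j}
\quad \text{(other covariates held fixed)}
```

Under this form, an exponentiated coefficient for an indicator variable is read as a ratio of seroprevalence relative to the reference category: values above 1 indicate higher seroprevalence than the reference group and values below 1 indicate lower.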
Population differences in seroprevalence
To quantify population differences in SARS-CoV-2 seroprevalence, we identified subgroup estimates within population-wide studies that stratified by sex/gender, race/ethnicity, contact with individuals with COVID-19, occupation, and age groups. We calculated the ratio in prevalence between groups within each study (e.g., prevalence in males vs. females) then aggregated the ratios across studies using inverse variance-weighted random-effects meta-analysis (S4 File). Heterogeneity was quantified using the I² statistic.35
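The pooling step can be sketched as follows. The example computes within-study log prevalence ratios and combines them with inverse variance weights under a random-effects model. It assumes the DerSimonian-Laird estimator of between-study variance, which may differ from the estimator used in the R analysis; the function names and study counts are hypothetical.

```python
import math


def log_prevalence_ratio(a, n1, c, n2):
    """Log prevalence ratio and its variance from positives/totals in two groups."""
    log_pr = math.log((a / n1) / (c / n2))
    variance = 1 / a - 1 / n1 + 1 / c - 1 / n2
    return log_pr, variance


def random_effects_pool(effects):
    """Pool (log effect, variance) pairs with DerSimonian-Laird random effects."""
    y = [e for e, _ in effects]
    v = [var for _, var in effects]
    w = [1 / var for var in v]  # inverse variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1 / (var + tau2) for var in v]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # I-squared, percent
    return {
        "ratio": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical counts per study: positives and total in group A, then group B.
studies = [(40, 400, 25, 500), (18, 150, 20, 300), (90, 1000, 45, 900)]
result = random_effects_pool([log_prevalence_ratio(*s) for s in studies])
print(result)
```

Working with ratios computed inside each study means that between-study differences in assay, timing, and setting largely cancel, because both groups in a study share them.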
Comparisons of seroprevalence and confirmed SARS-CoV-2 infections
To measure how much confirmed SARS-CoV-2 infections detected using RT-PCR underestimate seroprevalence, we calculated the ratio between population-wide seroprevalence estimates and the cumulative incidence of confirmed SARS-CoV-2 infections.
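A minimal sketch of this calculation, using the nine-day lag described earlier, might look like the following. The function name, data structure, and all numbers are hypothetical and serve only to show the arithmetic.

```python
from datetime import date, timedelta

LAG_DAYS = 9  # lag between diagnosis and seroconversion used in the text


def seroprevalence_to_case_ratio(seroprevalence, study_end, cumulative_cases, population):
    """Ratio of seroprevalence to confirmed cumulative incidence LAG_DAYS earlier.

    seroprevalence   : proportion in [0, 1]
    study_end        : date on which sampling ended
    cumulative_cases : dict mapping date -> cumulative confirmed cases
    population       : size of the geographically matched population
    """
    reference_day = study_end - timedelta(days=LAG_DAYS)
    cases = cumulative_cases[reference_day]
    if cases <= 0:
        raise ValueError("no confirmed cases on the reference day")
    cumulative_incidence = cases / population
    return seroprevalence / cumulative_incidence


# Hypothetical inputs, for illustration only.
cases_by_day = {date(2020, 6, 21): 2_500, date(2020, 6, 30): 4_000}
ratio = seroprevalence_to_case_ratio(0.045, date(2020, 6, 30), cases_by_day, 1_000_000)
print(round(ratio, 1))
```

In this made-up example, the case count from June 21 is used rather than the count from the study end date, so the ratio reflects infections that had time to produce detectable antibodies.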
Role of the funding source
This research was funded by the Public Health Agency of Canada through Canada’s COVID-19 Immunity Task Force. The funding agency had no role in study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.
Results
Characteristics of included studies
We screened 24,999 titles and abstracts and 1,830 full text articles (Fig 1). We identified 968 unique seroprevalence studies in 605 articles. These studies included 9,329,185 participants.
There were 590 (61%) population-wide studies and 378 (39%) population-specific studies (Table 1). Characteristics of individual studies are reported in S1 Table and S2 Table. Study sampling dates ranged from September 1, 2019 to December 31, 2020.
Seventy-four countries across all GBD regions were represented among identified serosurveys (Fig 2; S1 Figure). A minority of studies were conducted in low- and middle-income countries (n = 221, 23%).
Fig 2. Countries with national-level seroprevalence studies reporting population-wide estimates are coloured on the map, based on the seroprevalence reported in the most recent such study in each country. Countries with no such national serosurveys but with “other serosurveys” are coloured in grey; this includes local and regional studies, as well as studies in specific populations. Map data reprinted from Natural Earth under a CC BY license, with permission from Natural Earth, original copyright 2009.
Many studies were at moderate (n=443, 46%) or high risk of bias (n=424, 44%), owing primarily to the absence of statistical correction either for population demographics or test sensitivity and specificity, using non-probability sampling methods, and using non-representative sample frames (Fig 3, S3 Table).
Fig 3. Item 1: Was the sample frame appropriate to address the target population? Item 2: Were study participants recruited in an appropriate way? Item 3: Was the sample size adequate? Item 4: Were the study subjects and setting described in detail? Item 5: Was data analysis conducted with sufficient coverage of the identified sample? Item 6: Were valid methods used for the identification of the condition? Item 7: Was the condition measured in a standard, reliable way for all participants? Item 8: Was there appropriate statistical analysis? Item 9: Was the response rate adequate, and if not, was the low response rate managed appropriately? Item 10: Overall risk of bias.
Correction of estimates for test sensitivity and specificity
In order to improve comparability between data and correct for misclassification error, we corrected seroprevalence values for imperfect sensitivity and specificity. To do so, we sourced additional evaluation data as described in the methods. Overall, there were 795 studies (82%) for which test sensitivity and specificity values were reported or located (S5 Table). Authors reported sensitivity and specificity data in 229 studies, with reported sensitivity values ranging from 35-100% and specificity between 87-100%.
Independent evaluation data from the FindDx initiative were available for 359 studies (37%), manufacturer evaluations were available for 182 studies (19%), and published pooled sensitivity and specificity results for ELISAs, LFIAs, and CLIAs, based on the test type known to have been used, and using the definitions for these test types provided by Bastos et al.25, were available for 101 studies (10%). Between FindDx, manufacturer evaluations, and published pooled results, test sensitivity ranged from 9-100% and specificity from 0-100%.
Estimates from 587 studies (61%) were corrected for imperfect sensitivity and specificity. We corrected seroprevalence estimates from 290 studies (30%), while author-corrected estimates were used in 297 (31%) studies as uncorrected estimates were not available for our analysis. The median absolute difference between corrected and uncorrected seroprevalence estimates was 1.1% (IQR 0.6-2.3%).
Of the 381 studies for which estimates were not corrected, data were insufficient to inform the correction analysis in 118 studies (12%). Corrected seroprevalence estimates could not be determined for the remaining 263 studies (27%), most of which were population-specific studies using small sample sizes and tests with low sensitivity and specificity. In these studies, the model used to correct for test sensitivity and specificity often failed to converge to a reasonable adjusted prevalence value.
Population-wide seroprevalence estimates
In studies reporting population-wide seroprevalence estimates, median corrected seroprevalence was 4.5% (IQR 2.4-8.4%, Table 2). These studies included household and community samples (n=125), residual sera (n=248), and blood donors (n=54), with median corrected seroprevalence of 6.0% (IQR 2.8-15.1%), 4.0% (IQR 2.4-6.8%), and 4.7% (IQR 1.4-6.8%), respectively (Table 3).
Among high-income countries, the median corrected seroprevalence in studies reporting population-wide estimates was 4.1% (IQR 2.4-6.9%). In the low- and middle-income GBD regions, median corrected seroprevalence ranged from 0.6% (IQR 0.3-1.4%) in Southeast Asia, East Asia, and Oceania to 19.5% (IQR 9.0-26.0%) in Sub-Saharan Africa (Table 2).
Population-specific seroprevalence estimates
The median corrected seroprevalence in studies reporting population-specific seroprevalence estimates was 3.6% (IQR 0.9-12.3%, Table 4); however, there was wide variation (0.6-59%) between different populations (Table 3). Notably, the median corrected seroprevalence was 3.6% (IQR 0.8-11.0%, n=66 studies) in healthcare workers and caregivers and 2.7% (IQR 1.1-7.4%, n=24 studies) in specific patient groups (e.g., cancer patients). Essential non-healthcare workers (e.g., first responders) had a median seroprevalence of 7.5% (IQR 2.4-29.9%, n=11 studies, Table 3). Higher seroprevalence estimates were reported in studies of contacts of COVID-19 patients (median 31.5%, IQR 2.7-39.9%, n=11 studies), persons living in slums (median 41.7%, IQR 40.0-43.4%, n=2 studies), and persons in assisted living and long-term care facilities (median 59.2%, IQR 39.7-78.8%, n=2 studies).
Seroprevalence by population sub-groups (meta-analysis)
Within studies, seroprevalence was significantly lower for seniors 65+ compared to adults 18-64 (prevalence ratio [PR]: 0.79 [95% CI: 0.69-0.90]). Seroprevalence was significantly higher for Black persons, Asian persons, Indigenous persons, and other groups compared to Caucasian persons (PRs from 1.89-5.47), and in health care workers with close contact with COVID-19 patients compared to those with no close contact (PR 2.10 [1.28-3.44]). Seroprevalence differences approached significance for individuals in the community with close contact with COVID-19 patients (PR 1.85 [0.99-3.44]) and for health care workers compared to members of the community (PR 1.45 [0.99-2.14]). There were no differences in seroprevalence based on sex or gender. Full results are reported in Table 5, and results for uncorrected prevalence estimates are reported in S4 Table.
Seroprevalence by study and geographic factors (meta-regression)
On multivariable meta-regression, studies at low risk of bias reported higher corrected seroprevalence estimates relative to studies with moderate risk of bias (prevalence ratio 1.67, 95% CI 1.22-2.27, p=0.001), high risk of bias (1.54, 95% CI 1.11-2.13, p=0.01), and unclear risk of bias (2.63, 95% CI 1.54-4.55, p<0.001) (S6 Table). Blood donors and residual sera groups, both used as proxies for the general population, reported similar corrected seroprevalence estimates compared to household and community samples (blood donors: 0.96, 95% CI 0.76-1.22, p=0.77; residual sera: 1.12, 95% CI 0.94-1.35).
National studies reported lower seroprevalence estimates compared to regional studies (0.61, 95% CI 0.48-0.77, p<0.001), local studies (0.47, 95% CI 0.37-0.60, p<0.001) and sublocal studies (0.52, 95% CI 0.33-0.81, p=0.004). Finally, compared to high-income countries, higher seroprevalence estimates were reported by countries in Sub-Saharan Africa (5.01, 95% CI 2.89-8.69, p<0.001), South Asia (2.84, 95% CI 2.09-3.85, p<0.001), Central Europe, Eastern Europe, and Central Asia (2.83, 95% CI 1.75-4.55, p<0.001), and Latin America and Caribbean (2.71, 95% CI 2.07-3.54, p<0.001), while countries in Southeast Asia, East Asia, and Oceania (0.18, 95% CI 0.09-0.34) reported lower seroprevalence estimates. Visual checks confirmed that model assumptions of normality and homoscedasticity were met.
Ratio of seroprevalence to cumulative case incidence
The median ratio between corrected seroprevalence estimates from national studies and the corresponding cumulative incidence of SARS-CoV-2 infection nine days prior was 18.1 (IQR 5.9-38.7, n=49 studies; Table 6, S2 Figure), indicating a median of 18.1 serologically identified infections per confirmed case globally. Stratifying by risk of bias and GBD region showed variation in median ratios between seroprevalence and cumulative incidence (Table 6).
Discussion
This systematic review and meta-analysis provides an overview of global SARS-CoV-2 seroprevalence based on data from 9,329,185 participants in 968 serosurveys from 605 reports. Overall, in the first year of the COVID-19 pandemic, estimates of population-wide seroprevalence were low (median 4.5%, IQR 2.4-8.4%); however, population-specific estimates of seroprevalence varied widely from a low of 0.6% (perinatal) to a high of 59% (persons in assisted living and long-term care facilities).
Seroprevalence varied considerably between GBD regions after correcting for study characteristics and test sensitivity and specificity. Given the limited evidence for altitude or climate effects on SARS-CoV-2 transmission,36, 37 variations in seroprevalence likely reflect differences in community transmission based on behaviour, public health responses, local resources, and the built environment. Stakeholders should carefully review the infection control measures implemented in Southeast Asia, East Asia, and Oceania, as they appear to have been effective at limiting SARS-CoV-2 transmission.38, 39
Our results suggest clear population differences in SARS-CoV-2 infection, with marginalized and high-risk groups disproportionately affected. Differences in infection risk based on race might be attributed to crowding, higher-risk occupation roles (e.g., front-line service jobs) and other systemic inequities.40–43 Some of these groups (Black, Asian, and other minority racial and ethnic groups) are also known to have higher infection fatality rates.44 Such differences may inform policy on vaccine distribution, workforce protections, and other public health measures designed to protect marginalized persons.
Our review found that health care workers who had close contact with confirmed COVID-19 cases had a higher risk of seropositivity, consistent with previous reports.45 Results in this study regarding contact with a COVID-19 case among non-health care workers warrant further investigation. Our meta-analysis of seroprevalence in persons with and without contact in studies reporting both subgroups found no significant difference, despite the fact that studies of persons with exposure to COVID-19 reported much higher seroprevalence estimates compared to population-wide studies (31.5% vs. 4.5%). These results align with other evidence syntheses examining persons with and without COVID-19 exposure; however, they conflict with studies of high-risk exposure, including health care workers.9, 46 It is possible that contact exposure in a clinical setting may be more narrowly defined and carefully measured, whereas definitions of exposure in non-clinical studies may be more heterogeneous or prone to potential misclassification due to asymptomatic infection. Future analysis should explore the association of different definitions and measurement of contact status with seroprevalence estimates.
Few studies (23%) have been conducted in low- and middle-income countries. Results from the ongoing WHO Unity studies will help to bridge this knowledge gap and contribute to a more comprehensive understanding of the spread and impact of COVID-19 globally.15 Use of the standardized Unity protocols will also help to increase the pool of robust, comparable seroprevalence data.
Approximately half of studies reporting population-wide SARS-CoV-2 seroprevalence estimates used blood from donors and residual sera as a proxy for the community. Our results showed that these studies report seroprevalence estimates that are similar to studies of household and community-based samples. It has previously been shown that these groups contain disproportionate numbers of people that are young, White, college graduates, employed, physically active, and never-smokers.47, 48 However, the results of our study suggest that investigators may use these proxy sampling frames to obtain fairly representative estimates of seroprevalence if studies use large sample sizes with adequate coverage of important subgroups (e.g., age, sex, race/ethnicity) to permit standardization to population characteristics, tests with high sensitivity and specificity, and statistical corrections for imperfect sensitivity and specificity.
Our results suggest that studies at moderate, high, or unclear risk of bias may generate lower seroprevalence estimates relative to studies at low risk of bias. There are many possible explanations for this somewhat counterintuitive finding. Common reasons for unclear or elevated risk of bias were absent reporting of test information, use of tests with low sensitivity and specificity, no statistical correction for demographics or test sensitivity and specificity, use of non-probability sampling, and use of non-representative sample frames. Therefore, selection bias that favoured healthier, affluent, non-racialised groups at lower risk of infection, paired with no adjustment for sample characteristics, may have contributed to lower estimates of seroprevalence. It is also possible that the false negative rate was higher for studies in which authors used low sensitivity tests, particularly when authors did not statistically correct estimates for imperfect test performance or used inflated estimates of test sensitivity, as are often reported by manufacturers, to conduct such corrections.
Systematic reviews of SARS-CoV-2 serological test accuracy have found that many tests have poor sensitivity and specificity.24, 25 Of the studies included in this review, only 297 (31%) corrected for test sensitivity and specificity, and 118 (12%) failed to report identifying information on the test used altogether. Our study corrected seroprevalence estimates for test sensitivity and specificity in an additional 290 (30%) studies. The median absolute difference between corrected and uncorrected estimates was 1.1%, a substantial change given that the median corrected seroprevalence in studies reporting population-wide estimates was 4.5%. This difference emphasizes the importance of conducting such corrections to minimize bias in serosurvey data. Furthermore, improved reporting of serological testing information in serosurveys is needed to maximize the amount of robust and comparable data for evidence synthesis.
Seroprevalence estimates were a median 18.1 times higher than the corresponding cumulative incidence of COVID-19 infections, with large variations between the Global Burden of Disease regions (seroprevalence estimates ranging from 6.7 to 602.5 times higher than cumulative incidence). This level of under-ascertainment suggests that confirmed SARS-CoV-2 infections are a poor indicator of the extent of infection spread, even in high-income countries where testing has been more widely available. The broad range of ratios mirrors estimates from other published evidence on case under-ascertainment, which suggests a range of 0.56 to 717.49, 50
Seroprevalence to cumulative case ratios can provide a rough roadmap for public health authorities by identifying areas that may be receiving potentially insufficient levels of testing and by providing an indication of the number of undetected asymptomatic infections.
While there is interest in using these seroprevalence to cumulative case ratios in identifying inadequate testing and estimating case ascertainment, caution is required in the quantitative interpretation of these ratios. Our study found a median ratio of 18.1, which aligns with other published analysis.50 This would imply that 2.9 billion people globally have been infected with SARS-CoV-2 rather than the 160 million reported as of May 15, 2021.2 This is not likely, and this estimate conflicts with the evidence that seroprevalence remains low in the general population. If applying this global ratio to countries with high cumulative incidence, such as the United States (32 million by May 15, 2021), then the total number of infections would exceed the population.
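The arithmetic behind these two statements can be checked directly. In the sketch below, the case counts and the ratio come from the text, while the United States population of roughly 3.3 × 10^8 is an approximate figure assumed here for the comparison.

```latex
\begin{aligned}
\text{Global:} \quad & 18.1 \times 160 \times 10^{6} \approx 2.9 \times 10^{9} \\
\text{United States:} \quad & 18.1 \times 32 \times 10^{6} \approx 5.8 \times 10^{8} \;>\; 3.3 \times 10^{8}
\end{aligned}
```

The second line shows why a single global multiplier cannot be applied to a country with high case ascertainment: the implied number of infections would be well above the number of residents.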
There are several possible reasons for these discrepancies. First, these ratios clearly vary by geographic region and regional health policy, with higher diagnostic testing rates likely to correspond to lower seroprevalence to case ratios. Country-specific ratios, or region-specific ratios if available, should be used to inform planning wherever possible. Second, diagnostic testing-based estimates of cumulative incidence vary by assay; for example, lower RT-PCR cycle thresholds or the use of less sensitive rapid antigen tests would lead to lower estimates of cumulative cases. Finally, our analysis compares seroprevalence to cumulative case ratios at different points in time. As diagnostic testing measures expanded, these ratios may have declined over time, complicating the process of applying a single fixed ratio to a cumulative incidence number. As such, there is a need for more nuanced analysis of case under-ascertainment, and caution should be exercised if these ratios are used in public health planning.
This study has limitations. Firstly, some asymptomatic individuals may not seroconvert, some individuals may have been tested prior to seroconversion, and others may have antibodies that have waned by the time of blood collection, so the data in this study may underestimate the number of SARS-CoV-2 infections.51 To ameliorate this, we prioritized estimates that tested for anti-spike IgG antibodies, which show better persistence in serum compared to non-IgG and anti-nucleocapsid IgG antibodies.20–25 Secondly, to account for measurement error in seroprevalence estimates resulting from poorly performing tests, it was necessary to use sensitivity and specificity information from multiple sources of varying quality. While we prioritized independent evaluations, these were not available for all tests. Furthermore, lab-to-lab variation may undermine the generalizability and comparability of the test evaluation data we utilized. Going forward, investigators should conduct evaluations of their assays using a standard international reference panel, such as the panel created by the WHO52, and report their results in international units referenced against the World Antibody Titres Standard to increase comparability of serosurvey results. Where this is not feasible, investigators should at least report the test name, manufacturer, and sensitivity and specificity values to improve data comparability.53 Thirdly, some of the summary results may have been driven by the large volume of data from high-income countries, which primarily reported lower seroprevalence estimates. While we frequently stratified by or adjusted for GBD region, caution is required when interpreting some of the summary estimates. Fourthly, the residual heterogeneity in our meta-regression indicates that not all relevant explanatory variables have been accounted for. Many factors may contribute to the spread of infection. Even if all important factors were known, it would be difficult to account for the variation in seroprevalence due to limited availability of data with sufficient granularity and changing health policy and individual behavior.
This systematic review is the largest synthesis of SARS-CoV-2 serosurveillance data to date. Our search was rigorous and comprehensive: we included non-English articles, government reports, unpublished data, and serosurveillance reports obtained via expert recommendations and the SeroTracker website. This comprehensive search is important because many serosurveys, especially in LMICs, have not been published or released as preprints. A strength of this review was the use of corrected prevalence estimates for analysis, revealing that imperfect sensitivity and specificity have major effects on seroprevalence findings. To our knowledge, this is the largest systematic comparison of seroprevalence estimates from blood donors, residual sera, and household and community-based general population samples. Finally, this study is part of a regularly updated systematic review, and summary results will continue to be disseminated throughout the pandemic on a publicly available website (SeroTracker.com).12
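To illustrate why imperfect sensitivity and specificity can have major effects when seroprevalence is low, the sketch below applies the Rogan-Gladen estimator. This is a simpler frequentist correction, swapped in for illustration, and not the Bayesian measurement error model used in this review. The raw prevalence and the test accuracy values are hypothetical and were chosen only to show the direction and size of the effect.

```python
# Minimal sketch: Rogan-Gladen correction of an apparent (raw) seroprevalence.
# NOTE: illustrative stand-in, not the Bayesian measurement error model used in the review.
# All numeric inputs below are hypothetical.

def rogan_gladen(apparent: float, sensitivity: float, specificity: float) -> float:
    """Correct an apparent prevalence for imperfect test sensitivity and specificity."""
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    corrected = (apparent + specificity - 1.0) / denominator
    # Truncate to the valid range, since the raw estimator can fall outside [0, 1].
    return min(1.0, max(0.0, corrected))

raw_prevalence = 0.045  # hypothetical uncorrected estimate (4.5%)

# Hypothetical (sensitivity, specificity) pairs for three assays.
assays = [(0.90, 0.99), (0.90, 0.97), (0.80, 0.99)]

for sensitivity, specificity in assays:
    corrected = rogan_gladen(raw_prevalence, sensitivity, specificity)
    print(f"Se={sensitivity:.2f}, Sp={specificity:.2f}: corrected={corrected:.4f}")
```

In this hypothetical example, a two percentage point drop in specificity more than halves the corrected estimate, which shows how strongly false positives influence results when true prevalence is only a few percent.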
Serosurveillance efforts so far have mostly taken the form of formal studies led by academic institutions. This approach makes sense when serosurveys are used as a tool to periodically monitor the spread of infection and identify high-risk groups. However, given the rise of more infectious SARS-CoV-2 variants, continued uncertainty about the global prevalence of infection, and variable quality of serosurvey design and reporting, more coordinated, standardized, and routine serosurveillance may be needed. Furthermore, as vaccines are deployed, there may be additional value derived from serosurveys, specifically in evaluating vaccine effectiveness in the real world, monitoring aggregate immunity arising from infection and vaccination, and measuring population antibody titres as a correlate of protection and as an indicator for vaccine boosters. Therefore, going forward, serosurveillance efforts may better serve end-users if they take the form of real-time monitoring programs housed in public health units, using standardized serosurvey protocols and reporting. Leaders who can compare studies in their regions over time and pair vaccine distribution data with live serosurveys will be well-equipped to track the pandemic, understand the impact of variants, and monitor outcomes of vaccination efforts in their communities in real time.
Conclusion
Our review shows that SARS-CoV-2 seroprevalence remains low in the general population, indicating the importance of remaining vigilant until vaccine-derived herd immunity is achieved. There are clear geographic and population differences in SARS-CoV-2 infection prevalence, with certain groups disproportionately affected. Policy and decision makers need to better protect these groups to reduce inequity in the impact of COVID-19.
As the COVID-19 pandemic progresses and serology data accumulate, ongoing evidence synthesis is needed to inform public health policy. We will continue to update our systematic review and seroprevalence dashboard to help address this need.
Contributors
The study was conceived by RKA, NB, TY, TGE, JP, and MPC. The protocol and data collection methods were designed by NB, RKA, CC, EB, ML, ND, JVW, CY, JP, and MPC. Analysis methods were designed by RKA, ML, NB, JoC, JP, and MPC. Article screening, data extraction, and critical appraisal were conducted by NB, RA, CC, EB, ML, MY, SPA, HR, CD, NI, ND, JVW, TY, LP, MS, JuC, and MW. Additional data was collected by CC, EB, ML, and NB. The data for this manuscript and the companion dashboard was managed by NB, CC, JVW, ND, AA, SR, and AJ. Data was analyzed by RKA, ML, AA, SR, and AJ. Data was interpreted by NB, RKA, CC, EB, MY, SPA, ML, DC, CPY, TW, TGE, JoC, JP, and MPC. The first draft was written by NB, RKA, CC, EB, ML, and MPC. NB, RKA, CC, EB, and ML verified the underlying data. All authors debated, agreed to the findings, and provided critical revisions to the paper.
Declaration of interests
DAC reports personal fees from Oxford University Innovation, Biobeats, and Sensyne Health. MPC reports grants from McGill Interdisciplinary Initiative in Infection and Immunity and grants from Canadian Institutes of Health Research during the conduct of the study; personal fees from Gen1E Lifesciences (as a member of the scientific advisory board) and personal fees from nplex biosciences (as a member of the scientific advisory board), both outside the submitted work. JP reports grants and personal fees from Seegene and AbbVie, grants from MedImmune and Sanofi Pasteur, outside the submitted work. RKA, NB, and TY report grants from the World Health Organization and the Canadian Medical Association for SARS-CoV-2 serosurveillance, both outside the submitted work.
Supporting information captions
S1 File. PRISMA checklist
S2 File. Search strategy
S3 File. Tool for assessing study risk of bias
S4 File. Additional data analysis details
S5 File. Methods for selecting and gathering data on cumulative incidence and population size
S1 Table. Characteristics and primary results of studies reporting population-wide seroprevalence estimates
S2 Table. Characteristics and primary results of studies reporting population-specific seroprevalence estimates
S3 Table. Risk of bias results for each included study
S4 Table. Summary of unadjusted meta-analysis results. (a) Using adjusted seroprevalence estimates. Abbreviations: CI = confidence interval.
S5 Table. Summary of serological tests used in included seroprevalence studies
S6 Table. Summary of meta-regression results. (a) The regression coefficient β refers to the change in the log seroprevalence of antibodies to SARS-CoV-2 given changes in the covariate. (b) Details of uncorrected model: Intercept coefficient β -3.45 (95% CI -3.76, -3.13); Mixed-Effects Model (k = 570); tau^2 = 0.4759 (SE = 0.1331); I^2 = 99.64%; R^2 = 21.26%. Test of Moderators (coefficients 2:16): QM(df = 15) = 263.86, p-val < 0.0001. (c) Details of corrected model: Intercept coefficient β -3.45 (95% CI -3.78, -3.12); Mixed-Effects Model (k = 417); tau^2 = 0.3574 (SE = 0.0791); I^2 = 98.94%; R^2 = 63.97%. Test of Moderators (coefficients 2:16): QM(df = 15) = 322.6942, p-val < 0.0001. Abbreviations: B = beta; CI = confidence interval; exp = exponentiated; GBD = global burden of disease.
S1 Figure. Map of serosurvey distribution by global burden of disease region. The number of serosurveys reported in each GBD region was: Central Europe, Eastern Europe, and Central Asia (n=26); High Income regions (n=747); Latin America and Caribbean (n=69); North Africa and Middle East (n=21); South Asia (n=49); Southeast Asia, East Asia, and Oceania (n=46); and Sub-Saharan Africa (n=10).
S2 Figure. Seroprevalence to cumulative case incidence ratios using cumulative incidence nine days prior to the serosurvey end date
Acknowledgments
We would like to thank Dr. Diane Lorenzetti, a health science librarian at the University of Calgary, for her assistance in developing the search strategies. We would like to thank Prof John Ioannidis for his suggestion to disaggregate the case to infection ratio by global burden of disease region given the under-representation of data from low- and middle-income countries. We would also like to thank all serosurvey authors who contributed data and enhanced the quality of this review. CPY and JP hold a “Chercheur-boursier clinicien” career award from the Fonds de recherche du Québec – Santé (FRQS). JC holds a Canada Research Chair in Global Environmental Health and Epidemiology.
Footnotes
This review has been updated to include data published from January 1, 2020 to December 31, 2020. The amount of included data has tripled relative to the original version; the review now includes 968 seroprevalence studies with 9.3 million participants in 74 countries.