Abstract
Background Reported COVID-19 cases underestimate the true number of SARS-CoV-2 infections. Data on all infections, including asymptomatic infection, are needed to guide state testing and prevention programs. To minimize biases in estimates from seroprevalence surveys and reported cases, we conducted a state-wide probability survey of Georgia households and estimated cumulative incidence of SARS-CoV-2 infections adjusted for antibody waning.
Methods From August to December 2020, we mailed kits to self-collect specimens (nasal swabs and blood spots) to a random sample of Georgia addresses. One randomly-selected adult household member completed a survey and returned specimens for virus and antibody testing. We estimated cumulative incidence of SARS-CoV-2 infections adjusted for waning antibodies, reported fraction, and infection fatality ratio (IFR). Differences in seropositivity among demographic, geographic and clinical subgroups were explored with weighted prevalence ratios (PR).
Results Among 1,370 Georgia adult participants, adjusted cumulative incidence of SARS-CoV-2 was 16.1% (95% credible interval (CrI): 13.5-19.2%) as of November 16, 2020. The reported fraction was 26.6% and IFR was 0.78%. Non-Hispanic Black (PR: 2.03, CI 1.0, 4.1) and Hispanic adults (PR: 1.98, CI 0.74, 5.31) were more likely than non-Hispanic White adults to be seropositive. Seropositivity in metropolitan Atlanta’s Fulton and DeKalb counties was similar to seropositivity elsewhere in Georgia (7.8% vs. 8.8%).
Conclusions As of mid-November 2020, one in 6 adults in Georgia had been infected with SARS-CoV-2. The scope of the COVID-19 epidemic in Georgia is likely substantially underestimated by reported cases.
Main point Using data from a probability survey of households in Georgia, USA, we estimated that 1.3 million adults aged ≥18 years experienced SARS-CoV-2 infections by November 16, 2020, of whom 1 in 4 were reported and of whom 0.78% died.
Introduction
Like many states In the United States, Georgia has experienced substantial morbidity and mortality due to COVID-19. Comprehensive, unbiased estimates of the extent of SARS-CoV-2 infections in Georgia are challenging because not all people who are infected have symptoms, and not all people who are symptomatic get tested. Although Georgia’s robust testing efforts have diagnosed over one million individuals(1), no scientifically rigorous estimate of how many Georgias have been infected with SARS-CoV-2 exists. Seroprevalence studies conducted from remnant samples in clinical settings (e.g., dialysis centers and other settings in which specimens are collected for routine screening or clinical management) can detect people who have been infected, but such studies can have biased data if they are not representative of the general population and because antibodies can become undetectable over time (“antibody waning”).(2)
For Georgia, ascertaining the total number of people who have been infected has implications for understanding the impact of COVID-19 to date and for reaching herd immunity. Having these data also can support and inform vaccination strategies. We describe findings from the COVIDVu Georgia study, a state-specific seroprevalence survey conducted among a probability-based sample of Georgia households from August to December 2020 to develop a representative estimate of the cumulative incidence of SARS-CoV-2 infection among Georgia’s adult population after adjusting for antibody waning.
Methods
Sampling
Our sampling methods have been previously described as part of the national COVIDVu study.(3) We used a national address-based household sample derived from the USPS Computerized Delivery Sequence File, which contains about 130 million residential addresses and covers all residential delivery points in the US. This sampling frame has been used in numerous health research studies.(4–6) To achieve a total sample of 1,400 responding households from Georgia, 12,894 addresses were shipped COVIDVu study materials (Figure 1). Analogous to our national study, we oversampled households in census tracts with >50% Black residents and households with surnames likely to represent Hispanic ethnicity to overcome differentially low early response rates by Black and Hispanic persons.(4) We oversampled Fulton and Dekalb counties to facilitate estimation of seroprevalence in the City of Atlanta.
Survey and Laboratory Procedures
One adult ≥18 years in each household listed household members by gender and age, and an adult household member was then randomly selected for participation by the electronic data system. Following an online consent procedure, participants completed a behavioral survey with domains including demographics, comorbidities, and symptoms; the survey instrument has been previously published.(3) Participants self-collected an anterior nares (AN) swab and a dried blood spot (DBS) card, a method we previously validated based on clinician observation of specimen collection and laboratorian assessment of specimen quality.(7, 8) Specimens were returned to a central laboratory with a prepaid mailer.(8) PCR testing of AN swabs used the Thermo EUA Version 2 kit (Thermo Fisher Scientific, Waltham, MA). Antibody testing of DBS specimens used the BioRad Platelia Total Antibody test that targets the nucleocapsid protein (i.e., IgA, IgM, IgG; BioRad, Hercules, California). Testing protocols were validated under CLIA/CAP protocols for the development of Laboratory Developed Tests. Further detail has been previously described, including approaches taken to quantify the direction and magnitude of potential biases associated with antibody waning.(3) Participants in the oversample were provided with a $100 electronic gift card incentive and all other participants were provided with a $40 electronic gift card incentive. The COVIDVu study was approved by the Emory University Institutional Review Board (STUDY00000695).
Sample Weights
We developed three sets of sample weights to allow for estimation of key parameters representing non-institutionalized and housed adults in three areas: Georgia, Fulton/Dekalb counties, and all other counties in Georgia. Each set of weights was developed using the same method as previously described. In brief, hierarchical hot deck imputation(9) was performed to ensure no participants were missing data for key variables needed for weighting such as gender, education, race, ethnicity, and marital status that each had less than 3% missingness. Design weights, adjusted with Classification And Regression Tree (CART) analysis for differential non-response, were developed to facilitate population inference. A raking procedure aligned weighted distributions to the observed distributions from the Census along the lines including age, race-ethnicity, education, and income.(10) To address outlier weights, those at the 99th percentile of each side of the distribution were trimmed. Additional detail on the weighting process can be found in our protocol paper.(3)
We estimated weighted seroprevalence and 95% confidence limits of total Ig for the entire sample and by demographic factors and reported pre-existing comorbidities, month of sampling, and symptoms. To identify significant differences in seroprevalence among groups, we estimated prevalence ratios (PRs) and corresponding 95% Wilson Modified confidence intervals (CIs) using weighted logistic regression. All analyses were conducted in SAS v9.4 and SUDAAN.
Georgia SARS-CoV-2 cumulative incidence, IFR, and reported fraction
Given the considerable evidence from population-based surveys that SARS-CoV-2 antibodies wane over time to levels below detection by numerous laboratory tests (11–13), our analysis includes a Bayesian model that accounts for waning.(14) By accounting for (1) the time between infection and seroconversion to detectable antibodies, (2) the time between seroconversion and seroreversion to undetectable antibody test results, and (3) the time from symptom onset to death, the model estimates IFR and cumulative incidence of SARS-CoV-2 based on the Georgia weighted seroprevalence estimate from this study as well as the reported daily counts of COVID-19 associated deaths. The model applies cumulative density functions for the time from seroconversion to seroreversion, estimated by a previous study(14) to adjust for antibody waning. Cumulative incidence is calculated from the total number of modeled infections since the beginning of the epidemic until the median specimen collection date of our sample (November 16, 2020). This cumulative incidence also serves as the denominator for the infection fatality ratio (IFR). The ratio of the cases reported to cumulative incidence cases (the reported fraction) was developed from confirmed PCR+ cases in Georgia as of November 16, 2020 using data for adults ≥18 years from the Georgia Department of Public Health’s public use dataset.(15)
Results
Study sample
A total of 12,894 household addresses in Georgia were selected and mailed study materials from July-October 2020 (Figure 1). Of these, 6.4% (n=833) were unable to receive mail and excluded from the sample. Behavioral surveys were completed by 14.9% (n=1,804) households. A total of 11.3% (n=1,370) of sampled households completed a behavioral survey and returned a valid specimen for antibody testing during the study period of August 9-December 8, 2020 (Table 1). Of participating households, 43% (n=585) were in the oversampled area of Fulton/Dekalb and 57% (n=785) were from other counties in Georgia.
Serology and PCR results unadjusted for antibody waning
The weighted seroprevalence in Georgia was 8.6% (95% CI: 6.3 - 11.8%), representing the period prevalence of detectable antibodies for August 9 - December 8, 2020 (Table 2). This suggests that 687,450 out of 8,113,542 adults in Georgia had prevalent anti-SARS-CoV-2 Ig at the time they provided a sample. Unweighted, a total of 7.2% of all specimens tested (99/1,370) were reactive for total Ig.
Associations with prevalence of antibody response
For the state of Georgia, the weighted seroprevalence was two times higher for Black, non-Hispanic participants than for White, non-Hispanic participants (Table 2). A non-significant effect of similar magnitude was observed for Hispanic participants relative to White, non- Hispanic participants. Those reporting cold or flu-like symptoms after January 1, 2020 were nearly 5 times more likely than those without systems to be seropositive. Among those who were seropositive, 66/99 (weighted percent: 75%, 95% CI: 58%-86%) reported cold or flu symptoms since January 1, 2020. There were no observed differences in seroprevalence by education, income, or urbanicity.
For Fulton and DeKalb counties, point estimates of disparities in seroprevalence by race were higher than in the state as a whole, but not statistically significant (Table 3). Antibody prevalence for residents in Fulton and DeKalb counties (7.8%; CI: 5.1, 11.7) was similar to prevalence in other parts of Georgia (8.8%; CI: 6.1, 12.6). Experiencing cold or flu-like systems since the beginning of 2020 was the only variable significantly associated with seropositivity among participants not residing in Fulton or DeKalb (PR=5.2; CI: 2.0, 13.8) (Table 4).
SARS-CoV-2 cumulative incidence
Adjusting estimates for waning detectable antibody levels, the estimated number of cumulative new SARS-CoV-2 infections among Georgian adults was 1,307,518 (95% CrI: 1,081,788-1,541,200) as of November 16, 2020. The cumulative incidence was 16.1% (95% CrI: 13.5-19.2%) (Figure 2). The estimated IFR was 0.78% (95% CrI: 0.66-0.94%). The Georgia Department of Health reported 348,204 COVID-19 cases as of November 16, 2020, indicating that about one-quarter (26.6%; 95% Crl: 22.6%-32.2%) of SARS-CoV-2 infections among adults was reported.
Among specimens tested with PCR, a total of 16/1,529 (1.0%) were positive. Of these 16, 8 (50.0%) were also reactive for total Ig.
Discussion
A statewide probability sample of Georgia households conducted between August and December 2020 allowed for robust estimation of the cumulative incidence of SARS-CoV-2 infection among adults, finding that over 16% of Georgia’s adult population - about one in six - had been infected with the virus as of November 2020. Seroprevalence was highest among Hispanic and non-Hispanic Black persons, and similar for the Atlanta-metro counties of Fulton and DeKalb compared to the rest of the state.
The data obtained through this household-based, representative survey complement data on reported COVID-19 cases and overcome key limitations associated with data available through traditional state-based COVID-19 surveillance activities and seroprevalence surveys. Because our household sampling strategy was not restricted to individuals experiencing COVID-19 symptoms or seeking SARS-CoV-2 testing, biases associated with testing availability, test-seeking behaviors and the inability to identify asymptomatic individuals were minimal. Additionally, because these data are obtained from a random, representative sample of Georgia residents, the findings can provide reliable inference to all adult Georgia residents. Due to the finding that, as of mid-November 2020, Georgia had only recognized approximately 26% of adults infected with SARS-CoV-2, there is an ongoing need to lower barriers for testing. Our data also validate efforts made thus far in the pandemic response to encourage and invest in frequent, ample testing, despite pushback by lawmakers and certain segments of the general public who may have viewed public health mitigation strategies to have been excessive.(16)
Population-based COVID-19 data at the state level allows for a more nuanced understanding of the continuum of infection, diagnosis and mortality and the relationship of these metrics to programmatic priorities. For example, as of November 16, 2020, Georgia’s estimated case fatality ratio (CFR) was 2.1%.(17) Because the CFR is calculated from diagnosed cases (and practically from reported cases), having an IFR (which includes people who were asymptomatic in the denominator) advances our understanding of how common death is among all people infected with COVID-19, regardless of whether those infections were symptomatic. The result is that the CFR overstates how common death is among all those infected in the state. Our more comprehensive IFR that was estimated around the same time was about about a third of the CFR. Accordingly, SARS-CoV-2 infections may not be as fatal as had been previously reported; nonetheless, based on the IFR estimated in this study, 1 out of every 130 adult Georgians who are infected with SARS-CoV-2 will die.
Knowing that Georgia went into its winter surge with 16% of the adult population having had a SARS-CoV-2 infection is informative. It provides a reliable “lower bound” on how many adults have been infected, and despite waning antibodies, the proportion infected will only increase. When coupled with increasing data on duration of immunity, these more robust estimates of cumulative SARS-CoV-2 infection can help decision-makers understand how natural immunity contributes to a Georgia-specific herd immunity metric. To that end, data from our study suggest persons over the age of 65 have experienced far less infection than other age groups (and are therefore still susceptible), validating the state’s decision to prioritize that demographic for vaccination first. Our study also found the highest seroprevalence among Hispanic and non-Hispanic Black persons, suggesting that similar findings from diagnosed cases are not the result of biases in testing. These findings should be used to strengthen messaging around why vaccination remains important for these demographic groups despite previous infection, and add urgency to investments in increasing education and reducing barriers to access for Hispanic and Black Georgians.
Although we lack power to examine differences in seroprevalence by other meaningful geographic units (e.g. health district or state region), the results stratified by Fulton/DeKalb versus the rest of the state offer some useful local insights into differences in infection by metropolitan vs. non-metropolitan areas. Fulton and DeKalb comprise all of the City of Atlanta, Georgia’s capital and most populated city. The populations of these two counties comprise 17% of Georgia’s population.(18) Observing that the seroprevalence was similar among residents of those two counties compared to the rest of the state was notable, given that national data show consistently lower diagnosis rates for micropolitan and non-core areas through July 2020, with a switch in the pattern starting in August 2020 such that more infections were reported in less urbanized areas.(19) Thus, our finding of similar seroprevalence levels in urban and rural areas of Georgia might represent a combination of more historical infections in urban areas earlier in the year, and a higher concentration of infections in more rural areas during the period of the specimen collection. Infection rates (and subsequently antibody seroprevalence) are also related to risk mitigation behaviors. Although the use of face coverings has always been strongly encouraged across Georgia, the City of Atlanta issued a mask mandate in July 2020. With increasing ecological evidence suggesting the benefits masking can have on reducing community spread of SARS-CoV-2,(20, 21) Atlanta’s mask mandate may have limited the propagation of the virus in Fulton and Dekalb counties, where higher levels of transmission might have been favored by higher population density.
Our study is subject to a number of limitations. While we utilized a representative sampling frame, our response rate was 11.3%, which is low but typical for mailed surveys using address-based sampling frames.(22) The only other two household samples reported were conducted through door to door offer of enrollment (versus mailout enrollment packages in our study) and also had relatively low response rates (23.6 to 23.7%(23)). Our results are likely subject to some degree of differential response bias; we addressed this by oversampling specific groups (e.g., Black and Hispanic households) with lower response rates, and by weighting for non-response of households, a procedure with validity facilitated by the nature of an address-based sampling frame. Importantly, we were only able to address differential non-response using characteristics of the population that were available to us on the frame (e.g., population distributions by race/ethnicity or household income levels). Characteristics that may be associated with COVID-19 risk but not available at the population-level, such as higher general propensity to take risks, were not available for extrapolation to the underlying population from sample data and therefore may contribute to unaddressed selection bias in estimates. Misclassification of antibody status was possible due to waning antibodies(2, 24), but unlike other studies reported to date, we accounted for these biases through our modeling approach.(14) Our model used an estimated average time of seropositivity from a previous study conducted for New York City (14). This estimate was generated for an ELISA kit against the SARS-CoV-2 spike protein that detects the total immunoglobulin response (25), which is different from what we used in this study, but this is the only estimate of the timeline of the population-level waning antibody available at this point.
Knowing the true proportion of people who have been previously infected with SARS-CoV-2 is useful both epidemiologically and practically. Past seroprevalence studies from convenience samples and biased samples of residual blood provide important information, but the results are subject to selection biases associated with the sources of specimens. For Georgia, having reliable estimates of the cumulative incidence of SARS-CoV-2 infection among adults allows for more informed decision-making about risk mitigation and vaccination strategies. Data collections will be repeated in March and June of 2021, and results will be examined in an ongoing way as knowledge advances on topics ranging from duration of immunity to implications of antibodies for protection against novel variants.
Data Availability
COVIDVu study data are not yet publicly available. Data used for the ratio of the cases reported to cumulative incidence cases (the reported fraction) was developed from confirmed PCR+ cases in Georgia as of November 16, 2020 using data for adults ≥18 years from the Georgia Department of Public Health's public use dataset: https://ga-covid19.ondemand.sas.com/docs/ga_covid_data.zip
Funding
This work was supported by the US National Institute of Allergy and Infectious Diseases (3R01AI143875-02S1), the Center for AIDS Research at Emory University (P30AI050409), and the Robert W. Woodruff Foundation through a grant to the Emory Covid-19 Response Collaborative. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Conflicts of interest
Potential conflicts of interest
A.C. reports a grant from the Robert W. Woodruff Foundation during the conduct of this study and a paid consultancy with the Fulton County Board of Health, outside the submitted work.
B.L. reports grants and personal fees from Takeda Pharmaceuticals, personal fees from the World Health Organization, outside the submitted work.
M.F. reports consulting fees for providing statistical support during the conduct of this study.
A.S. reports grants from National Institute of Allergy and Infectious Disease, NIAID (3R01AI143875-02S1), a grant from California Department of Health (CDPH), and a grant from the Robert W. Woodruff Foundation, during the conduct of this study.
P.S. and T.S. report grants from the National Institutes of Health during the conduct of the study.
E.H., H.B., K.T., C.D., N.L., and K.S. report no conflicts.
Acknowledgements
The authors thank Salesforce for its donation of Salesforce.org licenses and system development for the project, and to acknowledge the contributions of development team members Lee Evans, John Thrasher, Stephen Noe, Anurag Jaiswal, Ryan Williams, David Affentranger, and Todd Siegler. They also thank the Kaiser Family Foundation for their design and thoughtful contributions.