ABSTRACT
Background Serological tests are crucial tools for assessments of SARS-CoV-2 exposure, infection and potential immunity. Their appropriate use and interpretation require accurate assay performance data.
Method We conducted an evaluation of 10 lateral flow assays (LFAs) and two ELISAs to detect anti-SARS-CoV-2 antibodies. The specimen set comprised 130 plasma or serum samples from 80 symptomatic SARS-CoV-2 RT-PCR-positive individuals; 108 pre-COVID-19 negative controls; and 52 recent samples from individuals who underwent respiratory viral testing but were not diagnosed with Coronavirus Disease 2019 (COVID-19). Samples were blinded and LFA results were interpreted by two independent readers, using a standardized intensity scoring system.
Results Among specimens from SARS-CoV-2 RT-PCR-positive individuals, the percent seropositive increased with time interval, peaking at 81.8-100.0% in samples taken >20 days after symptom onset. Test specificity ranged from 84.3-100.0% in pre-COVID-19 specimens. Specificity was higher when weak LFA bands were considered negative, but this decreased sensitivity. IgM detection was more variable than IgG, and detection was highest when IgM and IgG results were combined. Agreement between ELISAs and LFAs ranged from 75.8-94.8%. No consistent cross-reactivity was observed.
Conclusion Our evaluation showed heterogeneous assay performance. Reader training is key to reliable LFA performance, and can be tailored for survey goals. Informed use of serology will require evaluations covering the full spectrum of SARS-CoV-2 infections, from asymptomatic and mild infection to severe disease, and later convalescence. Well-designed studies to elucidate the mechanisms and serological correlates of protective immunity will be crucial to guide rational clinical and public health policies.
INTRODUCTION
As of April 23, 2020, more than 190,000 deaths have been attributed to Coronavirus Disease 2019 (COVID-19).1 Millions of infections by SARS-CoV-2, the virus responsible for COVID-19, have been reported, though its full extent has yet to be determined due to limited testing.2 Government interventions to slow viral spread have disrupted daily life and economic activity for billions of people. Strategies to ease restraints on human mobility and interaction, without provoking major resurgence of transmission and mortality, will depend on accurate estimates of population levels of infection and immunity.3 Current testing for the virus largely depends on labor-intensive molecular techniques.4 Individuals with positive molecular tests represent only a small fraction of all infections, given limited deployment and the brief time window when PCR testing is sensitive.5-7 The proportion of undetected cases in the original epidemic focus was estimated to be as high as 86%,8 and asymptomatic infections are suspected to play a substantial role in transmission.9-14
Widely available, reliable antibody detection assays would enable more accurate estimates of SARS-CoV-2 prevalence and incidence. On February 4, 2020, the Secretary of the United States Department of Health and Human Services issued emergency use authorization (EUA) for diagnosis of SARS-CoV-2,15 allowing nucleic acid detection and immunoassay tests to be offered based on manufacturer-reported data without formal FDA clearance.16 In response, dozens of companies have begun to market laboratory-based immunoassays and point-of-care tests. Rigorous, comparative performance data are crucial to inform clinical care and public health response.
We conducted a head-to-head comparison of available serology tests – immunochromatographic lateral flow assays (LFAs) and enzyme-linked immunosorbent assays (ELISAs) – including an evaluation of performance by time from symptom onset and disease severity. Our goal is to provide well-controlled performance data to highlight potentially useful serological assays, and to help guide their development and deployment.
METHODS
Ethical approvals
This study was approved by institutional review boards at the University of California, San Francisco (UCSF)/Zuckerberg San Francisco General Hospital (ZSFG) and Massachusetts General Hospital (MGH).
Study Design
The study population included individuals with symptomatic infection and positive SARS-CoV-2 real-time polymerase chain reaction (RT-PCR) testing of nasopharyngeal or oropharyngeal swabs, who had remnant serum and plasma specimens in clinical laboratories serving the UCSF and ZSFG Medical Center networks. We included multiple specimens per individual, but no more than one sample per time interval (1-5, 6-10, 11-15, 16-20, and >20 days after symptom onset). If an individual had more than one specimen for a given time interval, only the later specimen was included. For specificity, we included 108 pre-COVID-19 specimens from American Red Cross blood donors.17 We assessed cross-reactivity using 52 specimens from 2020: 50 with test results for other respiratory viruses (Biofire FilmArray; BioFire Diagnostics, Salt Lake City, UT), and 32 from individuals with negative results by SARS-CoV-2 RT-PCR. We based minimum sample size calculations on expected binomial exact 95% confidence limits. A total of 290 samples were included in the final analysis, including 130 from 80 SARS-CoV-2 RT-PCR-positive individuals. Some specimens were exhausted during the analysis and were not included in all tests. Data obtained from samples that did not conform to our study design were excluded.
Clinical data were extracted from electronic health records and entered in a HIPAA-secure REDCap database hosted by UCSF. Data included demographic information, major co-morbidities, patient-reported symptom onset date, symptoms and indicators of severity.
Independent data from testing efforts at MGH, with slight deviations in methods, are included as Supplementary Data. Briefly, 57 heat-inactivated serum/plasma samples from 44 SARS-CoV-2 RT-PCR-positive individuals were included. For specificity, the MGH study included 60 heat-inactivated, pre-COVID-19 samples from 30 asymptomatic adults and 30 individuals admitted with febrile and/or respiratory illness with a confirmed pathogen.
Sample Preparation
Samples from UCSF and ZSFG were assigned a random well position in one of four 96-well plates. Samples were thawed at 37°C, and up to 200uL was transferred to the assigned well without heat inactivation. Samples were then sub-aliquoted (12.5uL) to replica plates for testing. Replica plates were stored at -20°C until needed, then thawed for ten minutes at room temperature and briefly centrifuged before testing. All sample handling followed UCSF Biosafety Committee-approved Biosafety Level 2 (BSL2) practices.
For the MGH study, samples were heat-inactivated at 56°C for 60 minutes, aliquoted, and stored at 4°C and -20°C. Samples stored at 4°C were used within 7 days. Frozen aliquots were stored until needed with only a single freeze-thaw cycle for any sample. All samples were brought to room temperature and briefly centrifuged prior to adding the recommended volume to the LFA cartridge.
Immunochromatographic Lateral Flow Assays (LFAs)
Ten lateral flow assays were evaluated on samples from UCSF and ZSFG (Supplementary Table 1). At the time of testing, cartridges were labeled by randomized sample location (plate, well). The appropriate sample volume was transferred from the plate to the indicated sample port, followed immediately by provided diluent, following manufacturer instructions. The lateral flow cartridges were incubated for the recommended time at room temperature before readings. Each cartridge was assigned a semi-quantitative score (0 for negative, 1 to 6 for positive) for test line intensity by two independent readers blinded to specimen status and to each other’s scores (Supplementary Figure 1).17 For some cartridges (DeepBlue, UCP, Bioperfectus), the positive control indicator failed to appear after addition of diluent in a significant fraction of tests. For these tests, two further drops of diluent were added to successfully recover control indicators in all affected tests. These results were included in analyses. During testing, two plates were transposed 180° and assays were run in the opposite order from the wells documented on cartridges. These data were corrected and accuracy was confirmed by empty well position and verification of a subset of results.
Enzyme-Linked Immunosorbent Assays (ELISAs)
Epitope Diagnostics ELISAs were performed according to manufacturer specifications. Cutoffs for IgG and IgM detection were calculated as the package insert described (see Supplementary Methods). Values greater than the cutoff were considered positive.
An in-house ELISA was performed with minor deviations from a published protocol.18 SARS-CoV-2 Receptor Binding Domain (RBD) protein was produced from the published construct (NR-52306, BEI Resources). The positive cutoff was equal to the mean of the OD values of the negative control wells on the respective plate plus three times the standard deviation of the OD value distribution from the 108 pre-COVID-19 plasma. For both ELISAs, background-corrected OD values were divided by the cutoff to generate signal-to-cutoff (S/CO) ratios. Samples with S/CO values greater than 1.0 were considered positive.
Data Analysis
For LFA testing, the second reader’s scores were used for performance calculations, and the first reader’s score was used to calculate inter-reader agreement statistics. Percent seropositivity among RT-PCR-confirmed cases was calculated by time interval from symptom onset. Specificity was based on results in pre-COVID-2019 samples. Binomial exact 95% confidence intervals were calculated for all estimates. Analyses were conducted in R (3.6.3) and SAS (9.4).
RESULTS
Study population
SARS-CoV-2-positive individuals in the UCSF/ZSFG study ranged from 22 to >90 years of age (Table 1). The majority of SARS-CoV-2-positive individuals were Hispanic/Latinx (68.7%), reflecting the ZSFG patient population and demographics of the epidemic in San Francisco.19,20 Most presented with cough (91.2%) and fever (86.2%). Chronic medical conditions, such as hypertension, type 2 diabetes mellitus, obesity, and chronic kidney disease, were frequent. Of the 80 cases, 18.7% were outpatients, 45.0% inpatients without ICU care, and 36.2% required ICU care; there had been no reported deaths at the time of chart review.
Test Performance
The percentage of specimens testing positive rose with increasing time from symptom onset (Table 2, Figure 1A), reaching the highest levels in the 16-20 and >20 day time intervals. The highest detection rate was achieved by combining IgM and IgG results (Figure 1B). However, 95% confidence intervals for later time intervals showed substantial overlap with those for earlier intervals (Figure 1B). Four assays (Bioperfectus, Premier, Wondfo, in-house ELISA) achieved >80% positivity in the latest two time intervals (16-20 and >20 days) while maintaining >95% specificity. Some tests were not performed on a subset of specimens due exhausted sample material, which may have affected reported percent positivity. IgM detection was less consistent than IgG for nearly all assays. Kappa agreement statistic ranged from 0.95 to 0.99 for IgG and 0.81-1.00 for IgM for standardized intensity score and training (Supplementary Table 2 and Supplementary Figure 2). Although variability in mean band intensities exists among different assays, the rate of sample positivity was generally consistent (Figure 2).
We observed a trend towards higher percent positivity by LFA for patients admitted to ICU compared to those with milder disease, but the specimen numbers per time interval were low, limiting statistical power (Supplementary Figure 3).
Test specificity ranged from 84.3%-100.0%, with 39/108 samples demonstrating false positive results by at least one LFA (Table 2 and Figure 2B). Of the false positive results, 61.5% (24/39) had a weak positive score (1). Intensity scores of 2-3 were seen in 30.8% (12/39) and scores of 4-6 were seen in 7.7% (3/39) of the positives from the pre-COVID-19 samples.
We evaluated the tradeoff between percent positivity and specificity as a function of LFA reader score. Changing the positive LFA threshold from 1 to 2 decreased the mean overall percent positivity across tests from 66.0% (range: 56.9%-74.2%) to 56.7% (range: 44.0%-64.6%) and increased the average specificity from 94.6% (range: 84.3%-100.0%) to 98.2% (range: 94.4%-100.0%) (Supplementary Figure 4). An independent study at MGH compared three LFAs, of which BioMedomics was also assessed in the current study (Supplementary Table 3). Overall, both studies showed a trend for increased detection of SARS-CoV-2 specific antibodies with increased time from symptom onset. However, the MGH study displayed increased specificity with lower percent positivity at early timepoints after symptom onset. MGH positivity thresholds were set higher to prioritize test specificity (Supplementary Figure 4B-C).
A set of specimens obtained during the COVID-19 outbreak that had negative SARS-CoV-2 RT-PCR testing and/or alternative respiratory pathogen diagnoses demonstrated higher numbers of positive results compared to the pre-COVID-19 sample set (Figure 2C). Five specimens had positives results by >3 tests, all with respiratory symptoms and concurrent negative SARS-CoV-2 RT-PCR testing (Figure 2C, arrows). One patient was positive on 8 different tests including the in-house ELISA. In this limited panel, no consistent pattern of cross-reactivity was identified in samples from individuals with non-SARS-CoV-2 respiratory viruses, including 2 strains of seasonal coronavirus (1 coronavirus OC43, 3 coronavirus HKU1).
Agreement between results of LFAs with those of IgG and IgM Epitope ELISAs ranged from 75.8%-85.7%, while agreement with the in-house ELISA ranged from 83.6%-94.8% (Figure 3A). LFA band intensity scores showed a direct correlation with ELISA S/CO values (Figure 3B).
DISCUSSION
This study describes test performance for 12 COVID-19 serology assays on a panel of 130 samples from 80 individuals with PCR-confirmed SARS-CoV-2 infection and 108 pre-COVID-19 specimens. For each test, we quantified detection of IgM and/or IgG antibodies by time period from onset of symptoms and assessed specificity and cross-reactivity. We hope these data will inform the medical community, public health efforts, and governmental institutions in planning for SARS-CoV-2 serological testing. This study also seeks to provide feedback to manufacturers about areas of success and necessary improvement. There is no “gold standard” to identify true seropositive blood samples. The extent and time-course of antibody development are not fully understood as yet, and may vary between different populations, even among RT-PCR-confirmed cases.
We focused on comparisons of percent positivity by time interval, rather than reporting the “sensitivity” of each assay. As expected, percent positivity rose with time after symptom onset.5,6,21-23 High rates of positive results were not reached until at least 2 weeks into clinical illness; diagnosis at time of symptom onset thus remains dependent on viral detection methods. The assays showed a trend to higher positive rates within time intervals for more severe disease, but this finding should be interpreted with caution, due to the limited data from ambulatory cases. The majority of samples >20 days post-symptom onset had detectable anti-SARS-CoV-2 antibodies, suggesting good to excellent sensitivity for all evaluated tests in hospitalized patients three or more weeks into their disease course. However, well-powered studies testing ambulatory or asymptomatic individuals, including performance with capillary blood, will be essential to guide appropriate use of serology.
Our data demonstrate specificity greater than 95% for the majority of tests evaluated and >99% for 2 LFAs (Wondfo, Sure Biotech) and the in-house ELISA (adapted from Amanat et al, 2020)18. We observed moderate-to-strong positive bands in several pre-COVID-19 blood donor specimens, some of them positive by multiple assays, suggesting the possibility of non-specific binding of plasma proteins, non-specific antibodies, or cross-reactivity with other viruses. Three of the pre-COVID-19 specimens (2.3%) were scored positive by more than three assays. Intriguingly, the fraction of positive tests was higher in a set of recent specimens obtained during the COVID-19 outbreak from individuals undergoing respiratory viral workup, many with negative SARS-CoV-2 RT-PCR. Five of these (9.6%) had positive results by more than three assays, without relation to a specific viral pathogen, suggesting non-specific reactivity and/or missed COVID-19 diagnoses. One specimen was positive by 8 of 12 assays, including the in-house ELISA. The patient was >90 years old and presented with altered mental status, fever, and ground glass opacities on chest radiological imaging. SARS-CoV-2 RT-PCR was negative and ancillary laboratory testing suggested a urinary tract infection. This case could represent COVID-19 not detected by RT-PCR, reinforcing the importance of caution in interpreting negative molecular results as ruling out the infection. Moreover, this suggests potential utility of serological testing as a supplemental diagnostic tool, given the decline in sensitivity of current molecular tests in the second week of illness.5 Appropriate clinical algorithms for serology testing, including confirmatory or reflexive testing, have yet to be determined. These algorithms will be affected by test performance characteristics and prevalence of disease across all presentations of infected individuals.
Importantly, we still do not know the extent to which positive results by serology reflect a protective immune response.24 Future functional studies are critical to determine whether specific antibody responses predict virus neutralization and protection against re-infection. Until this is established, conventional antibody assays should not be used as predictors of future infection risk.
High specificity testing is crucial in low-prevalence settings. One approach to increase specificity would employ confirmatory testing with an independent assay (perhaps recognizing a distinct epitope or antigen). Our comparison of UCSF and MGH data suggests that reclassifying faint bands as negative or inconclusive would increase specificity, albeit at the expense of sensitivity. Detection cutoffs should be selected recognizing the likely tradeoff between specificity and sensitivity, the population prevalence, and the goals of testing. Accurate LFA use also depends on adequate reader training and standardized intensity cutoffs. Our findings highlight the need for rigorous evaluation of alternative implementation algorithms in multiple settings.
Our study also reinforces the need for assay validation using standardized sample sets with: 1) known positives from individuals with a range of clinical presentations at multiple time points after onset of symptoms, 2) pre-COVID-19 outbreak samples for specificity, and 3) samples from individuals with other viral and inflammatory illnesses as cross-reactivity controls. Coordinated efforts to validate and ensure widespread availability of such standardized sample sets would facilitate effective utilization. We will continue to evaluate serology assays, and provide updated data on a dedicated website (https://covidtestingproject.org). Current and future studies by our group and others will provide an essential evidence base to guide serological testing during the COVID-19 pandemic.
Data Availability
All data used in the study are available in an interactive format at https://covidtestingproject.org/
Competing Interests
This work was supported by gifts from Anthem Blue Cross Blue Shield, the Chan Zuckerberg Biohub, and anonymous philanthropy. C.Y.C. is the director of the UCSF-Abbott Viral Diagnostics and Discovery Center, receives research support funding from Abbott Laboratories and is on the Scientific Advisory Board of Mammoth Biosciences, Inc. C. J. Y. is cofounder of DropPrint Genomics and serves as an advisor to them. M.S.A. holds stock in Medtronic and Merck. P.D.H. is a cofounder of Spotlight Therapeutics and serves on the board of directors and scientific advisory board, and is an advisor to Serotiny. P.D.H. holds stock in Spotlight Therapeutics and Editas Medicine. A.M. is a cofounder of Spotlight Therapeutics and Arsenal Biosciences and serves on their boards of directors and scientific advisory boards. A.M. has served as an advisor to Juno Therapeutics, is a member of the scientific advisory board at PACT Pharma, and is an advisor to Trizell. A.M. owns stock in Arsenal Biosciences, Spotlight Therapeutics and PACT Pharma. RY owns stock in Abbvie, Bluebird Bio, Bristol Myers Squibb, Cara Therapeutics, Editas Medicine, Esperion, and Gilead Sciences. Unrelated to this current work, the Marson lab has received sponsored research support from Juno Therapeutics, Epinomics and Sanofi, and a gift from Gilead.
Acknowledgements
We thank all members of the Marson lab and the Hsu lab, Peter Kim, Scott Boyd, Joe DeRisi, Steve Quake, Bryan Greenhouse, Christina Tato, Jennifer Doudna, Fyodor Urnov, David Friedberg, David Neeleman, John Hering, Cindy Cheng, Neal Khosla, Matt Krisiloff, Lachy Groom, Chenling Xu, Dave Fontenot, Jim Karkanias, Gajus Worthington, Bill Burkholder, Charlie Craik, XPrize Pandemic Alliance, Warris Bokhari, Zem Joaquin, Siavash Sarlati, Scott Nesbit, William Poe, Sam Broder, Verily, Charlie Kim, Aleksandra Kijac, Marc Solit and the Coronavirus Standards Working Group, Diane Havlir, Joanne Engel, Peter Farley, Jeff MacGregor, Kimberly Hou, Bob Sanders, Sarah Yang, and Sean Parker. We thank Yagahira Elizabeth Castro-Sesquen for sharing her semi-quantitative LFA scale, which was adapted for use in our current study. The work was supported by gifts from Anthem Blue Cross Blue Shield, the Chan Zuckerberg Biohub, and anonymous philanthropy. We thank the following sources for donation of test kits: the manufacturers of Bioperfectus, Decombio, Sure-Bio, UCP Biosciences; David Friedberg; John Hering; Henry Schein (Melville, NY). The Wilson Lab has received support from the Rachleff Family Foundation. The Hsu lab has received support from S. Altman,
V. and N. Khosla, D. and S. Deb, the Curci Foundation, and Emergent Ventures. P.D.H. holds a Deb Faculty Fellowship from the UC Berkeley College of Engineering and is the recipient of the Rainwater Foundation Prize for Innovative Early-Career Scientist. The Marson lab has received gifts from J. Aronov, G. Hoskin, K. Jordan, B. Bakar, the Caufield family and funds from the Innovative Genomics Institute (IGI), the Northern California JDRF Center of Excellence and the Parker Institute for Cancer Immunotherapy (PICI). We thank the National Institutes of Health for their support (J.D.W. R38HL143581; A.E.G. F30AI150061; D.N.N. L40 AI140341; S.P.B. NHLBI R38HL143581; T.A.M. 1F30HD093116; D.W NIH 1F31NS106868-01; J.G.C. R01 AI40098; MSTP students supported by T32GM007618). R.Y. was supported by an AP Giannini Postdoctoral Fellowship. J.A.S. was supported by the Larry L. Hillblom Foundation (2019-D-006-FEL). A.M. holds a Career Award for Medical Scientists from the Burroughs Wellcome Fund, is an investigator at the Chan–Zuckerberg Biohub and is a recipient of The Cancer Research Institute (CRI) Lloyd J. Old STAR grant.
Footnotes
↵* Co-first author