ABSTRACT
Commercial availability of serological tests to evaluate immunoglobulins (Ig) towards severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has grown exponentially since the onset of COVID-19 (Coronavirus Disease 2019) outbreak. Their thorough validation is of extreme importance before using them as epidemiological tools to infer population seroprevalence, and as complementary diagnostic tools to molecular approaches (e.g. RT-qPCR). Here we assayed commercial serological tests (semiquantitative and qualitative) from 11 suppliers in 126 samples collected from hospitalized COVID-19 patients, and from 36 healthy and HIV-infected individuals (collected at the pre-COVID-19 pandemic). Specificity was above 95% in 9 tests. Samples from COVID-19 patients were stratified by days since symptoms onset (<10, 10-15, 16-21 and >21 days). Tests sensitivity increases with time since symptoms onset, and peaks at 16-21 days for IgM and IgA (maximum: 91.2%); and from 16-21 to >21 days for IgG, depending on the test (maximum: 94.1%). Data from semiquantitative tests show that patients with severe clinical presentation have lower relative levels of IgM, IgA and IgG at <10 days since symptoms onset in comparison to patients with non-severe presentation. At >21 days since symptoms onset the relative levels of IgM and IgG (in one test) are significantly higher in patients with severe clinical presentation, suggesting a delay in the upsurge of Ig against SARS-CoV-2 in those patients.
This study highlights the high specificity of most of the evaluated tests, and sensitivity heterogeneity. Considering the virus genetic evolution and population immune response to it, continuous monitoring of commercially available serological tests towards SARS-CoV-2 is necessary.
INTRODUCTION
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a large RNA virus from the Coronaviridae virus family that is currently globally spread (1,2). Considering the absence of an effective vaccine or treatment for SARS-CoV-2 infection, early diagnosis of infection and isolation of infected individuals is critical to control the ongoing pandemic (3). Most efforts for case detection involve collection of swab samples from the upper respiratory tract, and the amplification of viral nucleic acids sequences by RT-qPCR. These sequences include genes encoding for the viral proteins: envelop (E), RdRp, nucleocapsid (N) 1 and 2, and spike (S) (4). However, RT-qPCR-based diagnosis is time-consuming, expensive and requires highly trained professionals. Serological tests arise as interesting complementary diagnostic tools, but also as means to detect the presence of antibodies towards SARS-CoV-2 at the population level.
Following SARS-CoV-2 infection, most patients produce detectable immunoglobulins (Ig) against a set of viral antigens, particularly to the immunodominant N and S proteins (5-8). Current evidence suggests that Ig produced against these antigens may confer protection against SARS-CoV-2 infection (9-11). Nevertheless, there is still insufficient data on the timing of Ig production upon infection. Literature suggests that, considering the timing since symptoms onset, blood IgA and IgM are detected after 6-8 days; IgA increases continuously up to 20-22 days, and IgM peaks at 10-14 days (5,12-15). IgG seroconversion seems to occur slightly later than IgM, at 9-10 days since symptoms onset. However, many patients seroconvert for both IgM and IgG simultaneously, peaking at around 21 days (13,16-18). Duration and magnitude of Ig response likely correlates with disease severity, and it is yet debatable whether Ig levels remain at sufficient protective levels for long periods of time after viral clearance (17,18).
Serological studies have the potential to help in understand individual and herd immunity to a viral infection (19). Several studies evaluated and compared the performance of serological assays (12,20,21). Notably, the performance (sensitivity and specificity) of these assays can be affected by many variables including: timing of assessment since symptoms/infection, course of COVID-19 (Coronavirus Disease 2019) (from asymptomatic to lethal) and, potentially, population and virus genetics (22,23). It is thus unequivocally important to evaluate the performance of available serological tests in distinct populations and countries, to be able to select the most adequate tests.
Herein, we evaluate the performance of serological tests (3 semiquantitative and 8 qualitative) from 11 suppliers using plasma samples of hospitalized patients with COVID-19 from the Minho region, in the North of Portugal. These tests were chosen considering previous reports on their sensitivity, specificity, and availability.
METHODS
Study population
Patients living in the Minho region of Portugal, followed-up as inpatients at Senhora da Oliveira Hospital (Guimarães) and Braga Hospital, diagnosed with COVID-19 (by RT-qPCR at a reference laboratory; at least two positive RT-qPCR results were obtained from each patient) were invited to participate in the study. This study was approved by the Ethics Committees of both participating Hospitals (Senhora da Oliveira Hospital: 25/2020; Braga Hospital: 37/2020). An explanation of the project was provided to those individuals, and the ones that agreed to participate signed an informed consent form. The informed consent was prepared according to the Declaration of Helsinki principles, the Oviedo Convention and according to the General Data Protection Regulation – Regulation (EU) 2016/679. Patients’ blood samples were collected throughout their hospitalization, at different timepoints following symptoms onset. The number of samples available from each participant vary depending on the duration of their hospitalization.
For sensitivity calculation, COVID-19 patients were stratified based on the number of days since symptoms onset as follows: <10 days; 10 to 15 days; 16 to 21 days; >21 days. Days since symptoms onset were calculated based on each patient’s self-report of symptoms manifestation. COVID-19 patients were further categorized according to the severity of their clinical presentation. Patients given oxygen therapy above 10 L/min and/or needing mechanical ventilation (non-invasive or invasive) were considered as having a severe clinical presentation. All other patients (needing supplementary oxygen therapy below 10 L/min and not requiring mechanical ventilator support) were considered as having non-severe clinical presentation. SARS-CoV-2 non-infected controls were selected from banked human plasma samples from two studies at pre-COVID-19 pandemic (the first COVID-19 case in Portugal was reported in early March 2020): i) a study with healthy individuals older than 55 years of age (samples collected between April 2019 and January 2020); ii) a study with HIV-infected patients on antiretroviral therapy for 54 to 60 months (samples collected between January 2016 and August 2018) (24). In both cases, matched samples were selected based on individuals’ sex and age. Control samples were collected, processed and preserved at -80 °C using a similar protocol as the one used for samples from COVID-19 inpatients (bellow).
Data was handled anonymously. Individuals’ sex, age and comorbidities are summarized in Table 1.
Sample processing
From each patient, venous blood was collected into K2EDTA collecting tubes and processed on the same day: blood collecting tubes were centrifuged at 2000g for 15 min, at 20 °C. Plasma was aliquoted into screw-cap tubes and frozen at -80°C until tested.
Immunoassays
Semiquantitative [enzyme linked immune-absorbent assays (ELISA) and chemiluminescence immunoassays (CLIA)] and qualitative assays [lateral flow immunoassays (LFIA)] from 11 different suppliers were tested according to manufacturer’s instructions (Supplementary Table S1). At least two different tests were performed for each sample (Supplementary Table S2, S3 and S4).
Data analysis
For each test, specificity was calculated as the percentage of negative tests among the pre-COVID-19 controls, and sensitivity as the percentage of positive tests among the SARS-CoV-2 confirmed cases. Sensitivity was calculated upon stratification in days since symptoms onset (<10, 10-15; 16-21 and >21 days). Positive predictive value (PPV) was calculated as the proportion of true positive cases (i.e. positive serological test on confirmed SARS-CoV-2 infection by RT-qPCR) among the total positive tests, and the negative predictive value (NPV) as the proportion of true negative cases (i.e. negative serological test on pre-COVID-19 pandemic samples) among the total negative tests. Whenever the same patient was tested multiple times within the same time range during the course of the disease, only one plasma sample was considered to calculate sensitivity, PPV and NPV (Supplementary Table S2). The 95% confidence intervals for sensitivity, specificity, PPV and NPV were calculated using the Wilson score with continuity correction method. Tests agreement was evaluated using Cohen’s Kappa. For the semiquantitative tests, receiver-operator characteristic (ROC) curves were constructed and used to calculate the area under the curve (AUC) of the different serologic tests. All variables analysed had a non-normal distribution, as verified by Shapiro-Wilk tests. Comparisons of the relative amounts of Ig at the various time ranges were performed using the Kruskal-Wallis test, followed by Dunn’s post-hoc tests. Comparisons of the relative amount of Ig in the groups of patients with severe and non-severe clinical presentation were performed using the Mann-Whitney U test. Differences were considered significant when p <0.05. Statistical analyses were performed on GraphPad Prism version 8.4.3 (La Jolla, San Diego, CA, USA).
RESULTS
Study population
This study includes 89 hospitalized infected with SARS-CoV-2 (diagnosed by RT-qPCR). Plasma samples from 49 individuals were tested at a single timepoint, whereas 40 individuals were tested more than once during hospitalization (Supplementary Table S2 and S4). Most participants are women (57%; Table 1) with a median age of 71. None of the patients is positive for HIV nor was submitted to previous organ transplantation.
Performance of tests
Serologic tests from 11 suppliers were assayed for their performance. Specificity considering IgM and IgG combined range from 76.0% (Cellex) to 100.0% (Liming and Render). Considering each Ig class independently, 4 out of 10 tests assessing IgG, and 5 out of 9 tests assessing IgA or IgM show 100.0% specificity (Figure 1 and Table 2). Thirty-six samples were used as negative controls (collected during pre-COVID-19 pandemic); 10 negative controls are positive for at least 1 test; 5 of which tested positive in more than 1 test (Supplementary Table S3). As for the semiquantitative tests (Abbott, Euroimmun and Snibe), 2 of the negative control samples test positive in the Snibe IgG test and have a value close to the lower detection limit (1.124 and 1.652 AU/mL; cut-off 1.000 AU/mL; Supplementary Figure S1).
To analyse sensitivity, samples were stratified according to time since symptoms onset (<10, 10-15, 16-21 and >21 days). For each Ig, the lowest sensitivities are observed at <10 days since symptoms onset: the Leccurate IgG test presents the highest sensitivity (71.4%) and Snibe IgM the lowest (33.3%; Figure 1 and Table 2). Detection of IgA or IgM reached a maximum sensitivity at 16-21 days in 7 of the 9 tests performed. The maximum sensitivity for IgG peaked at 16-21 days in half of the tests; the other half at >21 days (Abbott, Cellex, Innovita, Leccurate, and Render). As for Getein, that detects total Ig towards SARS-CoV-2, the peak detection is at 16-21 days. Considering the combined Ig classes, at >21 days, the Euroimmun test (IgA and IgG) provides the highest sensitivity (91.4%), followed by the Snibe test (IgM and IgG; 90.6%). The combined IgM and IgG Leccurate test provides the lowest sensitivity value (75.0%).
Interestingly, the two tests with the overall highest sensitivity (Euroimmun and Snibe) show a moderate agreement index (Cohens’ Kappa) of 0.723 (Table 3). Euroimmun has the best agreement with SD (Cohens’ Kappa = 0.827), and Snibe correlates the best with Getein (Cohens’ Kappa = 0.874). For semiquantitative tests, most of the samples from COVID-19 patients with contradicting results present values far from the cut-off values (Supplementary Figure S1).
Four out of 10 tests, and 5 out of 9 tests have positive predictive values (PPV) of 100% for the detection of IgG and IgA/IgM, respectively (Supplementary Table S5). Considering the combined tests, 2 of them predict positive cases with 100% accuracy (Liming and Render); Cellex has the lowest PPV (90.2%). Euroimmun test (IgG + IgA) shows the best negative predictive value (NPV), correctly classifying a negative case in 68.5% of the cases tested. Focusing on each test independently, Abbott performs the best in identifying the negative cases based on IgG detection (57.8%) and SD based on IgM detection (61.0%); however, Euroimmun IgA is the test with the highest NPV (66.1%; Supplementary Table S5).
Analysis of the ROC curves of the semiquantitative tests reveals that Euroimmun is the test that best distinguishes SARS-CoV-2 non-infected controls from the SARS-CoV-2 confirmed cases, both for IgA and IgG [AUC (IgG) = 0.911; AUC (IgA) = 0.935; Figure 2].
Comparison of immunoglobulins levels in COVID-19 patients with distinct clinical presentation
Overall, all the Ig assayed start being detected at early days since symptoms onset (<10 days). IgM and IgG reach the maximum relative amount at 16-20 days or at >21days. A peak of IgA detection is observed at 16-21 days since symptoms onset (Supplementary Figure S2).
To further investigate the association of Ig levels with clinical presentation, COVID-19 inpatients were classified as severe or non-severe, based on the need of supplementary oxygen or mechanic ventilatory support, as specified in the Methods section. The relative amount of Ig detected on each of the semiquantitative tests was compared between the two subgroups (Figure 3). IgG relative amounts tend to be lower in the severe group at <10 days when compared with the non-severe group, reaching statistical significance for Abbot and Euroimmun [Abbott: U = 7.5, p = 0.0006; Euroimmun IgG: U = 23.5, p = 0.0045]. Accordingly, Euroimmun IgA and Snibe IgM also present lower relative amounts in the severe group at <10 days postsymptoms onset when compared with the non-severe group [Euroimmun IgA: U = 29.5, p = 0.0133; Snibe IgM: U = 33.5, p = 0,0446]. For Snibe, both IgM and IgG levels at >21 days are higher in the severe group [Snibe IgM: U = 145.5, p = 0.0106; Snibe IgG: U = 110, p = 0.0064]. No differences are observed in the other tests or time ranges analysed (Figure 3).
DISCUSSION
Serological testing to SARS-CoV-2 infection is a rapid, inexpensive, and easy diagnostic approach complementary to RT-qPCR, but also an essential tool to access the presence and profile of Ig anti-SARS-CoV-2, both individually and at the population level. The present study compares the performance of serological tests towards SARS-CoV-2 using well characterized COVID-19 inpatients at various moments of the disease course, and with diverse clinical presentation severity. This study is relevant since most commercially available tests emerged in the market soon after the COVID-19 outbreak and were based on the analysis of a limited number of patients. In the recent months, many studies exploring the production of Ig against SARS-CoV-2 have been reported, but few compare several tests between them using the same set of plasma samples. Comparison of tests performance, as presented here, is of relevance since it depends on inherent population genetic variations, on time during course of the disease and on disease severity.
Irrespective of the test methodology (ELISA, CLIA, LFIA), we observe here, as reported previously, that test sensitivity is dependent on time since symptoms onset. In addition, the combined detection of IgG and IgA/IgM against SARS-CoV-2 leads, in most cases, to a better performance than measurements of a single Ig class, regardless of the time range evaluated. These observations were previously reported by others, and are in agreement with the establishment of the immune response, where IgA, IgM and IgG have different dynamics throughout disease progression (5,12-18, 25). This dynamic nature of tests sensitivity needs to be considered when interpreting performance. Inconsistent reporting by manufactures, or lack of details about the timepoints used to establish tests’ performance, caused confusion about the expected sensitivity. This has contributed to the recent decision by the US Food and Drug Administration to remove some tests from the Emergency Use Authorization.
Previous reports concluded that N-protein/peptide-based tests present better sensitivities than the ones based on S-protein/peptide. This difference may result from an earlier immune response towards the N-protein/peptide in comparison to the one towards the S-protein/peptide, or related to the higher specificity of Ig towards the N-protein/peptide (26-28). It is not possible to confront these previous results with ours, as 4 of the assayed tests displayed no information regarding the target antigen, 4 target both proteins, 2 target only the N-protein/peptide alone; and 1 target only the S-protein.
Regarding seroconversion rate, it is striking to notice that Ig anti-SARS-CoV-2 are detectable at very early days since symptoms onset (at 3 to 5 days either for IgM, IgA or IgG; e.g. Patients 012, 033 and 070; Supplementary Table S2), though for others, detection using the same tests occurs only much later (>21 days; e.g. Patients 004, 037 and 064; Supplementary Table S2). Several factors might account for these observations. Antibody production kinetics during SARS-CoV-2 infection is not yet fully elucidated, nor are the factors responsible for differences in patients’ response. As others suggested previously (17,18), we also observe that patients with non-severe clinical presentation show higher relative amounts of IgG, IgA and IgM at the earlier period since symptoms onset. Identifying the variable responsible for distinct profiles of seroconversion is necessary to be able to fully understand false-negative results.
It is important to recall that the time since symptoms onset is self-reported, which might introduce variability taking into consideration the diversity of clinical manifestation of COVID-19 and of symptoms perception by each individual. However, we consider that it introduces less variability than time since disease diagnosis, which is performed for some patients before symptoms onset and for others several days after.
Due to the high specificity of most of the assayed tests, their PPV are relatively high (over 90%). Individuals with positive serodiagnosis tests have a high likelihood of being infected by SARS-CoV-2 (past or present infection). However, a negative serologic test in these settings should not be interpreted as not being infected with SARS-CoV-2 as the NPV are relatively low (38% to 68%).
Our results highlight that Ig levels, on a hospitalized COVID-19 population, depend on disease severity. In fact, our data suggests a delay in Ig detection in patients with worse disease presentation; they present lower relative amounts of Ig at the initial phase of the disease and higher at later stages, in comparison to patients with non-severe clinical presentation. This information is of outmost relevance in clinical settings and needs to be further explored in a larger cohort of patients evaluated longitudinally, since the very beginning of symptoms manifestation perception, and throughout longer time-periods.
Our results are based on measurements performed in a specific population of hospitalized COVID-19 patients and should not be extrapolated to the general population. Still, it provides the basis for an informed selection of a serological test to be assayed and applied in a greater population setting to evaluate the potential value of each test as a complement for COVID-19 diagnosis and to understand the dynamics of Ig production upon infection with SARS-CoV-2.
Data Availability
All data generated or analysed during this study are included in this published article and its supplementary information file.
AUTHOR’S CONTRIBUTION
CS-M, CN, SR, CC, JAP, PGC and MC-N conceptualized the study; CS-M, CN, SR and MC-N designed the experiments; JC, AR, MF, HS, OP, AC, CC and PGC recruited the patients and collected the clinical data; CS-M, CN, SR, JC-G, CSS, NV, PB-S and PA-P performed the experiments; CS-M, CN and SR prepared the figures and performed the statistical analysis; JAP, PGC and MC-N supervised the study; and all authors discussed the results and contributed to the final manuscript.
ACKNOWLEDGMENTS
This work was funded by National funds, through the Foundation for Science and Technology (FCT): R4COVID (596694995), POCI-01-0145-FEDER-016428, UIDB/50026/2020 and UIDP/50026/2020; and by the projects NORTE-01-0145-FEDER-000013 and NORTE-01-0145-FEDER-000023, supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF). CN, SR and NV are junior researchers under the scope of the FCT Transitional Rule DL57/2016. JC-G is supported by a FCT PhD grant, in the context the Doctoral Program in Aging and Chronic Diseases (PhDOC; PD/BD/137433/2018); CSS is supported by a FCT PhD grant, in the context of the Doctoral Program in Applied Health Sciences (PD/BDE/142976/2018). The authors gratefully acknowledge Drs. AC Braga and R Menezes (University of Minho) for discussions of the statistical properties of the data, Dr. Qi Huan (Nanjing Tembusu New Material Research Institute) for assistance in obtaining the Medomics kits, Dr. Lei Wu (INL) for assistance in obtaining the Getein kits and Hao Tu (Overseas Business, Getein) for providing the Getein kits for this project free of charge. We are thankful to all study participants.