Abstract
Saliva is easily obtainable non-invasively and potentially suitable for detecting both current and previous SARS-CoV-2 infection. We established 6 standardised enzyme-linked immunosorbent assays (ELISA) capable of detecting IgA and IgG antibodies to whole SARS-CoV-2 spike protein, to its receptor binding domain region and to nucleocapsid protein in saliva. In a test accuracy study (n=320), spike IgG performed best (ROC AUC: 95.0%, 95% CI: 92.8-97.3%), followed by spike IgA (ROC AUC: 89.9%, 95% CI: 86.5-93.2%), for discriminating between pre-pandemic and post-COVID-19 saliva samples. Using machine learning, diagnostic performance was improved when a combination of tests was used. As expected, salivary IgA was poorly correlated with serum, indicating an oral mucosal response, whereas salivary IgG responses were predictive of those in serum. When deployed to 20 household outbreaks undergoing Delta and Omicron infection, antibody responses were heterogeneous but remained a reliable indicator of recent infection. Intriguingly, unvaccinated children showed evidence of exposure almost exclusively through specific IgA responses in the absence of evidence of viral infection. We have provided robust standardisation, evaluation, and field-testing of salivary antibody assays as tools for monitoring SARS-CoV-2 immune responses. Future work should focus on investigating salivary antibody responses following infection and vaccination to understand patterns of SARS-CoV-2 transmission and inform ongoing vaccination strategies.
Introduction
Antibody detection has proven critical for conducting epidemiological surveillance, investigating the natural history of SARS-CoV-2 and assessing novel vaccine candidates throughout the ongoing COVID-19 pandemic1–3. Serological studies have demonstrated antibodies to be correlates of protection against (re)infection4, with antibodies specific to spike protein and its RBD region demonstrated to neutralise viral binding and entry5. Currently widely used COVID-19 vaccines generate immune responses to the spike protein and serological studies have been central to vaccine evaluation. A corollary is that antibody responses to the nucleocapsid (N-protein) offer a means to differentiate infected from vaccinated individuals in settings where the vaccines utilised include only the spike antigen.
While it has been shown that antibodies to SARS-CoV-2 can be measured in saliva 6–8, there has been limited evaluation of the suitability and utility of salivary immunoassays for detecting recent infection in populations of children and adults, particularly during more recent months when new variants have been circulating and vaccination coverage is high in many countries9. In saliva, secretory IgA (sIgA) and IgG are the principal antibody classes: IgA is mostly produced by local, mucosal plasma cells while IgG is mostly derived from the blood by passive diffusion, mainly across gingival crevicular epithelium10,11. The tropism of SARS-CoV-2 for cells in the respiratory tract suggests that consequent local generation of mucosal IgA antibodies may play an important role in protection and limiting onward transmission, while salivary IgG may be a proxy for systemic immunity12–15.
As SARS-CoV-2 transitions to endemicity, monitoring infection, individual and population immunity and re-infection through antibody responses will be important for mitigating against future outbreaks, with success, in part, dependent on robust, well-characterised assays which can be used on easily obtained samples. Despite this, large-scale epidemiological studies using mucosal saliva samples are uncommon. Challenges exist in collection and handling of specimens to prevent degradation by sample proteases, as well as in individual variation in salivary production and composition10. Furthermore, mucosal immunoassays can suffer from increased assay background due to non-specific, high avidity binding of multimeric immunoglobulin16 for low-affinity antigen, reducing discriminatory power in diagnostic tests and reproducibility.
To investigate mucosal immune responses to SARS-CoV-2 and estimate rates of past infection, we developed six salivary enzyme linked immunosorbent assays (ELISA). We aimed to evaluate assay performance for detecting recently confirmed SARS-CoV-2 infection in a blinded test accuracy study. To understand salivary antibody responses further and facilitate their deployment to cohorts with unknown infection status, we correlated antibody levels measured by the assays in paired serum and saliva samples. Finally, we sought to field-test the best performing assays in a household transmission setting to investigate mucosal antibody responses in recently exposed adults and children. Our study provides robust standardisation and evaluation of saliva as a sample for SARS-CoV-2 antibody detection and provides insights into characterising mucosal immune responses following infection.
Results
Development of salivary immunoassays for SARS-CoV-2 antibody detection
Single-dilution salivary ELISAs capable of detecting antibodies specific to SARS-CoV-2 full-length spike protein, receptor binding domain (RBD) and the nucleocapsid protein (N-protein) were developed based on previously described methodology for serum1 (Figure 1). Assay operating conditions were optimised to reduce background, achieve maximum discrimination between positive and negative samples and retain a good dynamic range (Figure S1). We observed the highest background on a high-binding hydrophilic plate (NUNC Maxisorp); on the remaining medium-binding, hydrophobic plates, background was low and comparable, although optimum discrimination was found using the Greiner plate (Figure S1 a). Optimal antigen coating concentration was determined to be 10 µg/ml for each antigen (Figure S1 b). Using checkerboards, we determined optimum secondary antibody and sample concentrations (IgA: 1 in 10 saliva starting dilution and 1:20,000 secondary IgA antibody; IgG: 1 in 5 saliva starting dilution and 1:15,000 secondary IgG antibody, Figure S1 c, d). Heat inactivation (56°C for 30, 45 or 60 minutes) and freeze-thawing of samples (2, 4 or 8 cycles) did not affect ELISA signal (Figure S2), allowing for safe and practical handling of samples.
The demographics and clinical characteristics of 230 known negative (pre-pandemic) and 90 known positive (convalescent SARS-CoV-2 PCR-confirmed) donors are given in Table 1. Samples were randomised 50:50 across two sample sets: a threshold set, used to determine thresholds for positivity and a validation set, for evaluating assay performance, each containing samples from 115 known negative and 45 known positive individuals. Due to low sample volumes, the final number of samples tested for each of the 6 assays differed (Table S1). The distribution of antibody responses and associated performance for the 6 assays obtained on threshold set samples is shown in Figure 2 and Figure S3. Based on the threshold set, the spike assays performed best out of all antigens, as shown by the highest area under the curve (AUC): IgA (92%, CI 95%: 87.9-96.1) and IgG (94.8%, CI 95%: 91.5-98.2%) (Table S1). Discrimination between pre-pandemic and PCR-confirmed samples was poorer for N-protein and RBD assays compared to spike (Figure S3), reflected in lower performance estimates (AUC: 60.4%-85.9%, Table S1). All assays showed high levels of reproducibility as assessed by low intra-and inter-assay signal variation in internal serum and saliva controls (Table S2 and Table S3).
Evaluation of diagnostic performance for salivo-surveillance
Next, we evaluated assay performance in the blind validation set at each of the four pre-defined thresholds determined from the threshold set (Table S4 and Figure S3). Since estimates of test accuracy (AUC, sensitivity and specificity) were similar in the validation or combined threshold/validation sample set (Table S4), we present overall accuracy estimates based on the full combined sample set, to increase precision (Table 2 and Figure 2). Based on the combined sample set, the clearest discrimination between known negative and positive samples was shown for the Spike IgA and IgG assays (Figure 2). Poor discrimination was observed for both N-protein and RBD IgA: few known positives showed reactivity to N-protein and several known negatives exhibited reactivity to RBD (Figure 2 a, b). Given the poor performance of the RBD assays (both low specificity and sensitivity) in the threshold setting phase, and limited sample volume, we did not take forward these assays in evaluating performance at pre-defined thresholds. The best performing assays were spike IgG (across both sample sets combined: AUC 95.0%, 95% CI: 92.8-97.3%) and spike IgA (AUC 89.9%, 95% CI: 86.5-93.2%), followed by N-protein IgG (AUC 84.6%, 95% CI: 79.9-89.4%). N-protein IgA had the poorest performance (AUC 71.9%, 95% CI: 65.7-78.1%, Table 2). After evaluating each assay’s overall performance in the combined threshold/validation set, the threshold which provided optimal detection of PCR-confirmed cases (i.e., highest sensitivity) whilst maintaining at least 98% specificity in validation was selected (Table S4 and Figure S4). We observed that thresholds set for 98% specificity in the threshold set maintained this performance in the validation set. The highest sensitivity observed was for spike IgG (50.6%, 95% CI: 39.8-61.4%) and lowest sensitivity for N-protein IgA (8.6%, 95% CI: 3.7-14.8%) (Table 2, values based on both data sets combined). 
Taken together, primary infection with SARS-CoV-2 induced salivary IgA and IgG responses against spike, whereas responses to N-protein and RBD were largely restricted to IgG.
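The threshold-setting and validation steps described above can be sketched as follows. This is a minimal illustration, assuming that a threshold is taken as a percentile of the known negative readings in the threshold set and then applied unchanged to the validation set. The readings, the helper names and the interpolation rule are hypothetical and are not the study's data or code.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) of values using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def sensitivity(positives, threshold):
    """Fraction of known positives reading above the threshold."""
    return sum(v > threshold for v in positives) / len(positives)


def specificity(negatives, threshold):
    """Fraction of known negatives reading at or below the threshold."""
    return sum(v <= threshold for v in negatives) / len(negatives)


def roc_auc(positives, negatives):
    """Probability that a random positive reads higher than a random negative.

    Ties count as half, which matches the usual rank-based AUC estimate.
    """
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical normalised OD readings (illustrative only).
threshold_negatives = [0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 0.60]
validation_negatives = [0.06, 0.09, 0.11, 0.14, 0.19, 0.21, 0.24, 0.30]
validation_positives = [0.20, 0.35, 0.55, 0.70, 0.90, 1.20]

# Threshold fixed on the threshold set, then evaluated on the validation set.
cutoff = percentile(threshold_negatives, 98)
val_sensitivity = sensitivity(validation_positives, cutoff)
val_specificity = specificity(validation_negatives, cutoff)
val_auc = roc_auc(validation_positives, validation_negatives)

print(f"cutoff={cutoff:.3f} sens={val_sensitivity:.3f} "
      f"spec={val_specificity:.3f} auc={val_auc:.3f}")
```

Fixing the cut-off before looking at the validation readings is what allows the validation estimates to be read as an independent check of the threshold.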
Positive and negative predictive values were assessed at prevalence of 0.1 to 40% to evaluate how the tests might perform in practice. At 5% prevalence, PPV was higher (fewer false positives) for high specificity thresholds (97th-99th percentile) in the spike IgA, IgG and N-protein IgG assays. NPV was lower (increased false negatives) for N-protein IgA. NPV and PPV for high specificity thresholds were robust at 40% prevalence (Figure S5).
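The dependence of predictive values on prevalence follows directly from Bayes' rule, and a short sketch may make the calculation concrete. The function name is hypothetical; the inputs use the spike IgG sensitivity reported above (50.6%) and a specificity of 98%, which is taken here as an assumed round figure for the high-specificity thresholds.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence.

    All three arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence

    ppv = true_pos / (true_pos + false_pos) if (true_pos + false_pos) > 0 else float("nan")
    npv = true_neg / (true_neg + false_neg) if (true_neg + false_neg) > 0 else float("nan")
    return ppv, npv


# Prevalence range matching the 0.1% to 40% examined in the text.
for prevalence in (0.001, 0.05, 0.40):
    ppv, npv = predictive_values(sensitivity=0.506, specificity=0.98, prevalence=prevalence)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

At low prevalence the false-positive term dominates the PPV denominator, which is why small gains in specificity matter more than gains in sensitivity in that setting.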
To further evaluate the performance of the assays in detecting recent PCR-confirmed cases, we examined the association of false positivity with age and sex (Table S7 and Table S8), and false negativity by time since symptom onset (11 to ≥ 71 days) and symptom status (Table S9). Considering age, specificity was highest in younger age groups (0-19 years) for the spike assays (IgA and IgG: 100%) compared to N-protein (IgA: 88.9-98.2%, IgG: 98.2-100%). The lowest observed specificity was for spike IgG in 30 to 39-year-olds (81.8%). There was no indication of specificity varying by sex for N-protein or spike (Table S8), which has been reported previously in other test accuracy studies17,18. In general, sensitivity declined with increasing time since symptom onset for all assays. No PCR-confirmed asymptomatic cases had N-protein antibody responses above the threshold for positivity (0/8); sensitivity was higher in these asymptomatic cases for spike IgA (33.3%) and IgG (11%). As pre-pandemic saliva samples were tested from various collections, we assessed for clustering in antibody responses to spike, RBD and N-protein across the different pre-pandemic cohorts. For all assays, responses were similar across cohorts comprising only children and across cohorts comprising only adults; however, signal was significantly higher in adults than in children (Figure S6).
Combining assays to predict positive and negative individuals
Heterogeneity in responses across antigens was observed and there was discordance in the isotype response for the same antigen: individuals appeared to have predominantly SARS-CoV-2 specific IgA or IgG, whilst few had high levels of both (Figure 3a). Consistent with our earlier analysis (Figure 2c, f), the spike IgG assay, and to a lesser extent the spike IgA assay, achieved the best discrimination. Given the heterogeneity in response, we speculated that combining readings across multiple assays could improve sensitivity for recent infection. To test this hypothesis, we trained AdaBoost classifiers19 to predict positive and negative individuals using either one or a combination of the 6 assays. The best performing model was trained with data from the N-protein, RBD and spike IgG assays (mean ROC AUC score = 0.94; Figure 3b). The performance of this model was substantially better than that of the models trained with the individual assays (mean ROC AUC score between 0.54 for N-protein IgA and 0.82 for spike IgG). The model trained with the spike IgA and IgG assays (mean ROC AUC score = 0.88) performed somewhat better than those trained using the individual spike IgA (mean ROC AUC score = 0.76) and spike IgG (mean ROC AUC score = 0.82) assays (Figure 3b). Combining N-protein IgG and spike IgG assays (mean ROC AUC score = 0.89) gave very similar performance to combining both spike assays. Performance of all models based on assays individually or combined is shown in Table S10.
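A comparison of single-assay and combined-assay classifiers of this kind can be sketched with scikit-learn. The text does not state the software, hyperparameters or resampling scheme used, so everything below is an assumption for illustration: the data are synthetic, the three columns merely stand in for N-protein, RBD and spike IgG readings, and 5-fold stratified cross-validation is one plausible way to obtain a mean ROC AUC.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in data: 230 negatives and 90 positives, mirroring the cohort
# sizes only. Columns are hypothetical readings for N-protein, RBD and spike IgG.
n_neg, n_pos = 230, 90
X_neg = rng.normal(loc=[0.20, 0.20, 0.20], scale=0.10, size=(n_neg, 3))
X_pos = rng.normal(loc=[0.35, 0.30, 0.50], scale=0.20, size=(n_pos, 3))
X = np.vstack([X_neg, X_pos])
y = np.concatenate([np.zeros(n_neg, dtype=int), np.ones(n_pos, dtype=int)])


def mean_cv_auc(columns):
    """Mean ROC AUC over stratified 5-fold CV for a model using the given columns."""
    model = AdaBoostClassifier(n_estimators=50, random_state=0)
    folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(model, X[:, columns], y, cv=folds, scoring="roc_auc")
    return scores.mean()


auc_spike_only = mean_cv_auc([2])       # single assay
auc_combined = mean_cv_auc([0, 1, 2])   # three assays combined

print(f"spike IgG only: {auc_spike_only:.2f}")
print(f"N + RBD + spike IgG: {auc_combined:.2f}")
```

Because the data are simulated, the printed scores say nothing about the assays themselves; the sketch only shows how feature subsets can be compared on an equal footing.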
Salivary IgA antibody indicates mucosal antibody responses and IgG, systemic antibody responses
Salivary antibody responses were compared with serum antibody to investigate the mucosal immunological profiles in individuals with recent SARS-CoV-2 infection (Figure 4). Among the 320 available samples, 97 individuals had saliva and serum collected on the same day, of whom 83 were PCR-confirmed and 14 pre-pandemic (see ‘Methods’ for further details). Results from samples collected from PCR-confirmed cases were positively correlated for all 6 assays, but all the IgA assays were less well correlated between saliva and serum than the IgG assays (Tau = 0.11, 0.23 and 0.22 for N-protein, RBD and spike IgA, and Tau = 0.58, 0.33 and 0.39 for N-protein, RBD and spike IgG, respectively), with several individuals having specific salivary IgA in the absence of detectable serum IgA antibody. N-protein IgG responses exhibited the strongest positive saliva-serum correlation (Tau = 0.58, p < 0.001, n = 73), whereas N-protein IgA exhibited the weakest correlation (Tau = 0.11, p = 0.14, n = 78). Fewer matched pre-pandemic samples were available but are plotted for visual reference. For salivary samples assigned as positive for spike IgA (n=28) or spike IgG (n=40) based on validated thresholds, we explored the distribution of antibody responses in relation to time since symptom onset and age, and how these salivary responses correlated with serum (Figure S8). Salivary antibodies were detectable up to 123 and 133 days post onset of symptoms for spike IgA and IgG, respectively. In general, trends were similar between the two sample types for both isotypes and there were no marked differences associated with age or time since symptom onset, although sample sizes were small.
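For readers unfamiliar with rank correlation, the saliva-serum comparison can be illustrated with a small sketch, assuming that "Tau" refers to Kendall's rank correlation coefficient (the tau-b variant, which adjusts for ties, is shown). The function and the paired readings are hypothetical; p-values such as those quoted above would normally come from a statistics library, for example `scipy.stats.kendalltau`.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences of paired readings."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue                    # tied in both: excluded from both terms
            elif dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1             # pair ordered the same way in both
            else:
                discordant += 1             # pair ordered in opposite ways

    informative = concordant + discordant
    denominator = sqrt((informative + tied_x_only) * (informative + tied_y_only))
    if denominator == 0:
        return float("nan")
    return (concordant - discordant) / denominator


# Hypothetical paired readings from the same five individuals.
saliva_igg = [0.1, 0.4, 0.3, 0.8, 0.6]
serum_igg = [0.2, 0.5, 0.6, 1.1, 0.9]

tau = kendall_tau_b(saliva_igg, serum_igg)
print(f"tau-b = {tau:.2f}")
```

A rank-based measure is a natural choice here because saliva and serum readings sit on different scales and need not be linearly related.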
Field-testing assays in SARS-CoV-2 household outbreaks
Spike and N-protein assays were deployed on samples collected following recent household transmission events (13/07/2021 to 22/02/2022) to evaluate their utility in monitoring SARS-CoV-2 infections under field conditions (when Delta and Omicron variants were prevalent in the UK). Twenty households consisting of 19 index cases (10 child and 9 adult primary cases first presenting with a positive RT-PCR test), and 48 household contacts self-sampled twice weekly for 4 weeks (Figure 1 and Table 3). All households included at least 1 child. Viral shedding profiles for 36 PCR positive individuals are shown in Supplementary Figure S9: all 19 index cases and 11 contacts were PCR positive on Day 0 (prevalent infection); 5 contacts became PCR +ve during the study (incident infection); and 28 contacts remained uninfected. Of the PCR +ve cases, 23/36 (63.9%) reported symptoms. Four participants reported a previous PCR-confirmed infection (between 30 to 73 days prior to Day 0). One vaccinated individual reporting prior infection was re-infected during the study when Omicron was dominant (January 2022).
We measured the antibody responses both among household members with PCR-confirmed infection during the study and those who remained uninfected (PCR -ve). Most PCR +ve cases (34/36, 94.4%) mounted salivary spike IgA or IgG responses, whilst fewer than half raised antibodies to N-protein (Table 4). Of the PCR +ve cases that were asymptomatic, most were antibody positive (11/13, 84.6%). The two PCR +ve cases who did not have detectable specific antibody were asymptomatic children (<10 years) and were PCR +ve on Day 0 only. In this setting, combining IgA and IgG results for both antigens increased sensitivity for PCR +ve cases, although no improvement was seen when combining antigens, as the few individuals that raised anti-N-protein antibody had also raised anti-spike antibody (Table 4). Spike antibody positivity detected ongoing household infections, as rates of anti-spike (IgA or IgG) increased through the 26-day period in PCR +ve cases but remained relatively constant among PCR -ve contacts (Figure 5b, d). Similarly, for all assays, rates of salivo-conversion (i.e. antibody negative at Day 0 and positive on at least one timepoint subsequently) were higher for PCR +ve cases than PCR –ve contacts (Table 5). For example, spike IgG conversion rates were 79.2% and 27.8% for PCR +ve and PCR -ve household members respectively. Antibody was detected among some PCR +ve cases and PCR -ve contacts on Day 0, suggestive either of pre-existing antibody or early mucosal responses generated post exposure/infection shortly before study enrolment (Figure 5).
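The definition of salivo-conversion used above (antibody negative at Day 0 and positive on at least one subsequent timepoint) translates into a simple rule, sketched below. The data structure, the function names and the choice of denominator (only participants who were negative at Day 0) are assumptions made for illustration and are not taken from the study's analysis code.

```python
def salivo_converted(results_by_day):
    """Classify one participant's series of threshold-based antibody results.

    results_by_day maps study day -> True (positive) or False (negative).
    Returns None if the participant was not antibody negative at Day 0
    (treated here as ineligible), otherwise True or False.
    """
    if results_by_day.get(0) is not False:
        return None
    return any(positive for day, positive in results_by_day.items() if day > 0)


def conversion_rate(participants):
    """Proportion of Day 0 negative participants who later became positive."""
    outcomes = [salivo_converted(series) for series in participants]
    eligible = [outcome for outcome in outcomes if outcome is not None]
    if not eligible:
        return float("nan")
    return sum(eligible) / len(eligible)


# Hypothetical participants (illustrative only).
pcr_positive_cases = [
    {0: False, 4: False, 7: True},    # converts by day 7
    {0: False, 4: False, 7: False},   # never converts
    {0: True, 4: True},               # already positive at Day 0: excluded
    {0: False, 3: True, 7: False},    # transient positive still counts
]

rate = conversion_rate(pcr_positive_cases)
print(f"salivo-conversion rate: {rate:.1%}")
```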
Next, we investigated antibody responses in the context of different prior exposures (vaccination and/or infection) in adults and children (Figure S10 and Table S11). All unvaccinated individuals were children (27/67, 40.3%, Table 3). Unvaccinated, PCR -ve children mostly raised spike and N-protein IgA antibody responses (4/11, 36.4%), with one individual also spike IgG positive. Unvaccinated PCR +ve children predominantly raised antibody to spike IgA or IgG (14/16, 87.5%), with fewer N-protein IgA or IgG positive (4/16, 25.0%). Vaccinated, PCR -ve individuals predominantly raised spike IgG (17/20, 85.0%) antibody, with fewer individuals positive for spike IgA (9/20, 45.0%) or N-protein IgA/IgG (9/20, 45.0%). Vaccinated PCR +ve individuals exhibited the highest rates of antibody positivity: all cases converted to spike IgG (20/20, 100%) and 70% were positive for spike IgA (14/20). The highest rates of positivity to N-protein IgA or IgG were seen in this group (12/20, 60.0%).
Discussion
In this study, we demonstrate that saliva (spit) samples can easily be collected and used reliably to detect recent SARS-CoV-2 infection in children and adults via the measurement of SARS-CoV-2 specific antibodies. In an unvaccinated population, we found assays measuring responses to the spike protein provided better discrimination between known negative (pre-pandemic) and known positive (PCR-confirmed) samples than anti-RBD and nucleoprotein assays. Machine learning analyses suggested that combining assays detecting the same antibody isotype against different antigens (N-protein, RBD and spike), particularly IgG, can further improve diagnostic performance, and to a lesser extent combining anti-spike IgA and IgG assays likewise. As expected, our observations suggest that detectable salivary IgA largely reflects mucosal immune responses following infection, whereas IgG may primarily reflect systemic immune responses. When field tested in household outbreaks, salivary antibody responses were a reliable indicator of recent infection and exposure. Our methods and results support the importance and feasibility of using saliva as a mucosal sample for monitoring SARS-CoV-2 infection and immunity both in individuals and in populations at scale.
The reported accuracy of antibody tests depends in part on the samples used in validation. We used a large and varied collection of 230 pre-pandemic samples collected from both children and adults in the UK and Europe across multiple years. Using these diverse cohorts, we established robust thresholds optimised to maximise specificity (∼98%), which were maintained when evaluated in a second set of samples. Intriguingly, we observed increased background reactivity in adults compared to children across the 5 pre-pandemic cohorts tested for all assays. This finding contrasts with others who have reported higher cross-reactivity with serum antibody to seasonal HCoVs in younger populations (children and adolescents) than in adults26, whilst others report no association with age20. These differences may reflect different trends in circulating viruses at the time of sample collection for each of the cohorts and/or differences between saliva and serum.
We observed significantly greater sensitivity for recent SARS-CoV-2 infection using assays for anti-spike compared to anti-RBD and N-protein, in line with other studies using serum and saliva21,7,9. The poorer performance of the RBD assays was surprising and contrasts with findings in serum where RBD can be used as a specific antigen for detection of SARS-CoV-2 infection, with responses mirroring those for spike1. This poor discriminatory performance was particularly notable for the RBD IgA assay. The cause of this is unclear, but others have reported similar findings with saliva samples22. One possible explanation may be that the pH of saliva alters antigen conformation, promoting non-specific binding. We report lower test sensitivity compared to serological tests (50.6% cf. ∼98%). This finding is perhaps expected given the intrinsic variation and lower antibody concentrations associated with mucosal samples21,9,12. Despite this, salivary samples offer a unique opportunity to measure both systemic and mucosal responses non-invasively, as well as directly to detect and quantify levels of respiratory virus23. Further work should consider alternative testing platforms that may provide improved test accuracy over ELISA24.
In households undergoing SARS-CoV-2 infection, salivo-conversion was observed as soon as 4 days post infection. Notably, most unvaccinated PCR -ve household members, who were all children, mounted detectable salivary IgA responses in the absence of IgG responses. Moreover, most (11/13) asymptomatic PCR +ve cases salivo-converted. Taken together, this suggests an early role for mucosal antibody in limiting infection25,12 and offers potential for enhanced surveillance in settings where PCR testing is limited26,12.
This study has highlighted several considerations for future deployment of salivary antibody assays to SARS-CoV-2 and other infections. We observed variation in the type of salivary antibody responses and dynamics both within and between individuals, and both in magnitude and duration. Given the high reproducibility of the assays and control over sample collection methods, it is likely at least some of this reflects intrinsic variability in saliva as a biological sample: there are intra-and inter-individual differences in salivary flow rate, hydration state and gingival health11. Others have suggested to control for this by normalising to total immunoglobulin6,9,27,28, but this could be subject to the same inconsistencies, so that normalisation could amplify errors and/or mask specific responses29. Expressing concentration of antibody as a normalised OD (a ratio to a serum standard) is a simple expression that minimises intrinsic assay variation and laboratory workload for high-throughput surveillance. Subsequent interpolation and reporting in international binding antibody units/ml (BAU/ml) would allow for cross-laboratory comparisons and assay standardisation30.
We demonstrate that saliva samples are robust to sample handling and processing (heat inactivation and freeze-thawing): this has implications for immediate testing but also provides assurance for retrospectively analysing existing collections of samples with similar test platforms. Finally, using wild-type antigen1 we demonstrated applicability of assays to recent outbreaks when variants of concern (Delta and Omicron) were dominant. This has implications for future assay design, suggesting that, to date, wild-type antigen is robust in the face of new variants.
Our study has several limitations. We did not evaluate analytical specificity to other seasonal human coronaviruses (HCoVs) nor other respiratory viruses using our large pre-pandemic collection, where presence of antibodies to other confirmed coronaviruses may account for some false-positive results21. However, anti-spike salivary antibody responses have been demonstrated to be highly specific by others9,31. When we performed the test accuracy aspects of this study, we were unable to obtain 200 samples from recovered PCR-confirmed individuals as per MHRA guidelines, so estimates of test sensitivity are uncertain32. Finally, deployment of the best performing anti-spike assays for salivo-surveillance in vaccinated populations presents challenges: these assays cannot distinguish infected from vaccinated individuals, while anti-N-protein salivo-conversion appears to occur infrequently in infected individuals. Nonetheless, we did observe clear increases in salivo-positivity following infection in vaccinated individuals (in the household study), offering a potential means to identify periods of transmission when deployed in a mixed population.
Our findings emphasise the need for further work on understanding factors associated with SARS-CoV-2 mucosal antibody profiles and the heterogeneity in responses observed. Ongoing monitoring of mucosal antibody responses is essential for understanding transmission of SARS-CoV-2 and informing vaccination strategies, especially if future candidate vaccines are to be administered intranasally28,33,34. The rapidly increasing complexity of COVID-19 epidemiology globally requires tools to guide difficult policy decisions, especially for vaccination35, and for countries with limited data on population immunity. Antibody assays should continue to be evaluated in the populations they are deployed to, particularly in landscapes with high numbers of infections and varying levels of pre-existing immunity. Multiplex salivary immunoassays could achieve the best diagnostic discrimination36, and if developed to be affordable and high throughput, could offer a means for long-term salivo-surveillance in hard-to-reach settings. In summary, we present methods for detecting salivary antibody and demonstrate feasibility of approach for large scale salivo-epidemiology. This approach for monitoring infection and immunity, using saliva as an easily obtainable non-invasive sample, that can be assayed simply and affordably, has the potential to gather data in places where information is scarce.
Methods
Study participants
Individuals donating samples following confirmed, suspected or no SARS-CoV-2 infection were convenience samples, donated to the Bristol BioBank. ‘Known positives’ were those with PCR-confirmed SARS-CoV-2 infection, sampled at least 10 days post-test confirmation who responded to local, workplace advertisement. Details of symptoms, tests and other demographic and clinical information relating to the donor and their COVID-19 status were collected at the point of sampling using a case report form. Samples collected prior to SARS-CoV-2 emergence (‘known negatives’) were also accessed through the Bristol BioBank, alongside associated clinical and demographic information. Household members undergoing SARS-CoV-2 outbreaks were eligible to participate in the household study if one household member or close contact self-identified as SARS-CoV-2 positive (by PCR or lateral flow test). Participants were sampled as part of the CoMMinS study (COVID-19 Mapping and Mitigation in Schools; https://commins.org.uk). All household members were invited to take part for one week. If one or more saliva samples from the family were PCR positive in week 1, all participating family members were invited to continue sampling for 4 weeks. Details of symptoms, previous SARS-CoV-2 infection, vaccination history and other donor information were collected at consent using an online questionnaire and symptoms continued to be reported throughout the sampling period alongside saliva sampling.
Sample collection and processing
Ethics
Whole saliva from healthy donors (before and during the COVID-19 pandemic) was obtained via the Bristol BioBank (NHS REC 20/WA/0273) under the use application U-0042. Pre-pandemic (PP) sample cohorts were obtained in two ways. PP cohort 1 samples were collected in Portugal under local ethics approval for a specific research study; remaining samples were stored and used for this work under NHS REC 13/NW/0439. PP cohorts 2-5 were collected under further Bristol BioBank deposit applications; upon study completion, these sample sets were deposited into the Bristol BioBank and released to this project under use application U-0042. Saliva samples were collected from household outbreaks during the CoMMinS study under NHS REC 20/HRA/4876.
COVID-19 samples
Whole saliva was collected from individuals who had recovered from COVID-19 (PCR-confirmed infections), suspected COVID-19 cases and healthy donors through the Bristol BioBank (NHS REC 20/WA/0273). Participants were instructed not to eat, drink, brush their teeth, chew gum or use mouthwash for 30 min prior to saliva collection. Participants collected their own saliva by drooling into a funnel (Isohelix, Cell Projects UK) over the top of a sterile collection tube up to a 2 mL mark. Instructions provided included an explanation of the difference between saliva and sputum. Collected specimens were promptly held at 4°C for ≤4 h and transported to the laboratory for long-term storage at -70°C. The standard operating procedure for saliva collection is given in the supplementary methods (Figure S9). Peripheral blood was collected into an SST vacutainer (BD Biosciences, USA) for serum extraction. Household members likewise collected their own saliva in the CoMMinS study; the technique was described to participants by telephone and an instruction leaflet was also provided.
Pre-pandemic samples
PP Cohort 1
In March 2014, paired nasopharyngeal swab and saliva samples were collected from children (aged 4 months to 6 years) attending day care centres in Coimbra, Portugal. Saliva samples were collected using foam polygon swabs (Rocialle, UK), decanted into storage tubes and stored at -70°C. Prior to sample collection, participants were requested not to eat, drink or chew gum.
PP Cohort 2
During 2012-2013, fifty children aged 2–11 years were recruited to a longitudinal study in which saliva samples were collected mainly using foam polygon swabs (Rocialle, UK), although some older children spat directly into a Falcon tube (Corning, USA). Saliva samples were transported at 4°C and frozen at -70°C within 4 hours. Saliva samples were collected at baseline, when the child was admitted for routine adenoidectomy or adenotonsillectomy at Bristol Royal Hospital for Children, and then monthly at five subsequent time points at their home by a Research Nurse.
PP Cohort 3
Between 2006 and 2007, healthy adults were recruited to a study in which saliva samples were collected at four time points using foam polygon swabs (Rocialle, UK).
PP Cohort 4
Between 2007 and 2008, thirty-two healthy adults aged 18-40 years were recruited to a study in which saliva samples were collected pre- and post-vaccination with a meningococcal ACWY conjugate vaccine, using foam polygon swabs (Rocialle, UK).
PP Cohort 5
In August 2019 saliva samples were collected from six healthy adults on sequential days during the working weeks of that month. Participants drooled into a funnel (Isohelix, Cell Projects UK) that was placed inside a collection tube. Samples were frozen within 4 hours of collection at -70°C.
Conduct of immunoassays
Sample processing
Prior to running immunoassays, saliva was thawed on ice and centrifuged at room temperature for 5 minutes at 13,000 g. The supernatant was aspirated and aliquoted for heat inactivation. All saliva samples (pre-pandemic and convalescent) were heat inactivated at 56°C for 30 minutes in a digital heat block (Sci-Quip, UK and Labnet, USA) using validated methods.
Production of protein for ELISA
Antigens were produced according to previously described methods37. SARS-CoV-2 trimeric spike protein ectodomain and the RBD of the spike protein were produced in insect cells as described38. The spike construct consists of amino acids 1 to 1213 with a C-terminal thrombin cleavage site and a T4-foldon trimerization domain, followed by a hexahistidine tag for affinity purification. The polybasic cleavage site has been removed (RRAR to A) in this construct38. RBD from spike protein was also produced as described in Toelzer et al38. This construct contains SARS-CoV-2 spike amino acids R319 to F541, preceded by the native spike signal sequence (amino acid sequence MFVFLVLLPLVSSQ) at its N-terminus and followed by a C-terminal octa-histidine tag for purification. A codon-optimized, N-terminal His6-tagged full-length nucleocapsid protein of SARS-CoV-2 was synthesized and cloned by GenScript into a pET28a bacterial expression plasmid (called here pET28a-NP-FL). The pET28a-NP-FL plasmid was transformed into E. coli strain BL21 (DE3) and expressed.
Saliva ELISA
ELISAs were performed as previously described in Goenka et al37. Salivary antibodies specific for whole SARS-CoV-2 spike protein, for its receptor binding domain region and for the viral nucleocapsid protein were detected with an ELISA based on methodology described for serum1. Modifications were made following optimisation of assay parameters described below. Final assay conditions were as follows. Antigens were diluted in PBS and MICROLON® plates (Greiner Bio-One) were coated with 10 µg/mL spike protein overnight at 4°C. Saliva supernatants were assayed singly, diluted at either 1 in 10 (IgA) or 1 in 5 (IgG) to a final volume of 100 µL per well. Secondary antibodies were used as follows with the dilution factor indicated: HRP conjugated anti-human IgG (Southern Biotech: 1 in 15,000) and IgA (Sigma: 1 in 20,000). Plates were developed with 1-Step Ultra TMB-ELISA Substrate Solution (Thermo Fisher) for 20 minutes and the reaction was quenched with 2M H2SO4 (Merck). All incubations were temperature controlled at 24°C. Optical density (OD) was read at 450 nm (to measure signal) and 570 nm (background) using a BMG FLUOstar OMEGA plate reader with MARS Data Analysis software. The OD at 570 nm was subtracted from the OD at 450 nm for each well, and the result was then corrected for the average signal of blank wells from the same plate; ODs reported are an average of duplicate wells per sample.
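The background correction described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: it assumes the 570 nm reference reading is subtracted from the 450 nm signal reading, and the function names and well readings are hypothetical.

```python
from statistics import mean

def corrected_od(od450, od570, blank_mean):
    """Correct one well: signal minus reference reading, minus the plate blank."""
    return (od450 - od570) - blank_mean

def sample_od(wells, blank_wells):
    """Average the corrected OD over the replicate wells of one sample.

    wells, blank_wells: lists of (od450, od570) tuples from the same plate.
    """
    # Blank wells receive the same reference-wavelength subtraction as samples.
    blank_mean = mean(od450 - od570 for od450, od570 in blank_wells)
    return mean(corrected_od(od450, od570, blank_mean) for od450, od570 in wells)

# Hypothetical readings, for illustration only.
blanks = [(0.060, 0.040), (0.064, 0.044)]
sample = [(0.850, 0.050), (0.870, 0.050)]
result = sample_od(sample, blanks)
print(f"corrected OD: {result:.3f}")
```

Blank wells are corrected in the same way as sample wells here so that the plate blank and the sample signal are on the same scale before subtraction.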
Serum ELISA
ELISAs were performed as previously described in Goenka et al 13 and Halliday et al, based on methodology described previously1. Spike, RBD and nucleocapsid were each diluted in sterile PBS (Sigma) and MaxiSorp plates (NUNC) were coated with either 10 µg/ml (spike) or 20 µg/ml (RBD; nucleocapsid protein) of protein overnight at 4°C before use. Plates were blocked with a 1-hour incubation in 3% Bovine Serum Albumin (BSA) (Sigma-Aldrich) in PBS with 0.1% Tween-20 (Sigma-Aldrich) (PBS-T) at room temperature. Serum samples were thawed on ice before use, tested in duplicate and diluted to a final volume of 100 µl per well at a pre-optimized dilution, either 1 in 50 (IgA) or 1 in 450 (IgG), in dilution buffer (1% BSA in PBS-T). All samples were tested on a single plate for each antigen and antibody isotype combination. Secondary antibodies were used as follows with the dilution factor indicated: HRP conjugated anti-human IgG (Southern Biotech: 1 in 25,000) and IgA (Sigma: 1 in 6,000-10,000). SIGMAFAST™ OPD (o-phenylenediamine dihydrochloride) (Sigma-Aldrich) was used to develop plates and reactions were stopped after 30 minutes with 3M HCl. ODs were read at 492 nm and 620 nm using the same reader used for salivary ELISAs.
QC material
Pooled sera and saliva
To facilitate assay standardisation and longitudinal monitoring of results, a serum standard pool of known antibody level was run on all serum and saliva ELISA plates. This was generated by combining sera from 3 individuals with PCR-confirmed COVID-19 infections. Aliquots of this standard were created and stored at -70°C to ensure consistent performance. High and low saliva quality control pools were generated to enable assay variation to be monitored between plates and over time. The saliva high control pool was generated using large sample volumes collected from three individuals with PCR-confirmed COVID-19 infections. The low saliva control pool was generated from saliva from two healthy donors who had no known COVID-19 infection history and low antibody levels on all assays. Inter-assay variation was monitored in serum ELISA using two serum standards of differing antibody levels.
Salivary ELISA development
Comparison of plate type
Reactivity and background binding were compared for five different plates using the spike IgA assay: MaxiSorp (Fisher Scientific, USA), Immulon 1B (Thermo Scientific, USA), MICROLON® plates (Greiner Bio-One, Austria), Polysorb (Thermo Scientific, USA) and Universal binding (Thermo Scientific, USA). MaxiSorp plates were hydrophilic with high binding potential, whereas the remaining 4 plates were hydrophobic with medium binding potential. Saliva was assayed at a single dilution (1 in 10) in duplicate on either an uncoated plate (no antigen; coated with PBS only) or a plate coated with spike protein at 10 µg/mL. One negative (healthy donor) and one positive (clinically suspected COVID-19 donor) saliva sample were each assayed in duplicate. The plate which exhibited the lowest background binding when uncoated, as well as enhanced discrimination when coated with antigen, was selected as optimum (MICROLON® plate, Greiner Bio-One, Austria).
Optimisation of assay conditions
Antigen coating concentration was optimised based on responses to N-protein, spike and RBD IgA by testing saliva collected from negative (healthy donor) and positive (clinically suspected or, for N-protein, PCR-confirmed) donors, together with a positive serum pool (3 PCR-confirmed donors) over 4 different antigen coating concentrations: 1, 5, 10 and 20 µg/ml. By selecting the point on the dose-response curve where the quantity of antigen saturated the plate, 10 µg/ml was determined to be optimal for each antigen. Using checkerboard titrations, we determined the optimum secondary antibody and sample concentration by choosing the combination which gave the best discrimination between negative and positive samples. Secondary antibody was titrated from 1 in 5,000 to 1 in 30,000; sample was diluted 3-fold from 1 in 3 to 1 in 2,430. A TMB development time of 20 minutes was selected to give good discrimination between positive and negative samples while permitting high-throughput plate processing.
Effect of heat inactivation and multiple freeze-thaw cycles on reactivity
To test the effect of inactivation on antibody signal, and the sensitivity of samples to modifications in the duration of heat inactivation, we assayed saliva samples either untreated; heat inactivated according to standard biosafety conditions: 56°C for 30 minutes; or for increased durations of 56°C for 45 minutes and 56°C for 60 minutes. All samples were covered with parafilm during heat inactivation, centrifuged briefly to release condensation from the lid, then transferred to wet ice before returning to the freezer. To test the effect of freeze-thawing saliva samples on antibody signal, we subjected saliva samples to either 2, 4 or 8 rounds of freeze-thaw. Samples of equal volume (65µl each) were frozen at -70°C and thawed on wet ice (∼60 minutes) and remained thawed on wet ice for 1h before re-freezing at -70°C (total time thawing/thawed = 2h).
Threshold setting and evaluation of assay performance in a prospective test accuracy study
The test accuracy component of this study is reported following STARD guidelines. The completed STARD checklist is given in Supplementary Table S14.
Allocation of samples to the threshold and validation set
Sample numbers were decided by the availability of samples required to address the study aims, with awareness of MHRA guidance stipulating a requirement of at least 200 confirmed positive cases and 200 confirmed negative cases to estimate ≥98% sensitivity and ≥98% specificity32. Saliva samples collected pre-pandemic (known negatives) and from recent PCR-confirmed cases (known positives) were split 50:50 across two sample sets: a threshold set, used to determine thresholds for positivity, and a validation set, used to evaluate assay performance. A total of 346 saliva samples from 228 unique donors, of whom 52 had repeat samples, were considered in allocations. We assigned 84/346 (24.3%) of these samples to the threshold set because they had been assayed during assay development. Samples not assayed as part of development were randomised so that 50% of total cases and 50% of total controls appeared in each of the threshold and validation sets. Stratified random sampling used the following strata, with the number of samples randomly drawn from each stratum into the threshold set given in parentheses: asymptomatic PCR-confirmed (n=4); symptomatic PCR-confirmed (n=12); adult pre-pandemic (n=22) and child pre-pandemic (n=61). The final allocation of samples and characteristics in the threshold and validation set is shown in Table 1.
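A stratified random draw of this kind can be sketched as follows. This is a minimal illustration rather than the allocation procedure actually used: the per-stratum draw counts are taken from the text, while the pool sizes, sample identifiers, seed and function name are hypothetical.

```python
import random

def allocate_threshold_set(samples, n_per_stratum, seed=0):
    """Draw a fixed number of samples per stratum for the threshold set.

    samples: list of (sample_id, stratum) tuples.
    n_per_stratum: dict mapping stratum -> number of samples to draw.
    Returns (threshold_ids, validation_ids).
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    by_stratum = {}
    for sample_id, stratum in samples:
        by_stratum.setdefault(stratum, []).append(sample_id)
    threshold = set()
    for stratum, n in n_per_stratum.items():
        # Sampling without replacement within each stratum.
        threshold.update(rng.sample(by_stratum[stratum], n))
    validation = [sid for sid, _ in samples if sid not in threshold]
    return sorted(threshold), validation

# Draw counts per stratum as reported in the text.
draws = {
    "asymptomatic PCR-confirmed": 4,
    "symptomatic PCR-confirmed": 12,
    "adult pre-pandemic": 22,
    "child pre-pandemic": 61,
}
# Hypothetical pool: twice as many unassigned samples as draws in each stratum.
pool = [(f"{stratum}-{i}", stratum) for stratum, n in draws.items() for i in range(2 * n)]
threshold_ids, validation_ids = allocate_threshold_set(pool, draws)
print(len(threshold_ids), len(validation_ids))
```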
Setting thresholds for positivity
Threshold set samples (n=160) were assayed singly in a four-point, 3-fold dilution series starting at either 1 in 10 for IgA or 1 in 5 for IgG against N-protein, RBD and spike. Discrimination between positive and negative samples by each antigen/secondary combination was largely independent of dilution and was slightly improved at higher concentrations without reaching saturation, so the top dilution (1 in 10 for IgA; 1 in 5 for IgG) was used in validation set testing. Receiver operating characteristic (ROC) curves were constructed for each of the 6 assays using threshold set samples and four thresholds were set: those achieving 97%, 98% and 99% specificity among the known negative population, and that which maximised Youden's index. ROC curves were used to evaluate trade-offs in sensitivity and specificity of threshold set samples.
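The two kinds of threshold described here can be illustrated with a short Python sketch. It is not the authors' pROC-based analysis: it assumes a sample is called positive when its value lies strictly above the cut-off, restricts candidate cut-offs to observed values, and uses made-up antibody levels and hypothetical function names.

```python
def specificity_threshold(negatives, target_pct):
    """Smallest observed cut-off giving at least target_pct% specificity.

    Specificity is the share of known negatives at or below the cut-off.
    """
    ordered = sorted(negatives)
    # Integer ceiling of target_pct * n / 100, avoiding floating-point rounding.
    k = (target_pct * len(ordered) + 99) // 100
    return ordered[k - 1]

def youden_threshold(negatives, positives):
    """Cut-off maximising Youden's index J = sensitivity + specificity - 1."""
    best_cut, best_j = None, -1.0
    for cut in sorted(set(negatives) | set(positives)):
        sensitivity = sum(v > cut for v in positives) / len(positives)
        specificity = sum(v <= cut for v in negatives) / len(negatives)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Made-up normalised ODs, for illustration only.
negatives = [0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.22, 0.25, 0.30, 0.60]
positives = [0.28, 0.45, 0.70, 0.90, 1.10]
cut_90 = specificity_threshold(negatives, 90)
cut_j, j = youden_threshold(negatives, positives)
print(cut_90, cut_j, round(j, 2))
```

A fixed-specificity threshold depends only on the known negatives, whereas the Youden threshold also uses the known positives, which is why the two can differ for the same assay.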
Estimation of test accuracy
Validation set samples (n=160) were assayed at a single point dilution in duplicate (1 in 10 for IgA; 1 in 5 for IgG). Clinical information and index test results were not available to the assessors of the reference standard. This was facilitated by assaying validation samples in a blinded fashion. RBD IgA and IgG assays were dropped from evaluation due to poor performance in the threshold set. Performance was evaluated for the N-protein and spike IgA and IgG assays using ROC curve analysis on validation set samples only or, to increase precision, on threshold and validation set samples combined (n=320). A sensitivity analysis was performed comparing the validation vs full sample set to assess the impact of combining samples on performance estimates. Individuals with multiple samples were not de-duplicated and all samples were included in estimates of test accuracy. A sensitivity analysis was performed comparing estimates of assay performance (AUC, specificity and sensitivity) including all samples (i.e., the primary analysis) with results based on analysis of only the first sample donated by each individual. Repeat samples from the same donor were found to have little impact on test performance in sensitivity analysis, so all samples were included to estimate test accuracy (Table S5 and Table S6). There were no indeterminate index or reference standard test results because no such category existed: test results were either positive or negative. Samples with volumes too low to assay were excluded from ROC analysis. Positive and negative predictive values at population prevalences of 0.1, 1, 5, 10, 20 and 40% previous SARS-CoV-2 infection were modelled. The variability in diagnostic accuracy was assessed by examining the association of false positivity with age and sex, and of false negativity with time since symptom onset and symptom status (categorised as asymptomatic; 11 – 21 days post symptom onset; 22 – 43 days; 44 – 70 days; and ≥71 days).
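Predictive values at a given prevalence follow from sensitivity and specificity via Bayes' theorem. The sketch below shows that calculation at the prevalences listed above; the sensitivity and specificity values are placeholders chosen for illustration, not the study's estimates, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at the given prevalence.

    All arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Prevalences from the text: 0.1, 1, 5, 10, 20 and 40%.
PREVALENCES = [0.001, 0.01, 0.05, 0.10, 0.20, 0.40]

# Placeholder accuracy values, for illustration only.
table = {p: predictive_values(0.90, 0.98, p) for p in PREVALENCES}
for p, (ppv, npv) in table.items():
    print(f"prevalence {p:.1%}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

At low prevalence the false-positive term dominates the PPV denominator, so even a highly specific assay yields a modest PPV.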
Correlating mucosal and systemic antibody
To investigate salivary and serum responses in paired samples, serum samples for which saliva was collected on the same day were assayed for antibody specific for SARS-CoV-2. Due to low sample volumes, the final number of samples tested differed across the 6 assays: spike protein IgA, n = 97 and IgG, n = 81; RBD IgA, n = 35 and IgG, n = 33; nucleocapsid protein IgA, n = 91 and IgG, n = 80.
Detection of SARS-CoV-2 infection by RT-qPCR on saliva
Saliva samples collected in household outbreaks were tested for the presence or absence of SARS-CoV-2 using a PCR protocol that was developed and optimised in-house. In brief, a 90µl aliquot of each neat saliva sample was chemically lysed using L6 Lysis Buffer (20-8600-15, Severn Biotech Ltd.). A MS2 RNA bacteriophage internal control was added, and samples were extracted using the QIAsymphony SP automated system (QIAGEN) or KingFisher Flex Purification System (ThermoFisher Scientific) following the manufacturers’ instructions. Total nucleic acid was eluted in 60µl or 50µl of which 10µl was used in RT-qPCR using the SARS-CoV-2 N6/E and MS2 probe and gene primers (Metabion). SARS-CoV-2 E gene primers and probe were as previously described39. SARS-CoV-2 N6 gene primers and probes were designed using Primer3 and a consensus multiple sequence alignment of 658 SARS-CoV-2 N gene sequences downloaded from GenBank240. Full sequences and primer/probe concentrations are given in Table S15. Each PCR reaction well contained 6.25µl of TaqPath 1-Step RT-qPCR Master Mix, CG (ThermoFisher Scientific), 1µl of 25X primer and probe mix, 7.75µl of molecular grade water and 10µl of total nucleic acid extract. The QuantStudio 7 Real-Time PCR System (Applied Biosystems) was used for RT-qPCR where thermal cycling consisted of: 25°C for 2 mins, 50°C for 15 minutes and 40 cycles of 95°C for 10 seconds, 60°C for 30 seconds. Samples producing a cycle threshold (Ct) ≤35 were considered positive.
Case definition
For assay development and test accuracy, healthy donors self-reported no SARS-CoV-2 history or symptoms; suspected cases reported symptoms with an epidemiological link but SARS-CoV-2 infection unconfirmed; PCR-confirmed cases reported a RT-PCR positive test performed on a nose/throat swab through NHS testing; pre-pandemic controls were collected at least 6 months prior to SARS-CoV-2 emergence. For analysis of household outbreaks, we categorised index cases and household contacts into PCR positive or PCR negative at any point in the study. PCR was performed on the same saliva sample tested for antibody; positivity was set on a Ct value ≤35. Index cases were those that originally self-reported a positive PCR or LFT result and on enrolment had two consecutive PCR positives.
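The household PCR categorisation can be expressed as a small rule. The sketch below assumes each participant has a series of saliva Ct values, with `None` standing for a sample in which no amplification was detected; that representation, the function names and the example data are hypothetical.

```python
CT_CUTOFF = 35.0  # from the text: Ct <= 35 is considered positive

def is_positive(ct):
    """A single result is positive when a Ct was recorded and is <= the cut-off."""
    return ct is not None and ct <= CT_CUTOFF

def ever_positive(ct_series):
    """Participant-level status: PCR positive at any point in the study."""
    return any(is_positive(ct) for ct in ct_series)

# Hypothetical Ct series per participant, for illustration only.
participants = {
    "contact_A": [None, 36.2, None],   # never at or below the cut-off
    "contact_B": [None, 31.5, 28.0],   # positive from the second visit
}
status = {pid: ever_positive(series) for pid, series in participants.items()}
print(status)
```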
Data and statistical analysis
All statistical analyses were performed in R using the RStudio environment, with the library ‘tidyverse’ for data manipulation and summary statistics, ‘pROC’ for ROC analysis and ‘binom’ for estimating binomial confidence intervals. The libraries ‘ggplot2’, ‘patchwork’, ‘cowplot’ and ‘ggstatsplot’ were used for data visualisation. Antibody levels were expressed as a normalised optical density (Norm OD) by dividing the mean background-corrected OD of duplicate test samples by the mean background-corrected OD of the duplicate top dilution of the standard. Assay reproducibility was assessed by calculating the coefficient of variation for controls tested in duplicate on the same plate (intra-assay variation) and between plates (inter-assay variation) using plates run in the household study. 95% confidence intervals were calculated using DeLong’s method41 for AUC and with 10,000 stratified bootstrap replicates for sensitivity and specificity estimates. Antibody responses were compared across multiple groups using the Kruskal-Wallis test with post-hoc testing using Dunn’s test. A Bonferroni correction was applied for multiple pairwise comparisons. Significance was defined as p≤0.05. Kendall’s Tau correlation coefficient and associated P value were calculated for salivary and serum antibody correlations. AdaBoost classifiers were trained to predict positive and negative individuals and model performance was measured by calculating ROC AUC (area under the receiver operating characteristic curve) scores. Model training and testing were performed as part of a 5-fold cross-validation loop. Machine learning analysis is available as a Jupyter notebook at https://github.com/Bristol-UNCOVER/Saliva_data_ML_analysis/blob/main/Saliva_dataset_analysis.ipynb.
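The normalisation and reproducibility calculations described in this section can be sketched as follows. The study's analysis was run in R; this Python sketch is only an illustration, it assumes the coefficient of variation is computed with the sample standard deviation, and all values and function names are hypothetical.

```python
from statistics import mean, stdev

def norm_od(sample_ods, standard_ods):
    """Normalised OD: mean corrected OD of the sample replicates divided by
    the mean corrected OD of the standard's top-dilution replicates."""
    return mean(sample_ods) / mean(standard_ods)

def coefficient_of_variation(values):
    """CV as a percentage: sample standard deviation over the mean."""
    return stdev(values) / mean(values) * 100

# Hypothetical background-corrected ODs, for illustration only.
sample_wells = [0.78, 0.80]
standard_wells = [1.55, 1.61]
level = norm_od(sample_wells, standard_wells)

# Hypothetical Norm OD of one control measured on three plates (inter-assay).
control_across_plates = [0.50, 0.52, 0.48]
cv = coefficient_of_variation(control_across_plates)
print(f"Norm OD {level:.2f}, inter-assay CV {cv:.1f}%")
```

Dividing by the plate's own standard expresses each sample relative to a common reference, which is what allows results from different plates and days to be compared.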
The AdaBoost algorithm was imported into the notebook from the Python package scikit-learn42; full details on dataset construction, classifier training and performance are given in supplementary information ‘Methods for machine learning analysis’. To determine rates of salivo-positivity in the household study, the number of individuals with antibody above the threshold for positivity (final thresholds given in Table 2) was divided by the total number of individuals sampled, stratified by infection status (PCR positive/negative during the study) and/or vaccination. Rates of salivo-conversion were calculated based on an individual becoming antibody positive following antibody negativity at Day 0; those who were antibody positive at Day 0 were removed from the denominator.
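The two rates defined above differ mainly in their denominators, which a short sketch makes explicit. This is an illustration under stated assumptions, not the study code: positivity is taken as a value strictly above the threshold, and the threshold, antibody levels and function names are hypothetical.

```python
def positivity_rate(values, threshold):
    """Share of sampled individuals with antibody above the threshold."""
    return sum(v > threshold for v in values) / len(values)

def conversion_rate(day0, follow_up, threshold):
    """Share of Day 0 negatives who later exceed the threshold.

    day0: dict of participant id -> Day 0 antibody level.
    follow_up: dict of participant id -> list of later antibody levels.
    Individuals already positive at Day 0 are excluded from the denominator.
    """
    at_risk = [pid for pid, value in day0.items() if value <= threshold]
    if not at_risk:
        return None  # no one eligible to convert
    converted = sum(
        any(v > threshold for v in follow_up.get(pid, [])) for pid in at_risk
    )
    return converted / len(at_risk)

# Hypothetical Norm OD values and threshold, for illustration only.
THRESHOLD = 0.30
day0 = {"p1": 0.10, "p2": 0.45, "p3": 0.20, "p4": 0.25}
follow_up = {"p1": [0.15, 0.50], "p2": [0.60], "p3": [0.22, 0.28], "p4": []}

rate_day0 = positivity_rate(list(day0.values()), THRESHOLD)
rate_conv = conversion_rate(day0, follow_up, THRESHOLD)
print(rate_day0, rate_conv)
```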
Data Availability
All data produced in the present study are available upon reasonable request to the authors. Machine learning analysis is available as a Jupyter notebook at https://github.com/Bristol-UNCOVER/Saliva_data_ML_analysis/blob/main/Saliva_dataset_analysis.ipynb.
Data availability statement
The data shown in the manuscript is available upon request from the corresponding author.
Conflict of interest
AF is a member of the Joint Committee on Vaccination and Immunisation (the UK national immunisation technical advisory group), is chair of the WHO European regional technical advisory group of experts (ETAGE) on immunisation and is ex officio a member of the WHO SAGE working group on COVID vaccines. He is an investigator on studies and trials funded by Pfizer, Sanofi, Valneva, the Gates Foundation and the UK government.
Author Contributions
AT, EBP, AF, MB and AH conceived the study. AT, EO, HB, JS, BH, UO, AH, HA and DS performed ELISA experiments. KG, NB, KV, FR, ATo and IB produced antigen. AT, KS, AH, HB, AL, EO, HJ, MB and EBP carried out computational analysis. BMA, KD and AD performed PCR experiments. AT, EO, JS, BH, JO, BMA, FR, RB, LC, GGL, HD, AG and the CoMMinS Study Team collected and managed samples. RB, LC, NG, GGL, HD, AG and the CoMMinS Study Team collected clinical/demographic data. EBP, CR, ATo, DW, IB, AD and KGi contributed resources. EBP, MB, AF and AH supervised. AT, EO, HB, KS, HJ, MB, EBP and AH prepared the original draft. All authors interpreted data, reviewed and edited the manuscript.
Acknowledgements
We thank the Bristol BioBank for enabling access to pre-pandemic samples, the researchers involved in originally collecting these samples, and the study participants who agreed to donate them for future research. We acknowledge Helen Thompson for supporting organisation of COVID-19 Bristol BioBank clinics and Drs Jane Metz, Khuen Foong Ng, Charlie Plumptre and Jill King for supporting sample collection. We acknowledge Drs Philippa Lait and Chris Helps for support in establishing and evaluating SARS-CoV-2 RT-qPCR testing. We thank Bristol UNCOVER group for supporting discussion of method development and interpretation of results. We acknowledge funding support from The University of Bristol and the Elizabeth Blackwell Institute supported by Bristol Alumni and Friends for equipment and reagents to conduct assay development and test accuracy studies. Deployment of assays to household outbreaks in the CoMMinS study was supported by the MRC [MR/V028545/1]. AT is supported by the Wellcome Trust (217509/Z/19/Z) and UKRI through the JUNIPER consortium MR/V038613/1 and CoMMinS study MR/V028545/1. E.B.P. was partly supported by the NIHR Health Protection Research Unit (HPRU) in Behavioural Science and Evaluation. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. The NIHR had no role in writing the manuscript or the decision to publish it. E.B.P. is funded via the JUNIPER Consortium (MRC grant no. MR/V038613/1) and MRC grant no. MC/PC/19067. NJT is a Wellcome Trust Investigator (202802/Z/16/Z), is the PI of the Avon Longitudinal Study of Parents and Children (MRC & WT 217065/Z/19/Z), is supported by the University of Bristol NIHR Biomedical Research Centre (BRC-1215-2001), the MRC Integrative Epidemiology Unit (MC_UU_00011/1) and works within the CRUK Integrative Cancer Epidemiology Programme (C18281/A29019).