Abstract
Little is known about the quality of polyclonal antibody responses in COVID-19 patients, and how it correlates with disease severity or patients’ prior exposure to other pathogens. The whole polyclonal antibody repertoire in a retrospective cohort of 538 individuals was mapped against SARS-CoV-2 spike (S) glycoprotein, the main target of antibody immune responses in SARS-CoV-2 infection. Bioinformatic predictions identified 15 major B cell epitopes for S of SARS-CoV-2. Several epitopes localised in RBD of S including those spanning the ACE2-binding site, the highly conserved cryptic epitope of the neutralizing antibody of SARS-CoV, and fusion/entry domains of HR1 and HR2 of S protein of SARS-CoV-2. Intriguingly, some of these epitopes have cross-reactivity to antigens of common pathogens, potentially affecting SARS-CoV-2 infection outcome. High level of anti-Spike SARS-CoV-2 seroreactivity in populations with no history of exposure to SARS-CoV-2 is of clinical relevance and could underpin better understanding of COVID-19 pathophysiology in different populations and provide a blueprint for design of effective vaccines and developing better strategies for antibody testing.
Intorduction
COVID-19 pandemic has caused unprecedented health and economic challenges. A large proportion of people infected by SARS-CoV-2 can be asymptomatic1, whilst in those with symptoms, the disease can cause a spectrum of clinical manifestations from mild upper respiratory tract illness to cytokine storm and severe pneumonia leading to respiratory failure and death2. Older age and a number of co-morbidities (e.g. diabetes, cardiovascular diseases)2-7 have been associated with the development of more severe disease however the underlying mechanisms of poor clinical outcome have remained elusive. Different patterns of immune response in COVID-19 patients are linked both to full recovery (e.g. emergence of antibody secreting cells and SARS-CoV-2 binding antibodies) and to poor clinical outcome (immune dysregulation and prolonged inflammation)8 and death. However, the factors driving development of beneficiary versus harmful immune responses are unclear. One of the key features of the human immune system is its memory. First exposure to many of the common human viruses usually occurs during childhood or adolescence that shapes our immune memory and the immune response to self and environmental antigens throughout life9,10. The immune system’s response to a new pathogen can be influenced by its memory of the response to a variant of that pathogen or a cross-reactive antigen. This is also known as ‘original antigenic sin’ (OAS)11, which could lead to either better protection against infection, or to a higher susceptibility and poor clinical outcome. This phenomenon is probably best shown in the context of many viral infections such as influenza and influenza vaccines12-14.
In this study, we investigated the presence of seroreactivity to SARS-CoV-2 major antigenic epitopes of spike (S) glycoprotein15,16 using a retrospective cohort of pre SARS-CoV2 patients (Table 1). Serum samples from different individuals with varied disease background, with a potential link to COVID-19 severity2,4-7, and healthy controls collected between 2011-2019 (see Methods) were used to investigate the existence of immunereactivity against SARS-CoV-2 main antigenic epitopes using Mimotope-variation analysis (MVA) (Methods and Supplementary Figure 1).
Results and Discussion
The more abundant and shared peptide antigens of MVA immunoprofiles were clustered into core sequences (motifs) either via group-discriminatory identification (FEP and FEP_HC, SZ and SZ_HC, MS) or non-discriminatory selection (PD-CAD, MI and MI_HC, T2D, BC). Altogether, a set of 22,949 motif sequences were selected, representing the most abundant and common antibody immune response features across age, gender, and different clinical backgrounds (Supplementary Information).
Using the data from more prominent immunoprofile features from 538 serum samples, we predicted 15 epitopes within S that were specifically immunodominant (Figure 1 and Supplementary Table 1). Using simulated random alignment data, epitopes predicted by MVA were statistically significant (p<0.05) (Figure 1). Of these, 10 overlapped with the 29 CoV-2 reactive B cell epitopes17 with average overlap of 60.1% per epitope, due to shorter length of MVA-predicted epitopes. Four of the epitopes (Epitopes 4, 5, 6, and 7, (Ep.4, Ep.5, Ep.6, Ep7)) were mapped to the receptor binding domain (RBD), spanning amino acids 319 to 542 of S that is a crucial antigenic attachment region for binding with ACE218 (the main receptor for host cell infection) (Figure 1 and Supplementary Figure 2). In addition to positioning within the RBD, Ep.5 and Ep.6 overlap with the key amino acids important for binding to human ACE218 (Figure 1). Ep.4 and Ep.7 overlap with either the N- or C-terminal region, respectively, of the discontinuous epitope described for CR3022, a neutralizing antibody previously isolated from a convalescent SARS-CoV patient which can also bind the RBD of SARS-CoV-218 (Figure 1). Interestingly, Ep.4 includes P384 residue, and Ep.7 contains H519 (mis valkude järjestuse järgi aminohappe pos on määratud?) which are non-conserved between SARS-CoV and CoV-2 (Supplementary Fig 2). Ep.14 co-localizes with the fusion/entry domains of HR1 and Ep.15 is directly N-terminal to HR2 of S SARS-CoV-219 (Figure 1).
Next, we characterized the immunoreactivity patterns of the predicted epitopes of S across the whole cohort divided by different comorbidities and age (Figure 2). All examined epitopes, Ep.1 to Ep.15 exhibited immunoreactivity profiles that were highly individual-specific (Figure 2A). Exploratory data analysis revealed associations between immunoreactivity profiles and subject age, gender or clinical backgrounds (Supplementary Table 2), warranting more focused analysis.
We did not detect any significant differences in neutralizing response (Ep.4 and Ep.7) among subjects older or younger than 61 years old (Figure 2B Kruskal-Wallis test, p>0.05 and Supplementary Figure 3). The abundance of antibodies predicted to interact with Ep.5 was higher among older subjects 61≤ than younger subjects aged 41-60 or ≤40 (Mann-Whitney U, ****p<0.0001) (Figure 3A). Intriguingly, Ep.5 is within the RBD of SARS-CoV-2 S protein18 (Figure 1) and contains overlapping key amino acids important for binding to ACE2 (Supplementary Figure 2). In Ep.5, the amino acids important for binding ACE2 are 100% conserved between SARS-CoV and CoV-2, with the exception of SARS-CoV-2 F456 -> SARS-CoV L443 transformation not targeted by aligned motifs (Supplementary Figure 2). Response against Ep.1 and Ep.8 of S was also found higher among older subjects aged 61 ≤ (Mann-Whitney U, **** p<0.0001) (Figure 3A). Response against Ep.6 of S was higher among younger subjects aged ≤40 or 41-60 than in subjects aged 61≤ (Mann-Whitney U, **** p<0.0001) (Figure 3B).
Antibody response against Ep.12 and Ep.14 of S2 of S was also statistically higher in younger subjects (Mann-Whitney U, * p<0.05, ** p<0.01, *** p<0.001) (Figure 3B), whereas response to remaining predicted S epitopes was not substantially different across age groups (Supplementary Figure 3).
Next we analysed frequency of antibody responses to different epitopes in subjects with hypertension (HT), cardiovascular diseases (CVD) and subjects who smoked cigarettes. Ep.2, Ep.5, Ep.8, Ep.9, Ep.11, and Ep.13 were statistically (Mann-Whitney U) higher among subjects with HT (Figure 4 and Supplementary Figure 4 and Figure 5). An interesting difference was observed in male and female subjects with HT (Figure 4 and Supplementary Figure 6). The higher response to Ep.5 of S RBD was preferentially in males with HT (Mann-Whitney U, * p<0.05, Figure 4), whereas the HT-associated higher response to Ep.8 was common to both sexes, however the male subjects with HT were found to have higher response to Ep.8 compared to female subjects with HT (Figure 4).
On the other hand, males with coronary artery disease (CAD) or MI showed statistically higher response to Ep.3 of S compared to females, whereas females with either CAD or MI showed significantly reduced response against Ep.6 of S compared to males (or healthy controls, Figure 4, Supplementary Figure 7). Cigarette smoking is a risk factor for both HT and CVD. However, some epidemiological observations suggest daily cigarette smokers have a lower probability of developing severe COVID-19 than the general population20,21. In the clinical cohort of CAD (n=96), predicted immune response to both Ep.6 and Ep.13 of S was identified as significantly higher among cigarette smokers than in non-smokers (Figure 4).
Taken together, our analysis predicited Ep.1 and Ep.3 (NTD of S), Ep.5 (RBD of S), Ep.8, and Ep.13 (S2 of S) as Spike epitopes linked to age, HT, CVD and tobacco use, all of which are confounding factors of COVID-19 severity. Although associated with tobacco use and gender-related cardiovascular condition, notably higher response to Ep.6 was correlated with younger age.
The observation of shared high immunogenicity of predicted epitopes of S suggested that regions in S contained evolutionarily and/or structurally related immunogenic targets. B cell-specific response potentially able to cross-react with S of SARS-CoV-2 might arise from exposure to other (beta-) coronaviruses commonly infecting humans22, or common human pathogen not evolutionarily related to coronaviruses but with protein structural similarity14,23. Sequence alignment analysis with common human beta-coronaviruses and across the human viral pathogenome (https://www.uniprot.org/help/uniprotkb) revealed substantial sequence similarity conservation of SARS-CoV-2 epitopes with other endemic CoVs (i.e. CoVs OC43 and HKU1) and other common pathogenic antigens, most notably human cytomegalovirus and human herpesvirus 6B (Supplementary Figure 8, Table 3), offering a mechanism for pathogenic cross-reactive mimicry. Exposure to many of the common human viruses usually occurs during childhood or adolescence that shapes our immune memory and modulates the subsequent response to self and environmental antigens later in life9,10.
The cell biology of SARS-CoV-2 is so far less known than that of other coronaviruses. Emerging evidence show that binding of SARS-CoV-2 to ACE2 on the surface of target cells such as lung endothelium is essential for viral infectivity. Our data reveal major epitope-specific responses to SARS-CoV-2 in an infection naïve population. Most interestingly, some of these antibodies overlap with crucial antigenic attachment region of SARS-CoV-2 S binding with ACE2, some with that of the neutralizing antibody (CR3022) of SARS-CoV-2, and some with cell fusion/entry domains HR1 and HR2. Our data also highlight the age-linked differential patterns of immune response to S, with individuals aged >60 being more apt to (suboptimal) antibody responses potentially affecting ACE2-S protein RBD of SARS-CoV-2 interactions (Ep.6, Figure 1 and 3).
These findings indicate that it is difficult to distinguish exposure to SARS-CoV-2 from other CoVs and common pathogens in serological studies using SARS-CoV-2 S as antigen. This implies that epitope-specific diagnostic assays will need to be designed. Our data further lends credence to the view that immunological memory covers families of antigens rather than a particular member only involved in the primary response, causing a rapid anamnestic response, at the expense of specificity24,25. These results also underscore the possibility of antigenic sin, shaping the immune response to SARS-CoV-2, which in turn could influence disease progression and the clinical outcome of COVID-1914.
Methods
Clinical cohorts
To investigate pre-existing immunoreactivity able to cross-react with the SARS-CoV-2 main antigenic epitopes on S protein, we examined the antibody immune profiles of several relevant clinical cohorts (Table 1). Blood samples were collected prior to SARS-CoV-2 emergence (2011 to September 2019) under local ethical approvals.
Mimotope-variation analysis
For qualitative and quantitative characterization of humoral immune response from blood samples, we used an in-house developed mimotope-variation analysis (MVA) method as described previously26. In brief, a random 12-mer peptide phage library (Ph.D.-12, NEB, UK) was used according to the manufacturer’s protocol. Two μl of serum samples were incubated with 2.5μl library (~5 × 1011 phage particles) and immunoglubulin G (IgG) fraction was recovered by using protein G-coated magnetic beads (Thermo Fisher Scientific). Captured phage DNA was analysed by using next generation sequencing (Illumina HiSeq, 50-bp single end reads) with barcoded primers for sample multiplexing. To evaluate the reproducibility of proteomics data, we compared peptide abundance in two replicates using Pearson correlation coefficient, which was 0.985 (p<0.0001) (Supplementary Figure 1).
Data analysis
Peptides in a sample dataset were trimmed to 3 million reads (RPM units). SPEXS2 exhaustive pattern search algorithm26 was used to cluster similar peptides and reveal recognition patters (epitope motifs) that would be enriched in studied peptide sets (Documentation: https://github.com/egonelbre/spexs2)). The identification of core motif sequences was either performed discriminatorily, where selected peptide sets were studied for enrichment compared to a control group, or non-discriminatorily, where enrichment was identified compared to a random-generated peptide set (hypothesis-free). In brief, for each clinical group, 100,000s of distinct peptide antigens were used as input, from which 1000s of distinct core epitope motifs were identified with SPEXS2 algorithm. The motifs were collected together to generate a broad screening set of 22,949 motifs, representing different age groups, genders, and clinical background. The 22,949 motif sequences were aligned to primary sequence of SARS-CoV-2 Spike glycoprotein (S) (UniProtKB accession P0DTC2). A detailed description for clustering workflow is in Supplementary Information.
Sequence alignment
SARS-CoV-2 proteome and Spike (S) glycoprotein (P0DTC2) were obtained from www.viralzone.expasy.org/8996 (date accessed 25.03.2020). To predict immunogenic regions on S, in-house built custom algorithms were used for sequence alignment. Primary sequence of S was scanned with 22,949 distinct motifs with criterium of ≥4 exact-matching amino acid positions. Random reference profile was generated by scanning with shuffled-sequence motifs (≥4 exact positions). Random alignment distribution was simulated with 3 independent alignments with shuffled motifs. Specific alignments above 0.95 percentile of simulated distribution were statistically significant (p<0.05). Alignment profiles were visualized using custom Excel VBA scripts and MS Office Excel.
Predicted B cell epitopes for the SARS-CoV-2 virus were published by Grifoni and colleagues17. Amino acids on Spike glycoprotein (S) of SARS-CoV-2 important for binding to human ACE2 were recently reported by different groups and anti-Spike RBD neutralizing antibody CR3022 discontinuous epitope were reported by Yuan and colleagues18. Domains of S were accessed from recently published data15,18.
SARS-CoV S glycoprotein primary sequence (P59594) was accessed from UniProtKB (date accessed 25.03.2020). To assess conservation of S from SARS-CoV-2, SARS-CoV, OC43 and HKU1 sequence alignment was performed with Clustal Omega tool (version 1.2.4) with default settings (https://www.ebi.ac.uk/Tools/msa/clustalo/).
Reference proteins for SARS-CoV-2 Spike glycoprotein (S) alignment with common human pathogens were obtained from UniProtKB with keywords “human” + “virus” (date accessed: 16.12.2019). The primary sequences of reference proteins, were first scanned with 111 SARS-CoV-2 S protein motifs with the criterium of an exact match. The reference proteins, which gave a match with S protein motif alignment, were consequently scanned with predicted S protein epitope sequences with the criteria of ≥5 exact-matching amino acid positions. Alignment procedures were carried out using custom Excel VBA scripts and MS Office Excel. Known epitopes of reference proteins that showed potential for cross-reactions with SARS-CoV-2 S protein were obtained from Immune Epitope Database’s full epitope list (epitope_full_v3.zip) (accessed 29.04.2020).
Statistical analysis
All statistical analyses (ANOVA, t-Test, correlation analyses, ROC analysis) were done using R statistical programming language and RStudio environment (RStudio Team, 2015; R Core Team, 2017). Boxplot graphs were produced and visualized using packages “reshape2”, “dplyr”, and “ggplot2” 2020 versions 27. For visualization of peptide abundance across the sample set, peptide frequencies were converted to base 10 logarithmic values and intensity plots were graphed in MS Office Excel.
Data Availability
The data that support the findings of this study are available on request from the corresponding author (KP). The data are not publicly available due to containing information that could compromise research participant privacy/consent. Any materials that can be shared will be released via a material transfer agreement.
Data Availability
The data that support the findings of this study are available on request from the corresponding author (KP). The data are not publicly available due to containing information that could compromise research participant privacy/consent. Any materials that can be shared will be released via a material transfer agreement.
Ethics Statement
Immunoprofiling of the samples has been approved by the institutional ethics committees, and all subjects provided written informed consent.
Author Contributions
AR and NP contributed equally to the work as 2nd authors. KP, MJ, AR, NP, and AMG conceived and designed the study. AP, MP, LH, EV, MB, FL, EK, PP, PT, AV and DL collected samples and acquired the clinical data. KP, MJ, AR, NP, AP, HS, AA and AMG analyzed and interpreted the data. EV, PT, DL, TT and KP supervised the study. KP and AMG drafted the manuscript. All authors revised and approved the final manuscript for submission.
Funding
This study was supported by institutional research funding grants of Protobios (5.1-4/20/170, and PRG573) from the Estonian Research Council and H2020-MSCA-RISE-2016 (EU734791) from the European Union, Helsinki University Hospital grants, Mary and Georg C. Ehrnroth Foundation, Finnish Eye Foundation. KP, AA, and TT were partially supported by the institutional research grant IUT19-18 of TUT from the Estonian Research Council. This project has also received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement 760921 (PANBioRA).
Acknowledgments
Mirjam Luhakooder, Helen Verev, Susan Pihelgas and Maila Rähn are thanked for excellent technical assistance.