Abstract
Against the backdrop of ongoing Delta variant infection and vaccine-induced immunity, the emergence of a new Variant of Concern, Omicron, has again fuelled fears of COVID-19 around the world. Currently, very little information is available about the S glycoprotein mutations, transmissibility, severity and immune evasion behaviour of the Omicron variant. In the present study, we performed a comprehensive analysis of the S glycoprotein mutations of 309 strains of the Omicron variant and discussed the probable effects of the observed mutations on several aspects of virus biology, based on available data on the effects of mutations on S glycoprotein structure, function, and immune evasion characteristics.
Introduction
The increased transmissibility and high mutation rate of SARS-CoV-2, the causative agent of COVID-19, have led to the emergence of multiple variants of concern (VOCs), characterized by genetic changes that are known to affect virus characteristics such as transmissibility, disease severity, immune escape, and diagnostic or therapeutic escape. Until mid-November 2021, there were four VOCs: Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1), and Delta (B.1.617.2). On 26 November 2021, WHO designated the SARS-CoV-2 variant B.1.1.529 as a new VOC and named it Omicron. This decision was based on the advice of WHO’s Technical Advisory Group on Virus Evolution (TAG-VE) that Omicron has numerous mutations, some of which are shared with other VOCs, that may enhance transmissibility, disease severity, immune escape, and diagnostic or therapeutic escape, though no direct evidence exists. The emergence of variant B.1.1.529 was reported to WHO from South Africa on 24 November 2021. South Africa is currently dealing with the third wave of COVID-19, which is mainly dominated by the Delta variant. Infections have increased dramatically in recent weeks in South Africa, coinciding with the discovery of the B.1.1.529 strain. However, whether this increase in COVID-19 cases is due to Omicron or other factors needs to be verified by genome sequencing and epidemiological studies. The first known infection with B.1.1.529 was confirmed from a specimen collected on 9 November 2021 [1, 2].
The SARS-CoV-2 genome encodes multiple proteins, including the spike (S) glycoprotein that protrudes from the virus envelope [3]. The S glycoprotein plays a crucial role in the very early stages of the virus life cycle, which include virus attachment to the host cell surface, membrane fusion and entry into the host cell [3-10]. As a surface protein, the S glycoprotein is the primary target of neutralizing antibodies elicited by the host adaptive immune response [4, 11-14]. In the constant tug of war between the host and the virus, virus strains with S glycoprotein mutations that facilitate virus entry and/or help the virus escape neutralizing antibodies are frequently selected and eventually predominate. In light of the crucial role of the S glycoprotein in virus infection and host immune evasion, scientists have prioritized mutations that have emerged within the S glycoprotein of circulating SARS-CoV-2 strains and investigated their biological significance [15, 16]. In the present study, we performed a comprehensive analysis of the S glycoprotein mutations of the Omicron variant and classified the strains into groups based on different combinations of coexisting S glycoprotein mutations.
Materials and Methods
Retrieval of genome sequences of the Omicron variant deposited in GISAID
To retrieve SARS-CoV-2 genome sequences of the Omicron variant, we accessed the Global Initiative on Sharing All Influenza Data (GISAID) on 2 December 2021 [17]. By applying the Variant filter (VOC Omicron GR/484A), we found that a total of 309 genome sequences of the Omicron variant had been submitted. We downloaded all these genome sequences from GISAID for further analysis. The names of all the Omicron variant sequences are listed in Table S1. The genome sequence of the prototype SARS-CoV-2 strain hCoV-19/Wuhan/WIV04/2019 (GISAID accession no. EPI_ISL_402124) was also downloaded from the GISAID database for the purpose of mutational analysis.
Mutational analysis of the S glycoprotein
For mutational analysis, the S glycoprotein coding regions of the 309 genome sequences of the Omicron variant and of the prototype genome (hCoV-19/Wuhan/WIV04/2019) were translated into amino acid sequences using the TRANSEQ nucleotide-to-protein sequence conversion tool (EMBL-EBI, Cambridgeshire, UK). Next, the S glycoprotein sequences of all the SARS-CoV-2 strains, including the prototype strain and the Omicron variants, were aligned using MEGA software (Version X) and examined for amino acid substitutions in the S protein of the Omicron variants compared with the prototype strain [18]. Each amino acid substitution observed in the S glycoprotein of the Omicron variant was numbered according to its position relative to the first amino acid of the S glycoprotein of the prototype strain.
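As an illustration only, the position-numbering convention described above can be sketched in Python. The function name call_mutations and the toy sequences are hypothetical and are not part of the study's workflow, which relied on TRANSEQ and MEGA. The sketch assumes two rows taken from the same protein alignment, with '-' marking gaps, and counts positions on the prototype sequence only.

```python
def call_mutations(ref_aln: str, qry_aln: str) -> list[str]:
    """List differences of an aligned query against an aligned reference.

    Both strings are rows of the same alignment (equal length, '-' = gap).
    Positions are 1-based and counted on the reference residues only.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    mutations = []
    ref_pos = 0      # position of the last reference residue seen
    inserted = ""    # query residues inserted after ref_pos, not yet reported

    for ref_aa, qry_aa in zip(ref_aln, qry_aln):
        if ref_aa == "-":
            # Gap in the reference: the query carries an extra residue here.
            if qry_aa != "-":
                inserted += qry_aa
            continue
        if inserted:
            # Report the insertion at the reference position it follows.
            mutations.append(f"ins{ref_pos}{inserted}")
            inserted = ""
        ref_pos += 1
        if qry_aa == "-":
            mutations.append(f"Δ{ref_aa}{ref_pos}")           # deletion
        elif qry_aa != ref_aa:
            mutations.append(f"{ref_aa}{ref_pos}{qry_aa}")    # substitution
    if inserted:
        mutations.append(f"ins{ref_pos}{inserted}")
    return mutations


# Toy alignment (hypothetical, not real spike sequence).
toy_reference = "MKT--AYL"
toy_query = "MRTEPA-L"
print(call_mutations(toy_reference, toy_query))  # ['K2R', 'ins3EP', 'ΔY5']
```

The labels follow the notation used in this study, for example A67V for a substitution, ΔH69 for a deletion and ins214EPE for an insertion after position 214.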
Phylogenetic analysis
The phylogenetic analysis of 155 sequences of the Omicron variant was performed with Ultrafast Sample placement on Existing tRee (UShER), which has been integrated into the UCSC SARS-CoV-2 Genome Browser [19]. We accessed the UCSC SARS-CoV-2 Genome Browser (https://genome.ucsc.edu/cgi-bin/hgPhyloPlace) and uploaded the sequence names of the 155 sequences (Table S2) for construction of the phylogenetic tree. UShER is a program that rapidly places new samples onto an existing phylogeny using maximum parsimony. It is particularly helpful in understanding the relationships of newly sequenced SARS-CoV-2 genomes with each other and with previously sequenced genomes in a global phylogeny.
Results
Geographical distribution of the Omicron variant
The geographical distribution showed that 309 sequences of the Omicron variant had been deposited in GISAID from 21 different nations across six continents (Table 1). The majority of these sequences (N=223) were submitted from Africa, from four countries: South Africa (N=170), Ghana (N=33), Botswana (N=19) and Reunion (N=1). There were 62 sequences submitted from 11 different countries in Europe (United Kingdom, N=18; Portugal, N=13; Netherlands, N=12; Austria, N=5; Germany, N=5; Italy, N=3; Belgium, N=1; Czech Republic, N=1; Spain, N=1; Sweden, N=1; Ireland, N=1). Among the remaining 24 sequences, 9 were submitted from three Asian countries (Hong Kong, N=6; Japan, N=2; Israel, N=1), 9 from Australia (Oceania), 3 from Canada (North America), and 3 from Brazil (South America).
Mapping of the S glycoprotein mutations of the Omicron variant
Mutational analysis of the S glycoprotein of the Omicron variant revealed the presence of 37 dominant mutations, ranging in frequency from 59% to 100% (Table 2). The S1 subunit of the S glycoprotein contains 29 different mutations, comprising 11 mutations (A67V, ΔH69, ΔV70, T95I, G142D, ΔV143, ΔY144, ΔY145, ΔN211, L212I and ins214EPE) within the N-terminal domain (NTD), 15 mutations (G339D, S371L, S373P, S375F, K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y and Y505H) within the receptor binding domain (RBD) and 3 mutations (T547K, D614G and H655Y) at the C-terminus of the S1 subunit. Interestingly, among the 15 mutations of the RBD, 10 mutations (N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y and Y505H) were observed within the receptor binding motif (RBM). The S2 subunit of the S glycoprotein was found to have 8 mutations, of which 3 (N679K, P681H and N764K) were present at the N-terminus of the S2 subunit, D796Y was present within the fusion peptide (FP), 3 (Q954H, N969K, and L981F) were present within heptad repeat 1 (HR1), and N856K was found within the region between FP and HR1 (Figure 1). Comparison with the S glycoprotein mutations of the four VOCs Alpha, Beta, Gamma, and Delta showed that Omicron contains 25 unique mutations (A67V, ΔV143, ΔN211, L212I, ins214EPE, G339D, S371L, S373P, S375F, N440K, G446S, S477N, E484A, Q493R, G496S, Q498R, Y505H, T547K, N679K, N764K, D796Y, N856K, Q954H, N969K, and L981F), whereas 12 mutations (ΔH69, ΔV70, T95I, G142D, ΔY144, ΔY145, K417N, T478K, N501Y, D614G, H655Y, and P681H) were shared with the other four VOCs (Figure 1).
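The frequency of each mutation is the percentage of strains that carry it. A minimal Python sketch of that calculation is shown here under stated assumptions: the strain names, the toy mutation lists and the 50% cutoff for calling a mutation "dominant" are hypothetical, since no cutoff is specified in this study beyond the observed range of 59% to 100%.

```python
from collections import Counter


def mutation_frequencies(strain_mutations: dict[str, list[str]]) -> dict[str, float]:
    """Return the percentage of strains carrying each mutation."""
    n_strains = len(strain_mutations)
    if n_strains == 0:
        return {}
    # set() ensures a mutation is counted at most once per strain.
    counts = Counter(m for muts in strain_mutations.values() for m in set(muts))
    return {m: 100.0 * c / n_strains for m, c in counts.items()}


def dominant_mutations(freqs: dict[str, float], threshold: float = 50.0) -> list[str]:
    """Mutations at or above the (assumed) frequency threshold, sorted by name."""
    return sorted(m for m, f in freqs.items() if f >= threshold)


# Toy input: hypothetical strain names mapped to their S glycoprotein mutations.
toy_strains = {
    "strain_1": ["D614G", "N501Y", "K417N"],
    "strain_2": ["D614G", "N501Y"],
    "strain_3": ["D614G", "K417N", "A701V"],
    "strain_4": ["D614G", "N501Y", "K417N"],
}

freqs = mutation_frequencies(toy_strains)
print(freqs["D614G"])             # 100.0
print(dominant_mutations(freqs))  # ['D614G', 'K417N', 'N501Y']
```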
Based on coexisting mutations, we also classified the 309 strains of the Omicron variant into 60 different groups, each group representing a different set of coexisting S glycoprotein mutations (Table 3, Supplementary file 1). More than half of the Omicron strains (N=170) belonged to either Group 1, carrying all 37 mutations, or Group 2, carrying all the mutations except K417N, N440K, G446S and N764K (Group 1, N=118; Group 2, N=52). The remaining groups together represented only 139 strains. The presence of multiple groups demonstrated the high mutational diversity of the S glycoprotein of the Omicron variant.
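Classifying strains by coexisting mutations amounts to collecting strains that share exactly the same mutation set. The following Python sketch illustrates that idea only. The function name group_by_mutation_set, the strain names and the toy mutation lists are hypothetical, and this is not the procedure used to build Table 3.

```python
from collections import defaultdict


def group_by_mutation_set(strain_mutations: dict[str, list[str]]):
    """Group strains that carry exactly the same set of mutations.

    Returns a list of (mutation_set, strain_names) pairs, largest group first.
    """
    groups = defaultdict(list)
    for strain, muts in strain_mutations.items():
        # frozenset ignores order and duplicates and can be used as a dict key.
        groups[frozenset(muts)].append(strain)
    # Sort by group size (descending), then by mutation names for stable output.
    return sorted(groups.items(), key=lambda kv: (-len(kv[1]), sorted(kv[0])))


# Toy input: hypothetical strain names mapped to their S glycoprotein mutations.
toy_strains = {
    "strain_1": ["D614G", "N501Y", "K417N"],
    "strain_2": ["D614G", "N501Y"],
    "strain_3": ["K417N", "D614G", "N501Y"],
    "strain_4": ["D614G"],
}

grouped = group_by_mutation_set(toy_strains)
for number, (mutations, strains) in enumerate(grouped, start=1):
    print(f"Group {number}: N={len(strains)}, mutations={sorted(mutations)}")
```

In this toy input, strain_1 and strain_3 fall into the same group because their mutation sets are identical regardless of the order in which the mutations are listed.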
Phylogenetic analysis of the Omicron variant
Using UShER, we performed phylogenetic analysis of a total of 2160 SARS-CoV-2 strains, which included 155 strains of the Omicron variant from 14 different countries and 2005 strains from 19 different clades (20H/Beta, N=13; 20I/Alpha, N=475; 20J/Gamma, N=38; 21A/Delta, N=1049; 21B/Kappa, N=2; 21C/Epsilon, N=29; 21D/Eta, N=6; 21F/Iota, N=19; 21G/Lambda, N=5; 21H, N=5; 19A, N=6; 19B, N=4; 20A, N=97; 20E/EU1, N=65; 20G, N=47; 20C, N=54; 20B, N=77; 20D, N=3; 20F, N=10). The metadata of the phylogenetic tree are provided in Supplementary file 2. The dendrogram showed that the 155 strains of the Omicron variant formed a new cluster that emerged from the 20B clade (also known as the GR clade or Pango lineage B.1.1). Interestingly, this new cluster of the Omicron variant was further divided into multiple sub-clusters depending on the coexisting mutations of the S glycoprotein (Figure 2).
Discussion
The presence of more than 35 mutations in the S glycoprotein of the Omicron variant, especially in the NTD and RBD of the S1 subunit, has again fuelled fears of COVID-19 around the world. The S glycoprotein, which mediates viral attachment to the ACE2 receptor and entry into the host cell, is subdivided into two functional subunits, known as S1 and S2, which remain non-covalently associated after being cleaved by furin during synthesis [3, 4]. The RBD and NTD are the two crucial domains of the S1 subunit and are responsible for interacting with the host cell receptor (ACE2) and recognizing various attachment factors, respectively [3-9]. The fusion machinery is housed in the S2 subunit, which undergoes large-scale conformational changes to force fusion of the virus and host membranes, allowing genome delivery and initiation of infection [10]. As the RBD is immunodominant and also required for ACE2 attachment, any mutation in the RBD could affect the neutralization efficacy of antibodies generated in convalescent and vaccinated individuals as well as the binding affinity of the S glycoprotein for the ACE2 receptor [4, 11-14]. There are 37 dominant mutations in the S glycoprotein of the Omicron variant, raising concerns about whether it is more infectious or pathogenic than the other four VOCs, and whether it can evade natural or vaccine-induced immunity. Despite the lack of definitive immunological and clinical data, we can provide preliminary indications on the pathogenicity, transmissibility, and immune evasion capabilities of the Omicron variant based on the known impacts of previously identified mutations. Twelve mutations of the Omicron variant (ΔH69, ΔV70, T95I, G142D, ΔY144, ΔY145, K417N, T478K, N501Y, D614G, H655Y, and P681H) overlap with those in Alpha, Beta, Gamma, and Delta. All these mutations have previously been linked with high transmissibility, increased viral binding affinity, and immune evasion [15, 20-22].
Higher transmissibility and increased immune evasion are anticipated from the Omicron variant if these overlapping VOC mutations maintain their known effects. The functional implications of the remaining 25 mutations (A67V, ΔV143, ΔN211, L212I, ins214EPE, G339D, S371L, S373P, S375F, N440K, G446S, S477N, E484A, Q493R, G496S, Q498R, Y505H, T547K, N679K, N764K, D796Y, N856K, Q954H, N969K, and L981F) of the Omicron variant are unknown, leaving many open questions about how the whole set of mutations may affect viral fitness and vulnerability to natural and vaccine-mediated immunity.
Several epitope mapping and antibody footprinting studies have shown that serum neutralizing antibodies of infected and vaccinated individuals mainly target the RBD of the S glycoprotein [11, 23-26]. The roles of 4 RBD mutations of the Omicron variant (K417N, S477N, T478K, and E484A) have previously been described in the context of immune evasion. K417N, previously detected in Beta and Delta plus, has been shown to reduce the neutralization efficacy of some monoclonal antibodies [27, 28]. Residues E484 and T478 are part of the immunodominant site of the RBD [11-13]. E484K, previously observed in Beta and Gamma, has been shown to escape antibody neutralization and has also been found to emerge as an escape mutation during exposure to monoclonal antibodies and convalescent plasma [29-31]. Four mutant viruses carrying E484A, E484D, E484G and E484K were also found to be resistant to neutralization by each of the four convalescent sera tested [32]. E484Q has been shown to reduce serum neutralizing antibody titers [33-35]. However, T478K, previously detected in Delta, does not affect neutralization by monoclonal antibodies [16]. S477N has been shown to confer resistance to monoclonal antibodies, but not to convalescent plasma [32]. Therefore, the presence of three immune escape mutations (K417N, S477N and E484A) in the RBD is likely to improve the immune evasion ability of the Omicron variant. Although the RBD is immunodominant, the NTD of the S glycoprotein can also elicit an antibody response upon infection and vaccination [8, 36, 37]. The NTD contains an ‘antigenic supersite’ comprising the N-terminus (residues 14-20), a β-hairpin (residues 140-158) and a loop (residues 245-264) [8]. Among the 11 NTD mutations of the Omicron variant, 4 (G142D, ΔV143, ΔY144, ΔY145) reside within the β-hairpin region of the antigenic supersite and are likely to contribute significantly to immune evasion.
A recent study has illustrated the functional significance of all the RBD mutations of SARS-CoV-2 for ACE2 binding [38]. Among the 15 mutations found within the RBD of the Omicron variant, 4 mutations (G339D, N440K, T478K and N501Y) were demonstrated to enhance the affinity of the RBD for ACE2, whereas the remaining 11 mutations (S371L, S373P, S375F, K417N, G446S, S477N, E484A, Q493R, G496S, Q498R, and Y505H) were demonstrated to reduce the affinity of the RBD for ACE2. Notably, mutations at Q493, Q498 and N501 are crucial for RBD-ACE2 interactions because these residues participate in polar contact networks involving the ACE2 interaction hotspot residues K31 and K353. Substitutions with nonpolar amino acids at these sites enhance the affinity of the RBD for ACE2. However, in the Omicron variant, the substitution of glutamine (Q) with the more polar amino acid arginine (R) at positions 493 and 498 is suspected to reduce the affinity of the RBD for ACE2. In contrast, the substitution of asparagine at position 501 with the less polar amino acid tyrosine is expected to enhance the affinity of the RBD for ACE2. The overall affinity of the Omicron RBD for ACE2 will be determined by the combined magnitude of the 4 affinity-enhancing and 11 affinity-reducing mutations.
The Omicron variant is also expected to maintain high transmissibility owing to the presence of the D614G and P681H mutations, which were previously described as key mutations for enhanced transmissibility of the virus [22, 39]. Currently, it is not clear whether Omicron has higher transmissibility than Delta. However, preliminary data suggest that the Omicron variant is spreading rapidly against a backdrop of ongoing Delta variant infection and natural as well as vaccine-induced immunity, indicating high transmissibility and the potential to cause breakthrough infections. If the current trend continues, Omicron will very rapidly supplant Delta as the most common variant in South Africa and other parts of the world, and may lead to another wave of COVID-19.
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
Author contribution
RS and MCS conceived and designed the research. RS, ML and RS performed sequence retrieval, mutational analysis, and figure and table preparation. SD and MCS guided the project and gave valuable scientific inputs. RS and MCS wrote the manuscript. All authors read and approved the manuscript.
Conflict of Interest
The authors declare no conflict of interest.
Acknowledgement
We would like to acknowledge the scientists, researchers and laboratory staff in India for their valued contribution to SARS-CoV-2 genome sequencing and deposition in GISAID. We would also like to applaud the GISAID consortium for allowing us open access to the deposited SARS-CoV-2 sequences.