Evolutionary and phenotypic characterization of spike mutations in a new SARS-CoV-2 Lineage reveals two Variants of Interest ============================================================================================================================ * Paula Ruiz-Rodriguez * Clara Francés-Gómez * Álvaro Chiner-Oms * Mariana G. López * Santiago Jiménez-Serrano * Irving Cancino-Muñoz * Paula Ruiz-Hueso * Manuela Torres-Puente * Maria Alma Bracho * Giuseppe D’Auria * Llúcia Martinez-Priego * Manuel Guerreiro * Marta Montero-Alonso * María Dolores Gómez * José Luis Piñana * SeqCOVID-SPAIN consortium * Fernando González-Candelas * Iñaki Comas * Alberto Marina * Ron Geller * Mireia Coscolla ## Abstract Molecular epidemiology of SARS-CoV-2 aims to monitor the appearance of new variants with the potential to change the virulence or transmissibility of the virus. During the first year of SARS-CoV-2 evolution, numerous variants with possible public health impact have emerged. We have detected two mutations in the Spike protein at amino acid positions 1163 and 1167 that have appeared independently multiple times in different genetic backgrounds, indicating they may increase viral fitness. Interestingly, the majority of these sequences appear in transmission clusters, with the genotype encoding mutations at both positions increasing in frequency more than single-site mutants. This genetic outcome that we denote as Lineage B.1.177.637, belongs to clade 20E and includes 12 additional single nucleotide polymorphisms but no deletions with respect to the reference genome (first sequence in Wuhan). B.1.177.637 appeared after the first wave of the epidemic in Spain, and subsequently spread to eight additional countries, increasing in frequency among sequences in public databases. Positions 1163 and 1167 of the Spike protein are situated in the HR2 domain, which is implicated in the fusion of the host and viral membranes. To better understand the effect of these mutations on the virus, we examined whether B.1.177.637 altered infectivity, thermal stability, or antibody sensitivity. Unexpectedly, we observed reduced infectivity of this variant relative to the ancestral 20E variant *in vitro* while the levels of viral RNA in nasopharyngeal swabs did not vary significantly. In addition, we found the mutations do not impact thermal stability or antibody susceptibility in vaccinated individuals but display a moderate reduction in sensitivity to neutralization by convalescent sera from early stages of the pandemic. Altogether, this lineage could be considered a Variant of Interest (VOI), we denote VOI1163.7. Finally, we detected a sub-cluster of sequences within VOI1163.7 that have acquired two additional changes previously associated with antibody escape and it could be identified as VOI1163.7.V2. Overall, we have detected the spread of a new Spike variant that may be advantageous to the virus and whose continuous transmission poses risks by the acquisition of additional mutations that could affect pre-existing immunity. Keywords * SARS-CoV-2 * Spike * HR2 * variants * homoplasy * and antibody escape. ## Introduction Genomic surveillance of viral mutations is the first step in detecting viral changes that could impact public health by interfering with diagnostics, modifying pathogenicity, or altering susceptibility to existing immunity or treatments. In many countries, the challenge of detecting new mutations of interest in SARS-CoV-2 is approached by sequencing representative genomes from circulating viruses, sharing sequence information on public databases (e.g. GISAID1), and analysing them in real-time using platforms, such as Nextstrain2. While mutations appear randomly, their fate in the population depends on a combination of the conferred fitness advantage as well as stochastic and demographic processes. A first step in assessing the potential public health impact of mutations is to decipher if their increase in frequency is due to chance or adaptation. If found to be adaptive, it is important to evaluate whether their adaptation is linked to an improved ability to replicate, colonize, transmit, or evade antiviral hosts defences3. An important challenge in the field is to decipher which of all the variants that appear should be monitored to implement measures to mitigate their risk to public health. Genotypes that are phenotypically different from a reference isolate or have mutations that lead to changes associated with either established or suspected phenotypes could be considered Variants of Interest (VOI) if they also fit one of the following criteria: i) cause community transmission/multiple COVID-19 clusters or ii) have been detected in multiple countries4. Among VOI, only those genotypes that are associated with higher transmissibility, with detrimental changes in COVID-19 epidemiology, with increased virulence, with changes to clinical presentation, or with decrease effectiveness of public health measures, diagnostics, vaccines or therapeutics are further categorized as variants of concern4. Mutations in SARS-CoV-2 have been reported since the early stages of the epidemic5–7. While no signs of recombination have been detected so far among SARS-CoV-2 variants8, the most common mutations described are single nucleotide polymorphisms (SNPs) and small deletions9–11. Genomic surveillance of mutations has been mostly focused on the Spike (S) protein because of its key roles in viral entry and immunity12, as well as the fact that this protein constitutes the basis of numerous SARS-CoV-2 vaccines13. S is a homotrimeric protein, whose heavily glycosylated ectodomain protrudes from the viral membrane, showing a bat-like shape with a N-terminal globular head portion connected to the membrane by an elongated stalk14. The S protein is proteolytically processed by the cellular furin protease into the S1 and S2 subunits15, 16. Additional proteolytic cleavage occurs following S protein binding to the host receptors, facilitating S1 subunit release. The C-terminal S2 subunit remains trimeric in the viral membrane but undergoes conformational changes that promote viral membrane fusion with the host cell17. A key role in these conformational changes is played by two heptad repeat motifs, HR1 and HR2 that, starting from the head and stalk regions in the pre-fusion state of S protein, form a HR1-HR2 six-helix bundle in the post-fusion state that is critical for viral entry18. The first mutation that was identified as of potential concern was an aspartic acid to a glycine mutation in the S1 subunit of the S protein at position 614 (D614G). D614G emerged early in the epidemic, became predominant in most countries within 2 months, and completely dominated the epidemic by August 202019. As with any mutant, the initial spread of this mutation could have resulted from stochastic events, the dynamics of epidemic, or an intrinsically higher viral fitness. More than six months after the initial report of this mutation, several studies have found evidence in favour of higher transmission efficacy in animal models and human populations5, 20–22. This variant replicates better in some cell culture and animal models20, 21, 23, and is associated with higher viral loads in infected individuals19; importantly, however, it does not impact diagnostics or vaccine efficacy. Following the first wave of the pandemic, additional variants have been reported from many countries. Among the first of these was the A222V mutation at the N-terminal domain (NTD) of the S1 subunit, which occurred in the background of the D614G S protein mutation. This variant, termed 20E, was first sequenced in Spain and expanded throughout Europe6. Other variants have been reported since, such as the so-called “cluster 5”, which harbours a combination of 3 SNPs and single deletion related to mink farms in Denmark24. One of the SNPs is in the S protein of this variant, Y453F, occurs in the receptor binding domain (RBD) and may increase binding to cell receptors in mink25. Transmission of this variant between humans and minks has been reported7, highlighting a possible risk of expansion in the human population, which resulted in proposals for large scale culling of mink populations in Denmark. Studies to assess the biological impact of this mutant have not been reported but there is no evidence for its wide spread over the course of >6 months since its description26. By December 2020, three variants of concern (VOCs) were described, all of which share the N501Y amino acid replacement in the RBD of the S protein: 20I/501Y.V1 (also called Lineage B.1.1.7) was originally described in the UK11, 20H/501Y.V2 (B.1.351) in South Africa, and 20J/501Y.V3 (P1) in Brazil. These variants are of particular concern because they are more transmissible27–29 and, although data on antigenicity and disease severity is not conclusive, could result in reduced susceptibility to neutralization by existing immunity and affect vaccine efficacy30–33. Importantly, all of these variants have spread outside of the country where they were initially identified and are estimated to spread faster than other co-circulating genotypes27, 34, 35. In this work, we have performed a detailed phylogenomic analysis of the appearance, spread and evolution of two mutations involving amino acid positions 1163 and 1167 of the HR2 functional motif of S protein. Our results provide evidence of repeated, independent emergence, suggesting these mutations contribute to increased viral fitness. In addition, we have evaluated the biological relevance of these mutations to viral infectivity, virion stability, and neutralization by sera from convalescent and vaccinated individuals. ## Results ### Multiple and independent mutations in amino acid positions 1163 and 1167 of the Spike protein SARS-CoV-2 genetic variation has been monitored by the Spanish sequencing consortium SeqCOVID to follow the expansion of mutations that could potentially result in a change of the biological properties of the virus. We focused on mutations in the S protein because of its relevance for infection and immunity12. We detected two mutations in the *spike* gene: G25049T (D1163Y) and G25062T (G1167V), which appeared in Spain as early as March and April 2020, respectively (Supplementary Fig.1). These mutations continued arising independently of each other and, by the end of June, when the predominant circulating genotypes from the first wave in Spain had already been replaced by other variants36, were also observed together (Supplementary Fig.1). Both positions have mutated multiple times independently and to different amino acids at a lower frequency. On the one hand, D1163 appears mutated at least 99 times (D1163Y: 84, D1163V: 4, D1163G: 3, D1163A: 2, D1163E: 2, D1163H: 2, D1163N: 1, and D1163H/Y: 1) in 47 lineages according to the PANGO scheme37. On the other hand, G1167 appears mutated at least 54 times (G1167V: 39, G1167D: 4, G1167C: 3, G1167R: 3, G1167S: 3, G1167F: 1, and G1167A: 1) in 20 PANGO lineages including B.1 (Supplementary Fig.2e) and its derivatives B.26, B.40 (Supplementary Fig.2c), and D.2 (Supplementary Fig.2f). ![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/03/12/2021.03.08.21253075/F1.medium.gif) [Figure 1.](http://medrxiv.org/content/early/2021/03/12/2021.03.08.21253075/F1) Figure 1. Sequences mutated at positions 1163 and 1167 of the S protein. **a.** The number of mutation events for amino acid replacement D1163Y (light orange) or another D1163 amino acid replacements (dark orange). **b**. The number of mutation events for amino acid replacement G1167V (light turquoise) or another G1167 amino acid replacements (dark turquoise). Bars coloured in magenta indicate the appearance of both the D1163Y and G1167V amino acid replacements in the same sequences. **c.** Maximum-likelihood phylogeny of 10,450 SARS-CoV-2 genomes. The inner circle of the rings represents sequences with amino acid changes in position D1163 of the S protein. The external circle represents sequences with amino acid changes in position G1167 of the S protein. Branches are coloured in magenta for B.1.177.637, green for clade 20E, and orange for cluster 1163.654. The scale bar indicates the number of nucleotide substitutions per site. **d.** Temporal distribution and frequency of sequences with variant B.1.177.637 coloured by geographical origin. ### Clusters of transmission with amino acid changes in positions 1163 and 1167 of Spike Positions 1163 and 1167 in the S protein have mutated independently multiple times in SARS-CoV-2. The majority of mutated sequences (94.43%) were found in transmission clusters (see methods for definition of clusters; Figure 1a,b), with a small minority not belonging to a transmission cluster due to either incomplete sampling or failure to spread. While different amino acids changes have been detected at both positions, only one change at each position appeared in most clusters: D1163Y in 83.33% and G1167V in 69.23% clusters. D1163Y appeared in 22 transmission clusters (Figure 1a) and G1167V in 8 clusters (Figure 1b). Interestingly, the biggest cluster included both the D1163Y and G1167V mutations together. We denote this cluster as B.1.177.637, which was detected in 65 sequences from Spain until December 2020, representing 1.17% of the Spanish sequences. Globally, this cluster, which is within lineage 20E (also described as 20A.EU16, and B.1.17737), includes 1,627 sequences (Figure 1c). B.1.177.637 is characterized by nine nonsynonymous and six synonymous mutations with respect to the reference sequence from Wuhan (Supplementary table 2), but no deletions were shared among B.1.177.637 sequences. Amino acid changes were found in A222V, D614G, D1163Y, and G1167V in the S protein, A220V and P365S in the N protein, V30L in ORF10, L67F in ORF14, and P4715L in ORF1ab (Supplementary Fig.3 and Supplementary table 2). Synonymous mutations were also observed in the *orf1ab*, *n* and *m* genes (Supplementary Fig.3 and Supplementary table 2). All evidence support we consider this cluster a new Lineage, and we requested to be under the name B.1.177.637 ([https://github.com/cov-lineages/pango-designation/issues/22](https://github.com/cov-lineages/pango-designation/issues/22)). ![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/03/12/2021.03.08.21253075/F2.medium.gif) [Figure 3.](http://medrxiv.org/content/early/2021/03/12/2021.03.08.21253075/F2) Figure 3. Comparison of the infectivity and stability of different S genotypes. **a, b.** The infectivity of VSV particles pseudotyped with each S protein genotype in either Vero (**a**) or human A549 cells expressing ACE2 and TMPRSS2 (**b**). The mean and standard deviation of three replicates is plotted. **c.** Comparison of cycle threshold (Ct) values for the *n* gene from patients infected with viruses encoding different S protein variants. Data is derived from 2,534 sequences from SeqCOVID consortium. The number of observations (N) analysed for each genotype is indicated. **d.** The thermal sensitivity of VSV pseudotyped with different S genotypes following incubation at 15 minutes. Data are standardized to the surviving fraction following incubation at 30°C, and the three-parameter log-logistic equation is plotted. FFU: focus forming units. Within 20E, the second largest cluster including any of these mutations was observed in 34 sequences with E654Q and D1163Y in S protein plus 7 nonsynonymous and 6 synonymous mutations (Supplementary table 2 and Supplementary Fig.3). We denote this second cluster, which is also embedded within lineage 20E, cluster 1163.654 (Figures 1c and S3). Cluster 1163.654 appeared first in Ireland on 2020-07-23 and subsequently appeared in Spain and England. However, after three months, cluster 1163.654 is no longer being detected. A large cluster within Lineage 20E is formed by 37 sequences with the mutation G1167F. Sequences for this cluster were obtained in Wales between the end of October and the beginning of November 2020 (dark pink in the external circle in Supplementary Fig.2g). We found additional clusters involving mutations in position 1163 of the S protein; the largest cluster is composed of 64 sequences within Lineage B.1. The majority of which are from Denmark but also includes sequences from England and Sweden (orange in external ring in Supplementary Fig.2e). Among other smaller clusters, a cluster of 10 sequences with G1167V (within Lineage B.1) appeared in Spain in March (indicated in cyan within the external circle of Supplementary Fig.2e). The cluster was detected in two regions in Spain (Valencia, and Galicia), but was controlled with lockdown measures imposed in Spain from March to May and was not detected after May 2020. The same mutation was found in a cluster of 28 sequences within 20E from England and Wales between October and November 2020 (indicated in cyan within the external circle of Supplementary Fig.2g). Similarly, another cluster of 11 sequences in Lineage B.1.1 and with the mutation G1167A was detected in early January 2021, encompassing sequences from Ecuador, Colombia, and Peru (Supplementary Fig.2f). Within Lineage B.53, we found a cluster of 12 sequences from Lithuania with mutation D1163Y. Finally, we found small clusters with mutations in 1163 or 1167 within Lineage B.40 and Lineage A (Supplementary Fig.2b, c). Because of the risk posed by VOC11, 26, 35, we examined whether mutations in 1163 and 1167 of the S protein were observed in the three VOC described to date. Indeed, mutations in these two positions were observed in two VOC, 20I/501Y.V1 and 20H/501Y.V2. Mutations involving amino acid positions 1163 and 1167 appeared independently in the background of 20I/501Y.V1 multiple times. Specifically, mutations in D1163 have occurred at least 13 times in 24 sequences, including four transmission clusters and nine unique sequences (Supplementary Fig.4), while mutations at G1167 were observed in at least five independent sequences. Interestingly, D1163Y and G1167V were observed together in only one individual within 20I/501Y.V1, although they were not fixed (relative frequency of 27% and 17% of the reads with D1163Y and G1167V, respectively; Supplementary table 3). Finally, only two sequences that harbour the amino acid replacement G1167V in the S protein were observed in the genomic background 20H/501Y.V2 (Supplementary table 3). ### Evolution of B.1.177.637 We explored the emergence and evolution of B.1.177.637, the largest and most successful cluster involving amino acid changes in positions 1163 and 1167 of the S protein. Lineage B.1.177.637 appeared in Spain in June 2020 in sequences from the Basque Country (Figure 1d, and Supplementary Video 1) and subsequently appeared in individuals from other countries, comprising a total of 1,627 sequences in GISAID (0.60% of 270,869 analysed sequences by 23rd of December 2020) (Figure 1d and Supplementary Video 1). The majority of the B.1.177.637 sequences were obtained from the United Kingdom, including England (1,058), Scotland (419), Wales (34) and Northern Ireland (5), but were also observed in Gibraltar (24 sequences), indicating successful migration and transmission (Supplementary Video 1). Although B.1.177.637 is not well represented in sequences from other countries, it has been found in multiple sequences from Denmark (9), Switzerland (8), Norway (2), and single sequences from Italy, France, Singapore, and Ireland. By the end of 2020, B.1.177.637 was still circulating in Europe (Figure 1d and Supplementary Video 1), and by the end of February 2021 it was represented by 1,923 sequences in GISAID (0.33% of submitted sequences). Within B.1.177.637, we detected additional SNPs in individual sequences or small groups of sequences. One of these changes is E484K in the RDB of the S protein, a mutation present in three VOCs (20I/501Y.V1, 20H/501Y.V2 and 20J/501Y.V3) that is implicated in increased ACE2 binding38 and reduced neutralization by antibodies39. In addition, we found another change associated with evasion of antibody immunity: a deletion of positions 141-144 in the S protein, which partially overlaps with a smaller deletion at 144 reported in VOC 20I/501Y.V140. This sub-cluster included five sequences along January 2021 from England and Wales (Supplementary table 2). The five sequences formed a monophyletic group embedded in B.1.177.637 (Supplementary Fig.5), identified as cluster B.1.177.637.V2, which displays other nonsynonymous and synonymous mutations (Supplementary table 2) and only two sites are polymorphic within B.1.177.637.V2. ![Figure 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/03/12/2021.03.08.21253075/F3.medium.gif) [Figure 4.](http://medrxiv.org/content/early/2021/03/12/2021.03.08.21253075/F3) Figure 4. Antibody neutralization of 20E and B.1.177.637 variants. The reciprocal titer at which infection with the 20E S genotype (A222V and D614G) or B.1.177.637 S genotype (20E plus D1163Y and G1167V) is reduced by 80% (ID80) by sera from individuals infected during the early stage of the pandemic in (**a**) or during a later stage of the pandemic (**b**) or from donors vaccinated with the BNT162b2 vaccine (**c**). The mean and standard error of three replicates is plotted. ### Positions 1163 and 1167 of the S protein are located in the heptad repeat 2 motif The S protein mediates both the binding to cellular receptors and entry into the host cells12. For the former, the RBD motif in the S1 subunit interacts with the cellular receptor in the pre-fusion state. In the post-fusion state, two heptads repeat sequences (HR1 and HR2) in the S2 subunit must form a six-helix bundle in order to bring the viral and cellular membrane into close proximity41, 42 (Figure 2). S protein positions 1163 and 1167 are both located within the HR2 domain. Specifically, 1167 is present at the beginning of the HR2 motif and 1163 in its upstream linker region (Figure 2a). Interestingly, this motif is highly invariable, showing 100% conservation across 14 related sarbecoviruses to which SARS-CoV-2 belongs to (Supplementary table 1)43, 44. Structural characterization of full-length ectodomain of S protein has shown that the stalk portion encompassing positions 1163 and 1167 presents intrinsic flexibility in the pre-fusion state18, precluding its atomic visualization. This has been recently confirmed by high-resolution cryo-electron tomographic reconstitution of SARS-CoV-214, where this region was observed to constitute a flexible hinge that acts as a “knee”, connecting two helical coiled-coil regions of the stalk (upper and lower legs; Figure 2b). Within this structure, the conformational freedom provided by the glycine residue at position 1163 should play a key role in the flexibility of the knee. In contrast, in the post-fusion state, this region shows high rigidity due to a strong structural rearrangement of the HR2 motif, which adopts an extended conformation and tightly packs along the central 3-helix bundle stem formed by the HR1 motif (Figure 2c). The resulting HR1-HR2 bundle plays a key role in the mechanism of viral-host membrane fusion18, 45 and mutations in this region could have significant impact on the function of the S protein. In addition, the HR2 region is highly glycosylated, which is observed to be regularly spaced in both the pre- and post-fusion states and to mostly align to the side of the helix bundle14, 18, 45. Of note, two of these branched sugars are placed at positions N1158 and N1173, hiding positions 1163 and 1167 (Figure 2b). Therefore, changes in stalk flexibility might have relevance in immunity by influencing both the intrinsic degree of exposure of this region and its sugar shielding. ![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/03/12/2021.03.08.21253075/F4.medium.gif) [Figure 2.](http://medrxiv.org/content/early/2021/03/12/2021.03.08.21253075/F4) Figure 2. The structure of 1163 and 1167 in the pre and post-fusion states of S protein. **a.** Schematic representation of the S protein. SP. Signal peptide; NTD, N-terminal domain; RBD, receptor-binding domain; SD1-2, subdomains 1 and 2; L-UH, Linker-Upstream helix; FP, fusion peptide; CR, connecting region; HR1, heptad repeat 1; CH-SD3, central helix subdomain 3; BH, β-hairpin; HR2, heptad repeat 2; TM, transmembrane; CD, cytoplasmic domain. Mutations D1163Y and G1167V are indicated in purple and other mutations described in the text in green. **b.** Cartoon representation (*left*) of a structural model of pre-fusion membrane-bound trimeric S protein46. In each subunit the RBD, HR1 and HR2 domains are coloured in different tones (light to dark) of blue, yellow, and green. The N-glycosylation of N1155 and N1176 are shown in stick representation and coloured as the corresponding subunit. Functional and structural regions are marked. A close-view (*right*) of the N-terminal portion of HR2 where D1163Y and G1167V mutations are found. The side-chains of mutated and hydrophobic residues in the HR2 region are shown in stick representation and coloured as corresponding subunit (mutated residues in lighter tone). **c.** Cartoon representation (left) of S2 subunit in post-fusion conformation with HR1 and HR2 regions coloured as in b and N-glycosylation around mutation position shown as sticks. A close-view of the region encompassing the mutations (right), showing in stick representation the mutated and hydrophobic residues from the HR2 region shown in panel b. Dotted lines highlight HR2 disordered regions in the Cryo-EM structure. Using the available structural information of the S protein in the pre- and post-fusion conformations18, we examined the possible implications of these mutations to viral infectivity. Based on these structures, G1167V mutation is predicted to confer significant rigidity to the structure in two ways. First, the introduction of a side chain strongly reduces the conformational freedom provided by the glycine residue. Second, the presence of the new aliphatic side chain provided by the valine residue strongly increases hydrophobicity, likely promoting the burial of this side chain in the HR1 helix 3-bundle stem in the post-fusion state or favouring its integration in the neighbour helical coiled-coil in the pre-fusion state (Figure 2b,c). Unlike position 1163, position 1167 is fully exposed to the solvent in both the pre- and post-fusion states (Figure 2b,c). Hence, the effect of D1163Y is likely to stem a change in nature of the side chain, switching from a charged aspartic acid residue at physiological pH to a polar group with hydrophobic properties in the tyrosine. ### Spike aminoacid changes D1163Y and G1167V do not increase viral infectivity Previous reports have indicated that mutations in the S protein can increase infectivity5, 21, 47–49. Because the highest transmission success for mutations in S positions 1163 and 1167 corresponds to the double mutant D1163Y and G1167V (characteristic of B.1.177.637), we explored whether these mutations in combination have an influence on infectivity. For this, we pseudotyped vesicular stomatitis virus lacking its glycoprotein and encoding GFP50 (VSVΔG-GFP) with different S genotypes: Wuhan (D614), D614G, 20E (A222V and D614G), or B.1.177.637 (A222V, D614G, D1163Y and G1167V). Infectious virus production was then assessed by limiting dilution and counting of GFP-positive cells in both Vero cells and A549 human alveolar basal epithelial cells expressing human ACE2 and TMPRSS2 (A549-hACE2-TMPRSS2). As previously reported19, 21, 51, the 20E S genotype enhanced infectivity relative to the Wuhan S genotype by 70% in both Vero (p-value = 0.005 by unpaired t-test; Figure 3a) and A549-hACE2-TMPRSS2 cells (p-value = 0.016 by unpaired t-test; Figure 3b). The 20E S genotype also showed a trend towards increased infectivity versus the D614G mutation alone (35% increase in both cell lines), as has been previously reported49, yet the difference was not statistically significant (p-value > 0.05 by unpaired t-test; Figure 3a,b). In contrast, B.1.177.637 S genotype significantly diminished virus infectivity versus the 20E genotype, reducing virus titers by 20% in Vero cells (p-value = 0.009 by unpaired t-test; Figure 3a) and 29% in A549-hACE2-TMPRSS2 (p-value = 0.03 by unpaired t-test; Figure 3b). This is in agreement with a potential stabilization of the HR2 helix (Figure 2), which should limit the ability of the S protein to sample different structural conformations that could be required for binding host-cell receptors. Hence, B.1.177.637 S genotype does not increase infectivity *in vitro*. To examine whether reduced infectivity could also be observed *in vivo*, we tested if individuals infected with B.1.177.637 have different viral loads. For this, we used the cycle threshold (Ct) of real-time PCR used for diagnosis as a surrogate. As previously reported19, we detected higher Ct values for D614 wild-type variant (Ct mean = 27.00) compared to genotypes encoding the D614G S protein mutation (Ct mean = 25.32; p-value < 0.01 by unpaired Wilcoxon test, Figure 3c). However, we did not find significant differences in viral loads between individuals infected with B.1.177.637 genotype and other genotypes within 20E (Ct mean = 21.14 vs Ct mean = 20.63, p-value = 0.72 by unpaired Wilcoxon test, Figure 3c). Interestingly, higher viral loads were observed in individuals infected with B.1.177.637 and others 20E (D614G and A222V) compared to the D614G alone (D614G Ct mean = 25.32, 20E Ct mean = 21.14, B.1.177.637 Ct mean = 20.63, p-value < 0.01 for both comparisons by unpaired Wilcoxon test, Figure 3c). This data suggests that, unlike the results of the in vitro studies, B.1.177.637 replicates as efficiently as other 20E genotypes *in vivo*, although mutations outside of the S protein could contribute to this result. ### Amino acid changes D1163Y and G1167V do not alter S protein stability As increased S protein stability could impact transmissibility by maintaining virion infectivity during the intra-host transmission period, we assessed the temperature sensitivity of the different S variants. For this, we subjected VSV particles pseudotyped with different S genotypes to a range of temperatures for 15 minutes, after which we evaluated the surviving fraction. Overall, no major differences in the degree to which the different S proteins lost infectivity upon heat exposure were observed, with all S proteins showing a 50% reduction in infectivity at a similar temperature range (39.8-42.2°C; p-value > 0.05 for all except Wuhan S genotype (D614) versus 20E S genotype (A222V and D614G), where p-value = 0.01; Figure 3d). Hence, the D1163Y and G1167V mutations do not seem to have a major impact in the thermal stability of the S protein. ### Spike D1163Y and G1167V modestly reduce sensitivity to neutralization by existing antibody immunity Positions 1163 and 1167 of the S protein have been reported to occur in both T and B cell SARS-CoV-2 epitopes52–54. Moreover, numerous studies have shown that mutations in the S protein can affect antibody neutralization30, 31. We therefore examined if the presence of D1163Y and G1167V alters the neutralization capacity of convalescent sera using VSV pseudotyped with either the 20E or B.1.177.637 S genotypes. In order to capture the potential influence of different infecting variants on antibody neutralization, we tested the sensitivity of these pseudotyped viruses to neutralization by sera from early (April 2020; First wave in Spain) or later (October 2020; Second wave in Spain) in the pandemic, when newer variants were dominant6, 36. Overall, B.1.177.637 genotype conferred a modest but statistically significant reduction in sensitivity to neutralization by six serum samples tested from the early stage of the pandemic, as measured by the titers required to inhibit viral entry by 80% (ID80; mean = 6.75, range: 1.30-17.68; p-value = 0.008 by paired t-test; Figure 4a). A statistically significant but smaller effect was observed when the titers required to inhibit viral entry by 50% were examined (ID50; mean = 2.27, range: 1.61-3.54; p-value < 0.001 by paired t-test; Supplementary Fig.6). In contrast, both 20E and B.1.177.637 were equally susceptible to sera from patients infected during the second wave (ID80; mean = 1.03, range: 0.87-1.23; p-value = 0.83 by paired t-test; Figure 4b). These results indicate that the D1163Y and G1167V mutations can provide some degree of escape from pre-existing antibody-based immunity relative to the 20E S genotype depending on the genomic background of the infecting genotype. As a modest reduction in titers was observed with sera from early in the pandemic (Figure 4a), which is more closely related to the current S genotype present in approved vaccines55, 56, we examined if B.1.177.637 S genotype resulted in reduced neutralization by sera from donors vaccinated with the BNT162b2 vaccine. No significant differences in susceptibility to antibody neutralization from vaccinated donors were observed between the two genotypes, indicating that VOI1163.7 is unlikely to alter the efficacy of vaccines based on the Wuhan S genotype (Figure 4c). ## Discussion SARS-CoV-2 success is linked to its ability to infect and be transmitted. Mutations that emerge independently several times and increase in frequency are likely to confer enhanced viral infectivity, transmission, or immune evasion. The identification of such mutants is of great importance, as they can significantly impact public health. In this work, we have identified two mutations in the S protein that are likely to be beneficial for the virus based on several lines of evidence. First, these mutations are highly variable within SARS-CoV-2 but conserved across the closely related coronaviruses. Second, the vast majority of sequences harbouring these mutations appear in clusters (Figure 1a and 1b). Third, the largest cluster, and therefore the most successful in terms of transmission, includes both mutations together (Figure 1a and 1b). Additionally, both positions have been reported as positively selected multiple times throughout the SARS-CoV-2 phylogeny indicating a fitness advantage57. Although either mutation in isolation could be advantageous, their co-occurrence in a large cluster that has been sustained for more than six months across Europe is suggestive of increased fitness when both mutations are present together. Positions 1163 and 1167 are found in the HR2 domain of the S protein, adjacent to the transmembrane domain. Examination of available structural data suggests that G1167V might alter the flexibility of the S protein stalk by both restricting the conformational freedom normally conferred by the glycine residue and by introducing a hydrophobic side chain that will favour burial in the HR2 coiled-coil leucine zipper of the pre-fusion state (Figure 2). This extensive flexibility of S prefusion stalk seems to be unique to the SARS-CoV-2 S protein and has not been reported for other class I fusion proteins18. The stalk flexibility has been suggested to increase avidity for the host receptors by allowing the engagement of multiple S proteins18. Therefore, stalk stabilization is likely to result in a reduced ability of S to bind receptors in the target cell. Indeed, we find the B.1.177.637 genotype to have reduced infectivity compared to the 20E genotype in both Vero and A549-hACE2-TMPRSS2 cells (Figure 3a, b). In contrast, viral load in individuals infected with different S genotypes indicated that 20E and B.1.177.637 reach similar degrees of viral replication in vivo (Figure 3c). This apparent discrepancy in the effect of the two mutations on infectivity may stem from the differences of SARS-CoV-2 infection *in vivo* and in the *in vitro* assay. A recent publication has suggested that the tyrosine-protein kinase receptor UFO (AXL) is an important mediator of SARS-CoV-2 entry and may be of higher relevance for infection of the lung than ACE258. The effect of entry via AXL in the two cell lines used may be overshadowed by high levels of ACE2 expression. Alternatively, additional factors could underlie this difference, including the presence of additional mutations outside of the S protein or differences of viral loads across sampling times i*n vivo* during the infection. Increased temperature stability can potentially confer a fitness advantage to the virus by reducing losses to infectivity during environmental transition between hosts. Hence, we also examined whether these mutations altered the temperature stability of the virions. Overall, no major difference in stability was observed between VSV pseudoparticles bearing the D614, D614G, 20E, or B.1.177.637 S protein genotypes (Figure 3d). Hence, changes in protein stability are unlikely to underlie the increased transmission of these variants. Finally, as the S protein is a major target of the immune response12, immune evasion represents one possible consequence of mutations in this protein. Both S positions 1163 and 1167 are embedded in experimentally confirmed T cell and B cell epitopes. For T cell epitopes, a predicted HLA-II epitopes including position 1163 and 1167 has been experimentally verified to bind to HLA DRB1*01:01, the prototype molecule for the DR supertype (epitope identifier in Immune Epitope Data Base: 900659). Additionally, D1163 is included in a SARS-CoV-2 T cell epitope eliciting T-cell responses in convalescent COVID-19 cases60 as well as in SARS-CoV-2-naïve individuals53, indicating cross-reactivity in epitopes involving these regions. B cell linear epitopes that span D1163 and G1167 have also been reported52, with D1163 belonging to a dominant linear B cell epitope recognized by more than 40% COVID-19 patients used in the assay54. D1163 is fully solvent exposed in available structure18, 45, making its side-chain easily accessible to antibodies, providing a potential mechanism for altering antibody binding. To directly examine whether the mutated S positions 1163 and 1167 influence susceptibility to pre-existing humoral immunity, we examined the neutralization capacity of convalescent sera against the 20E variant or B.1.177.637. For this, we used sera from both the first (April 2020) and second (October 2020) waves of the infection in Spain, because an almost complete replacement of SARS-CoV-2 S variants occurred between these two times of the pandemic in Spain36. When utilizing sera from donors infected during the first wave of the pandemic in Spain, we found a modest but statistically significant reduction in susceptibility to neutralization of the B.1.177.637 S genotype compared to the 20E S genotype of approximately 6-fold (Figure 4a). However, no difference in neutralization was observed between the two variants when sera from patients infected during the second wave was used (Figure 4b), highlighting variant-specific differences in antibody responses. Overall, the magnitude of the observed reduction in neutralization susceptibility to sera from individuals infected during the first wave was much less pronounced than that observed for other genotypes implicated in immune evasion31. Nevertheless, the degree of reduced neutralization required to confer a biologically relevant fitness advantage *in vivo* has not been established, and even relatively small reductions in susceptibility to antibody neutralizations could potentially confer a significant advantage to replication. Indeed, this has been suggested to be the case in an immunosuppressed individual treated with convalescent serum, where mutations selected during the course of treatment conferred a similar reduction to that observed in the current study61. Finally, no evidence was found for reduced neutralization by sera from donors immunized with the BNT162b2 vaccine (Figure 4c), which is based on the Wuhan S genotype, indicating antibody immunity elicited by Pfizer-BioNTech COVID-19 vaccine (BNT162b2; February 2021) would not be affected by amino acid replacements in 1163 and 1167 sites in S protein. Although further experiments are needed to decipher the mechanisms by which the two S mutations identified could confer a selective advantage to the virus, the evidence presented supports B.1.177.637 as a VOI (VOI1163.7.V1) according to recently published criteria4. First, amino acid replacements in S protein: D1163Y and G1167V lead to changes associated with suspected phenotypic change because of the rigidity it poses to the S protein. Second, the genotype showed moderate but significantly lower antibody susceptibility compared to 20E S genotype. And third, it has been identified to cause community transmission, appearing in multiple COVID-19 cases, and detected in multiple countries. Additionally, a subgroup within B.1.177.637 (denoted as VOI1163.7.V2) includes two additional mutations leading to amino acid changes in S with established phenotypic impact on humoral immunity: E484K62 and 141-144Del40, 63. Whether VOI1163.7.V1 and VOI1163.7.V2 will continue to increase in frequency and accumulate additional mutations that could improve its fitness and/or present challenges to vaccines or diagnostics remains to be seen. However, their characterization as VOI would help to discover if enough evidence holds to consider them VOC and therefore required monitoring. ## Methods ### Whole-genome sequencing and genome assembly of SeqCOVID consortium sequences A total of 5,017 clinical samples were received, sequenced, and analysed by the SeqCOVID consortium from all autonomous communities of Spain. These samples were confirmed as SARS-CoV-2 positive by RT-PCR carried out by Clinical Microbiology Services from each hospital. Sequencing of the samples has been approved by the ethics committee: Comité Ético de Investigación de Salud Pública y Centro Superior de Investigación en Salud Pública (CEI DGSP-CSISP) N° 20200414/05. All sequences are available at GISAID under the accession numbers detailed in Supplementary table 4. For sequencing, RNA samples were retro-transcribed into cDNA. SARS-CoV-2 complete genome amplification was performed in two multiplex PCR, according to the protocol developed by the ARTIC network64, using the V3 multiplex primers scheme65. From this step, two amplicon pools were prepared, combined, and used for library preparation. The genomic libraries were constructed with the Nextera DNA Flex Sample Preparation kit (Illumina Inc., San Diego, CA) according to the manufacturer’s protocol, with 5 cycles for indexing PCR. Whole-genome sequencing was performed in the MiSeq platform (2×200 cycles paired-end run; Illumina). Reads obtained were processed through a bioinformatic pipeline based on iVar66, available at [https://gitlab.com/fisabio-ngs/sars-cov2-mapping](https://gitlab.com/fisabio-ngs/sars-cov2-mapping). The first step in the pipeline removed human reads with Kraken67; then fastq files were filtered using fastp68 v 0.20.1 (arguments employed: --cut tail, --cut-window-size, --cut-mean-quality, -max_len1, -max_len2). Finally, mapping and variant calling were performed with iVar v 1.2, and quality control assessment was carried out with MultiQC69. ### Analysis of the *spike* gene of sarbecoviruses related to SARS-CoV-2 14 sequences including SARS-COV-2 belonging to sarbecoviruses, sequences were annotated with annotation files available at NCBI database in order to locate the *spike* gene coordinates (accession numbers are available at supplemental table 1). For each sequence, nucleotide coordinates belonging to this gene were extracted with EMBOSS70. The 14 sequences harbouring the *spike* gene were concatenated and aligned with MEGA-X71 using amino acids with ClustalW algorithm with default options (alignment is available at [https://github.com/PathoGenOmics/B.1.177.637_SARS-CoV-2](https://github.com/PathoGenOmics/B.1.177.637_SARS-CoV-2)) ### Sampling SARS-CoV-2 from non-Spanish consortium sequences To build the global alignment, sequences were downloaded from GISAID1 including all the pandemic periods since the first known case sequenced (from 24 December 2019) until the last sample on 22 December 2020. We used two filters to select the dataset: sequences with more than 29,000 bp, and sequences with known dates of sampling. Sequences downloaded from GISAID were aligned against the SARS-CoV-2 reference genome72 using MAFFT73, omitting all insertions and getting an alignment length of 29,903 bp. The final alignment constructed included 270,869 sequences, all sequences with GISAID ID used for this study are available in Supplementary Table 5. ### Frequency and detection of mutated positions Single nucleotide variants were detected using the global dataset alignment, generating a VCF file with SNP-sites74 v 2.5.1 (argument employed: -v), using the reference genome as the reference bases for detecting mutations. This VCF file was processed with a Python script (available at [https://github.com/PathoGenOmics/B.1.177.637_SARS-CoV-2](https://github.com/PathoGenOmics/B.1.177.637_SARS-CoV-2)) to assess all mutated samples by position, calculating the frequencies of the global dataset and annotating sequences with the detected mutations. After that, the mutated positions were annotated with snpEff75 v 5.0 using SARS-CoV-2 reference (Wuhan first sequenced) database annotation (arguments employed: -c, -noStats, -no-downstream, -no-upstream, NC_045512.2). Genotypes detected that involved mutations in 1163 and 1167 such as B.1.177.637 and cluster 163.654 were represented in a circos plot with the R package circlize76 v 0.4.12.1004. Nucleotide coordinates of SARS-CoV-2 are plotted in a non-closed circle, circle is annotated and coloured with genes of the virus, and mutated nucleotide positions of each genotype are connected between them through lines. ### Alignments For the phylogenetic analysis, a reduced dataset was selected from the 270,869 sequences. Duplicated sequences were removed with seqkit v 0.13.2 (arguments employed: rmdup -s). 8,397 sequences were selected at random with the same temporal distribution by month as the initial dataset by Python scripting (available at [https://github.com/PathoGenOmics/B.1.177.637_SARS-CoV-2](https://github.com/PathoGenOmics/B.1.177.637_SARS-CoV-2)). The 8,397 sequences were concatenated with 2,053 sequences selected as indicate above because harboured amino acid replacements in D1163 and G1167 of the S protein and resulted in an alignment of 10,450 sequences (Supplementary Table 6). The dataset to represent 20I/501Y.V1 phylogenetic relationships include 3,067 randomly selected samples identified by Pangolin typing system ([https://github.com/cov-lineages/pangolin](https://github.com/cov-lineages/pangolin)) as lineage B.1.1.7 plus the 33 sequences with amino acid replacements in D1163 and/or G1167 (Supplementary Table 7). For all the alignments, problematic positions reported by Lanfear, R.77 were masked for the phylogenetic reconstruction using masked_alignment.sh script. ### Phylogenetic analysis Maximum-likelihood phylogenies in Figure 1 and supplementary Supplementary Fig.2, S4 and S5 were reconstructed from the masked alignment using IQ-TREE78 v 1.6.12 with GTR model and collapsing near-zero branches (arguments employed: -czb, -m GTR). The phylogenies were rooted in the reference sequence from Wuhan72 on 2019-12-24. The phylogenies were annotated and visualized with iTOL79 v 4. The phylogeny in Supplementary Video S1, composed by 10,450 sequences, was build up with Nextstrain pipeline (available at [https://github.com/nextstrain/augur](https://github.com/nextstrain/augur)) in order to monitor and visualize temporal and geographical transmission of B.1.177.637. This dataset file is available at [https://github.com/PathoGenOmics/B.1.177.637_SARS-CoV-2](https://github.com/PathoGenOmics/B.1.177.637_SARS-CoV-2). ### Clusters of transmission involving 1163 and 1167 S amino acid replacements We used the phylogeny of 10,450 sequences enriched with all sequences mutated in 1163 and 1167 to quantify the minimum number of mutational events involving positions 1163 and 1167 in S protein. We first defined which mutations characterize internal nodes using R packages: tidytree v 0.3.3 and treeio v 1.14.380. We then depicted monophyletic clusters sharing at least one of the two mutations. Transmission clusters were defined as all sequences that: i) are derived from an internal node characterized by the same nucleotide mutation involving 1163 or 1167 amino acid replacements, ii) include more than one sequence, and iii) at least 95% of sequences share the nucleotide mutation. Additionally, redundant nodes were eliminated, keeping the ancestral node of the cluster (harbouring the largest number of leaves). Sequences with at least one mutation but not in clusters were counted as single events of mutation in the phylogeny. ### Structural analysis of 1163 and 1167 S amino acid replacements The atomic coordinates for S protein in pre-fusion state was retrieved from the CHARMM-GUI COVID-19 Archive ([http://www.charmm-gui.org/docs/archive/covid19](http://www.charmm-gui.org/docs/archive/covid19)). The atomic coordinates for S protein in post-fusion sate were retrieved from Protein Data Bank (PDB: 6XRA18 and PDB: 6LXT81). Mutations were introduced using single mutation tool embedded in COOT82and figures were generated with PyMOL ([www.pymol.org](http://www.pymol.org)). ### SARS-CoV-2 pseudotyped vesicular stomatitis virus production, titration, and thermal stability evaluation Mutations were introduced into a plasmid encoding a codon-optimized S protein16 by site directed mutagenesis (see Supplementary table8 for primers). All mutations were verified by Sanger sequencing (see Supplementary table 9 for primers). To evaluate the efficiency of virus production, three transfections in HEK293 cells (CRL-1573 from ATCC) were performed for each plasmid to generate pseudotyped VSV harbouring the indicated S protein83. The titers of the virus produced were then assayed by serial dilution, followed by infection of either Vero cells (CCL-81 from ATCC) or A549 cells expressing ACE2 and TMPRSS2 (InvivoGen catalog code a549-hace2tpsa), and counting of GFP positive cells (focus forming units; FFU) at 16 hours post infection. Statistical comparisons were performed by unpaired t-test (R package: stats v 3.6.1) with normalized logarithmic data. For assessing thermal stability, 1000 FFU (as measured on Vero cells) were incubated for 15 minutes at 30.4, 31.4, 33, 35.2, 38.2, 44.8, 47, 48.6 or 49.6°C before addition to Vero cells previously seeded in a 96 well plate (10,000 cells/well). GFP signal in each well was determined 16 hours post-infection using an Incucyte S3 (Essen Biosciences). The mean GFP signal observed in several mock-infected wells was subtracted from all infected wells, followed by standardization of the GFP signal to the mean GFP signal from wells incubated at 30.4°C. Finally, a three parameter log-logistic function was fitted to the data using the drc package v 3.0-1 in R (LL.3 function) and the temperature resulting in 50% inhibition calculated using the drc ED function. Statistical differences in the temperature resulting in 50% reduction of infection was evaluated using the drc EDcomp function. The data and scripts for analysing the temperature resistance of the different mutants is available at [https://github.com/PathoGenOmics/B.1.177.637_SARS-CoV-2](https://github.com/PathoGenOmics/B.1.177.637_SARS-CoV-2). ### Evaluation of neutralization by convalescent sera and efficacy of virus particle production Pseudotyped VSV bearing 20E or B.1.177.637 S variants were evaluated for sensitivity to neutralization by convalescent sera as previously described83 with slight modifications. Briefly, 16-hours post-infection, GFP signal in each well was determined using an Incucyte S3 (Essen Biosciences). The mean GFP signal observed in several mock-infected wells was subtracted from all infected wells, followed by standardization of the GFP signal in each well infected with antibody-treated virus to that of the mean GFP signal from wells infected with mock-treated virus. Any negative values resulting from background subtraction were arbitrarily assigned a low, non-zero value (10-5). The serum dilutions were then converted to their reciprocal, their logarithm (Log10) was taken, and the dose resulting in 50% (ID50) or 80% (ID80) reduction in GFP signal was calculated in R using the drc package v 3.0-1. A two-parameter log-logistic regression (LL2 function) was used for all samples except when a three-parameter logistic regression provided a significant improvement to fit, as judged by the ANOVA function in the drc package (e.g. p < 0.05 following multiple testing correction using the Bonferroni method). The script for calculating the ID50 and ID80 as well as the standardized GFP signal for each condition is available at [https://github.com/PathoGenOmics/B.1.177.637_SARS-CoV-2](https://github.com/PathoGenOmics/B.1.177.637_SARS-CoV-2). For the first wave, serum samples and data from patients included in this study were provided by the Consorcio Hospital General de Valencia Biobank, integrated in the Valencian Biobanking Network, and they were processed following standard operating procedures with the appropriate approval of the Ethics and Scientific Committees. All first wave samples were obtained from donors that were admitted to the intensive care unit and were collected during April 2020. For the second wave donors, sera were obtained (October 2020) from severe COVID-19 patients requiring inpatient treatment at Hospital Universitario y Politécnico La Fe de Valencia. Similarly, samples from immunized donors were collected at Hospital Universitario y Politécnico La Fe de Valencia from hospital health-workers, with no previous history of SARS-COV-2 infection, and after receiving a second dose of Pfizer-BioNTech COVID-19 vaccine (BNT162b2; February 2021). All samples from Hospital Universitario y Politécnico La Fe de Valencia were collected after informed written consent and the project has been approved by the ethical committee and institutional review board (registration number 2020-123-1). ## Supporting information Supplementary video 1 [[supplements/253075_file03.mov]](pending:yes) Supplementary tables [[supplements/253075_file04.zip]](pending:yes) ## Data Availability Data referred to this manuscript is available as supplemental material, and code can be found at https://github.com/PathoGenOmics/B.1.177.637_SARS-CoV-2. The analysis pipeline used to map and analyze the sequences is available at https://gitlab.com/fisabio-ngs/sars-cov2-mapping [https://github.com/PathoGenOmics/B.1.177.637\_SARS-CoV-2](https://github.com/PathoGenOmics/B.1.177.637_SARS-CoV-2) [https://gitlab.com/fisabio-ngs/sars-cov2-mapping](https://gitlab.com/fisabio-ngs/sars-cov2-mapping) ## Funding M.C. and R.G. are supported by Ramón y Cajal program from Ministerio de Ciencia. This work was funded by the Instituto de Salud Carlos III project COV20/00140 and COV20/00437, Spanish National Research Council project CSIC-COV19-021 and CSIC-COVID19-082, and the Generalitat Valenciana (SEJI/2019/011 and Covid_19-SCI). Action co-financed by the European Union through the Operational Program of the European Regional Development Fund (ERDF) of the Valencian Community 2014-2020. ## Author contributions PRR: Conceptualization, Methodology, Formal Analysis, Investigation, Visualization, Writing Original Draft. CFG: Conceptualization, Methodology, Formal Analysis, Writing Original Draft. ACO: Formal Analysis, Review and edit draft. MGL: Formal Analysis, Review and edit draft. SJS: Software, Validation, Review and edit draft. ICM: Methodology, Resources, Review and edit draft. PRH: Investigation, Review and edit draft. MTP: Methodology, Resources, Review and edit draft. MAB: Investigation, Methodology, Review and edit draft. GD: Methodology, Software, Data Curation, Review and edit draft. LLM: Methodology, Project Administration, Review and edit draft. MG: Resources, Review and edit draft. MMA: Resources, Review and edit draft. MDG: Resources, Review and edit draft. JLP: Resources, Review and edit draft. FG: Funding, Project Administration, Supervision, Review and edit draft. IC: Funding, Project Administration, Supervision, Review and edit draft. AM: Formal analysis, Writing Original Draft. RG: Conceptualization, Methodology, Formal Analysis, Writing Original Draft, Supervision, Funding. MC: Conceptualization, Methodology, Investigation, Formal Analysis, Writing Original Draft, Supervision, Funding. ## List of SeqCOVID consortium members Iñaki Comas (icomas{at}ibv.csic.es), Fernando González-Candelas (fernando.gonzalez{at}uv.es), Galo A. Goig-Serrano (ggoig{at}ibv.csic.es), Álvaro Chiner-Oms (achiner{at}ibv.csic.es), Irving Cancino-Muñoz (icancino{at}ibv.csic.es), Mariana Gabriela López (mglopez{at}ibv.csic.es), Manoli Torres-Puente (mtorres{at}ibv.csic.es), Inmaculada Gómez (igomez{at}ibv.csic.es), Santiago Jiménez-Serrano (sjimenez{at}ibv.csic.es), Lidia Ruiz-Roldán (lidiarroldan{at}gmail.com), María Alma Bracho (bracho_alm{at}gva.es), Neris García-González (neris{at}uv.es), Llúcia Martínez Priego (martinez_lucpri{at}gva.es), Inmaculada Galán-Vendrell (galan_inm{at}gva.es), Paula Ruiz-Hueso (ruiz_pau{at}gva.es), Griselda De Marco (demarco_gri{at}gva.es), Mª Loreto Ferrús Abad (ferrus_mlo{at}gva.es), Sandra Carbó-Ramírez (carbo_sanram{at}gva.es), Mireia Coscollá (mireia.coscolla{at}uv.es), Paula Ruiz Rodríguez (ruizro5{at}alumni.uv.es), Giuseppe D’Auria (dauria_giu{at}gva.es), Francisco Javier Roig Sena (roig_fco{at}gva.es), Hermelinda Vanaclocha Luna (vanaclocha_her{at}gva.es), Isabel San Martín Bastida (isanmartin{at}rjb.csic.es), Daniel García Souto (danielgarciasouto{at}gmail.com), Ana Pequeño Valtierra (ana.pequeno{at}usc.es), Jose M. C. Tubio (jmctubio{at}gmail.com), Fco. Javier Temes Rodríguez (fjtemes{at}gmail.com), Jorge Rodríguez-Castro (jorge.rodriguez{at}usc.es), Martín Santamarina García (martin.santamarina.garcia{at}usc.es), Nuria Rabella Garcia (nrabella{at}santpau.cat), Ferrán Navarro Risueño (fnavarror{at}santpau.cat), Elisenda Miró Cardona (emiro{at}santpau.cat), Manuel Rodríguez-Iglesias (manuel.rodrigueziglesias{at}uca.es), Fátima Galán-Sanchez (fatima.galan{at}uca.es), Salud Rodríguez-Pallares (salud361{at}gmail.com), María de Toro (mthernando{at}riojasalud.es), María Pilar Bea-Escudero (mpbea{at}riojasalud.es), José Manuel Azcona-Gutiérrez (jmazcona{at}riojasalud.es), Miriam Blasco-Alberdi (mblasco{at}riojasalud.es), Alfredo Mayor (alfredo.mayor{at}isglobal.org), Alberto L. GarcÍa-Basteiro (alberto.garcia-basteiro{at}isglobal.org), Gemma Moncunill (gemma.moncunill{at}isglobal.org), Carlota Dobaño (carlota.dobano{at}isglobal.org), Pau Cisteró (pau.cistero{at}isglobal.org), Oriol Mitjà (omitja{at}flsida.org), Camila González-Beiras (cgonzalez{at}flsida.org), Martí Vall-Mayans (mvall{at}flsida.org), Marc Corbacho-Monné (mcorbacho{at}flsida.org), Andrea Alemany (ealemany{at}gmail.com), Darío García de Viedma (dgviedma2{at}gmail.com), Laura Pérez-Lago (lperezg00{at}gmail.com), Marta Herranz (m_herranz01{at}hotmail.com), Jon Sicilia (jsiciliamambrilla{at}gmail.com), Pilar Catalán (pcatalan.hgugm{at}salud.madrid.org), Julia Suárez (julia.suarez{at}iisgm.com), Patricia Muñoz (pmunoz{at}hggm.es), Cristina Muñoz-Cuevas (cristina.munozc{at}salud-juntaex.es), Guadalupe Rodríguez Rodríguez (guadalupe.rodriguez{at}salud-juntaex.es), Juan Alberola Enguídanos (juan.alberola{at}uv.es), Jose Miguel Nogueira Coito (Jose.M.Nogueira{at}uv.es), Juan José Camarena Miñana (juan.camarena{at}uv.es), Antonio Rezusta López (arezusta{at}unizar.es), Alexander Tristancho Baró (aitristancho{at}salud.aragon.es), Ana Milagro Beamonte (amilagro{at}salud.aragon.es), Nieves Martínez Cameo (nmcameo{at}gmail.com), Yolanda Gracia Grataloup (ygrataloup{at}yahoo.es), Elisa Martró (emartro{at}igtp.cat), Antoni E. Bordoy (aescalas{at}igtp.cat), Anna Not (anot{at}igtp.cat), Adrián Antuori (adrian.antuori{at}gmail.com), Anabel Fernández (afernandezn.ics{at}gencat.cat), Nona Romaní (nonaromani{at}gmail.com), Rafael Benito Ruesca (rbenito{at}unizar.es), Sonia Algarate Cajo (sonialgarate{at}gmail.com), Jessica Bueno Sancho (jbuenosan{at}salud.aragon.es), Jose Luis del Pozo (jdelpozo{at}unav.es), Jose Antonio Boga Riveiro (joseantonio.boga{at}sespa.es), Cristián Castelló Abietar (crcaab{at}hotmail.com), Susana Rojo Alba (ssnrj4{at}gmail.com), Marta Elena Álvarez Argüelles (martaealvarez{at}gmail.com), Santiago Melón García (santiago.melon{at}sespa.es), Maitane Aranzamendi Zaldumbide (maitane.aranzamendizaldumbide{at}osakidetza.eus), Óscar Martínez Expósito (“OSCAR.MARTINEZEXPOSITO{at}osakidetza.eus”), Mikel Gallego Rodrigo (“MIKEL.GALLEGORODRIGO{at}osakidetza.eus”), Maialen Larrea Ayo (“MAIALEN.LARREAAYO{at}osakidetza.eus”), Nerea Antona Urieta (“NEREA.ANTONAURIETA{at}osakidetza.eus”), Andrea Vergara Gómez (vergara{at}clinic.cat), Miguel J Martínez Yoldi (myoldi{at}clinic.cat), Jordi Vila Estapé (jvila{at}clinic.cat), Elisa Rubio García (elrubio{at}clinic.cat), Aida Peiró-Mestres (aida.peiro{at}isglobal.org), Jessica Navero-Castillejos (jessica.navero{at}isglobal.org), David Posada (dposada{at}uvigo.es), Diana Valverde (dianaval{at}uvigo.es), Nuria Estévez (nuestevez{at}uvigo.es), Iria Fernández-Silva (irfernandez{at}uvigo.es), Loretta de Chiara (Ldechiara{at}uvigo.es), Pilar Gallego (mgallego{at}alumnos.uvigo.es), Nair Varela (nvarela{at}alumnos.uvigo.es), Rosario Moreno Muñoz (moreno_rosmuny{at}gva.es), Mª Dolores Tirado Balaguer (tirado_dolbal{at}gva.es), Ulises Gómez-Pinedo (ulisesalfonso.gomez{at}salud.madrid.org), Mónica Gozalo Margüello (monica.gozalo{at}scsalud.es), Mª Eliecer Cano García (meliecer.cano{at}scsalud.es), José Manuel Méndez Legaza (josemanuel.mendez{at}scsalud.es), Jesús Rodríguez Lozano (jesus.rodriguez{at}scsalud.es), María Siller Ruiz (maria.siller{at}scsalud.es), Daniel Pablo Marcos (daniel.pablo{at}scsalud.es), Antonio Oliver (antonio.oliver{at}ssib.es), Jordi Reina (jorge.reina{at}ssib.es), Carla López-Causapé (carla.lopez{at}ssib.es), Andrés Canut Blasco (andres.canutblasco{at}osakidetza.eus), Silvia Hernáez Crespo (silvia.hernaezcrespo{at}osakidetza.eus), Mª Luz Cordón Rodríguez (marialuzalbina.cordonrodriguez{at}osakidetza.eus), Mª Concepción Lecaroz Agara (mariaconcepcion.lecarozagara{at}osakidetza.eus), Carmen Gómez González (carmen.gomezgonzalez{at}osakidetza.eus), Amaia Aguirre Quiñonero (amaia.aguirrequinonero{at}osakidetza.eus), José Israel López Mirones (joseisrael.lopezmirones{at}osakidetza.eus), Marina Fernández Torres (marina.fernandeztorres{at}osakidetza.eus), Mª Rosario Almela Ferrer (mariadelrosario.almelaferrer{at}osakidetza.eus), José Antonio Lepe Jiménez (josea.lepe.sspa{at}juntadeandalucia.es), Verónica González Galán (veronica.gonzalez.galan.sspa{at}juntadeandalucia.es), Ángel Rodríguez Villodres (angel.rodriguez.villodres.sspa{at}juntadeandalucia.es), Nieves Gonzalo Jiménez (gonzalo_nie{at}gva.es), Mª Montserrat Ruiz García (ruiz_mongar{at}gva.es), Antonio Galiana Cabrera (antoniogaliana1{at}gmail.com), Judith Sánchez-Almendro (judhsa{at}gmail.com), Gustavo Cilla Eguiluz (CARLOSGUSTAVOSANTIAGO.CILLAEGUILUZ{at}osakidetza.eus), Milagrosa Montes Ros (MARIAMILAGROSA.MONTESROS{at}osakidetza.eus), Luis Piñeiro Vázquez (LUISDARIO.PINEIROVAZQUEZ{at}osakidetza.eus), Ane Sorrarain (ane.sorarrain{at}biodonostia.org), José María Marimón Ortiz de Zarate (JOSEMARIA.MARIMONORTIZDEZ{at}osakidetza.eus), Mª Dolores Gómez Ruiz, (gomez.mdo{at}gmail.com), Eva González Barberá (gonzalez_evabar{at}gva.es), José Luis López Hontangas (lopez_jlu{at}gva.es), José María Navarro-Marí (josem.navarro.sspa{at}juntadeandalucia.es), Irene Pedrosa Corral (irene.pedrosa.sspa{at}juntadeandalucia.es), Sara Sanbonmatsu Gámez (saral.sanbonmatsu.sspa{at}juntadeandalucia.es), M. Carmen Perez Gonzalez (mcpergon{at}gobiernodecanarias.org), Francisco Javier Chamizo López (fchalop{at}gobiernodecanarias.org), Ana Bordes Benítez (aborben{at}gobiernodecanarias.org), David Navarro Ortega (david.navarro{at}uv.es), Eliseo Albert Vicent (eliseo.al.vi{at}gmail.com), Ignacio Torres (nachotfink{at}gmail.com), Mª Isabel Gascón Ros (gascon_isa{at}gva.es), Cristina Torregrosa Hetland (cjtorregrosahetland{at}hotmail.com), Eva Pastor Boix (pastor_eva{at}gva.es), Paloma Cascales Ramos (cascales_pal{at}gva.es), Begoña Fuster Escrivá (begona.fuster{at}gmail.com), Concepción Gimeno Cardona (concepcion.gimeno{at}uv.es), María Dolores Ocete Mochón (ocete_mar{at}gva.es), Rafael Medina González (rafa.medina.gonzalez{at}gmail.com), Julia González Cantó (juliagonzalez1992{at}hotmail.com), Olalla Martínez Macias (martinez_ola{at}gva.es), Begoña Palop Borrás (bpalop{at}hotmail.com), Inmaculada de Toro Peinado (inmadetoro{at}yahoo.es), Mª Concepción Mediavilla Gradolph (gradolphilla{at}hotmail.com), Mercedes Pérez Ruiz (mercedes.perez.ruiz.sspa{at}juntadeandalucia.es), Oscar González-Recio (gonzalez.oscar{at}inia.es), Mónica Gutiérrez-Rivas (mgrivas9{at}gmail.com), Encarnación Simarro Córdoba (mesimarro{at}sescam.jccm.es), Julia Lozano Serra (jlozanos{at}sescam.jccm.es), Lorena Robles Fonseca (lrobles{at}sescam.jccm.es), Adolfo de Salazar (adolsalazar{at}gmail.com), Laura Viñuela (lauravinuelagon{at}gmail.com), Natalia Chueca (naisses{at}yahoo.es), Federico García (fegarcia{at}ugr.es), Cristina Gomez-Camarasa (gomezcamarasa{at}gmail.com), Ana Carvajal (ana.carvajal{at}unileon.es), Vicente Martín (vicente.martin{at}unileon.es), Juan Fregeneda (juan.fregeneda{at}unileon.es), Antonio J. Molina (ajmolt{at}unileon.es), Héctor Arguello (hector.arguello{at}unileon.es), Tania Fernandez-Villa (tferv{at}unileon.es), Amparo Farga Martí (farga_amp{at}gva.es), Rocío Falcón (falcon_roc{at}gva.es), Victoria Domínguez Márquez (m.victoria.dominguez{at}uv.es), José Javier Costa Alcalde (jose.javier.costa.alcalde{at}sergas.es), Rocío Trastoy Pena (rocio.trastoy.pena{at}sergas.es), Gema Barbeito Castiñeiras (gema.barbeito.castineiras{at}sergas.es), Amparo Coira Nieto (amparo.coira.nieto{at}sergas.es), María Luisa Pérez del Molino Bernal (maria.luisa.perez.del.molino.bernal{at}sergas.es), Antonio Aguilera (antonio.aguilera.guirao{at}sergas.es), Anna M. Planas (anna.planas{at}iibb.csic.es), Álex Soriano (asoriano{at}clinic.cat), Israel Fernández-Cádenas (israelcadenas{at}yahoo.es), Jordi Pérez-Tur (jpereztur{at}ibv.csic.es), Mª Ángeles Marcos Maeso (mmarcos{at}clinic.cat), Carmen Ezpeleta Baquedano (cezpeleb{at}navarra.es), Ana Navascués Ortega (ana.navascues.ortega{at}navarra.es), Ana Miqueleiz Zapatero (ana.miqueleiz.zapatero{at}navarra.es), Manuel Segovia Hernández (msegovia{at}um.es), Antonio Moreno Docón (a.moreno{at}um.es), Esther Viedma Moreno (ester.viedma{at}salud.madrid.org), Jesús Mingorance (jesus.mingorance{at}idipaz.es), Juan Carlos Galán Montemayor (juancarlos.galan{at}salud.madrid.org), Iván Sanz Muñoz (isanzm{at}saludcastillayleon.es), Diana Pérez San José (dianaperezsj{at}gmail.com), Maria Gil Fortuño (gil_marfor{at}gva.es), Juan B. Bellido Blasco (bellido_jua{at}gva.es), Alberto Yagüe Muñoz (yague_alb{at}gva.es), Noelia Henández Pérez (hernandez_noeper{at}gva.es), Helena Buj Jordá (helenitabuj{at}hotmail.com), Óscar Pérez Olaso (perez_oscola{at}gva.es), Alejandro González Praetorius (agonzalezp{at}sescam.jccm.es), Aida Esperanza Ramírez Marinero (aramirezm{at}lrc.cat), Eduardo Padilla León (epadillal{at}lrc.cat), Alba Vilas Basil (avilasb{at}lrc.cat), Mireia Canal Aranda (mcanala{at}lrc.cat), Albert Bernet Sánchez (abernet.lleida.ics{at}gencat.cat), Alba Bellés Bellés (abelles.lleida.ics{at}gencat.cat), Eric López González (elopezg.lleida.ics{at}gencat.cat), Iván Prats Sánchez (iprats{at}gss.cat), Mercè García González (mgarciag.lleida.ics{at}gencat.cat), Miguel Martínez Lirola (miguelj.martinez.lirola.sspa{at}juntadeandalucia.es), Maripaz Ventero Martín (ventero_marmar{at}gva.es), Carmen Molina Pardines (molina_carpar{at}gva.es), Nieves Orta Mira (orta_nie{at}gva.es), María Navarro Cots (navarro_dia{at}gva.es), Inmaculada Vidal Catalá (vidal_inm{at}gva.es), Isabel García Nava (garcianava{at}hotmail.com), Soledad Illescas Fernández-Bermejo (msillescasf{at}sescam.jccm.es), José Martínez-Alarcón (jmalarcon{at}sescam.jccm.es), Marta Torres-Narbona (mtorresn{at}sescam.jccm.es), Cristina Colmenarejo (ccolmenarejo{at}sescam.jccm.es), Lidia García-Agudo (lgagudo{at}sescam.jccm.es), Jorge Alfredo Pérez García (jorgep{at}sescam.jccm.es), Martín Yago López (yago_mar{at}gva.es), María Ángeles Goberna Bravo (mariangoberna{at}gmail.com), Carolina Pla Cortes (caly.cortes{at}gmail.com), Noelia Lozano Rodríguez (lozano_noe{at}gva.es), Nieves Aparici Valero (aparici_nie{at}gva.es), Sandra Moreno Marro (moreno_sanmar{at}gva.es), Agustín Irazo Tatay (iranzo_agu{at}gva.es), Isabel Mariscal Pieper (mariscal_isa{at}gva.es), Mª Pilar Ramos (ramos_marrei{at}gva.es), Mónica Parra Grande (parra_mongra{at}gva.es), Bárbara Gómez Alonso (gomez_baralo{at}gva.es), Francisco José Arjona Zaragozí (arjona_fra{at}gva.es), Amparo Broseta Tamarit (amparo2400{at}gmail.com), Juan José Badiola Díez (badiola{at}unizar.es), Alicia Otero García (aliciaogar{at}unizar.es), Eloísa Sevilla Romeo (esevillr{at}unizar.es), Belén Marín González (belenm{at}unizar.es), Mirta García Martínez (gmmirta{at}hotmail.com), Marina Betancor Caro (mbetancorcaro{at}gmail.com), Diego Sola Fraca (diegosola95{at}gmail.com), Sonia Pérez Lázaro (soniperez97{at}gmail.com), Eva Monleón Moscardó (emonleon{at}unizar.es), Marta Monzón Garcés (mmonzon{at}unizar.es), Cristina Acín Tresaco (crisacin{at}unizar.es), Rosa Bolea Bailo (rbolea{at}unizar.es), Bernardino Moreno Burgos (bmoreno{at}unizar.es), Amparo Broseta Tamarit (abroseta{at}hospitalmanises.es), Carlos Gulin Blanco (cgulin{at}hospitalmanises.es) ## Supplementary Material **Supplementary video 1.** Geographical transmission of B.1.177.637 visualized with Nextstrain build. ## Supplementary Tables **Supplementary table 1.** Accession numbers for analysed sarbecoviruses. **Supplementary table 2.** Defining SNPs for B.1.177.637, B.1.177.637.V2, and VOI1163.654. **Supplementary table 3.** Accession numbers of sequences of interest for mutated amino acids 484, 501, 1163 and 1167 of the S protein. **Supplementary table 4.** Accession numbers of 5,017 Spanish sequences from SeqCOVID-SPAIN consortium. **Supplementary table 5.** Accession numbers of 270,869 analysed sequences. **Supplementary table 6.** Accession numbers of 10,450 analysed sequences. **Supplementary table 7.** Accession numbers of 3,067 analysed sequences belonging to 20I/501Y.V1. **Supplementary table 8.** Primers for site directed mutagenesis of plasmid encoding codon-optimized S protein. **Supplementary table 9.** Primers for sanger sequencing to detect mutations of interest. ## Supplementary Figures ![Supplementary Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/03/12/2021.03.08.21253075/F5.medium.gif) [Supplementary Figure 1.](http://medrxiv.org/content/early/2021/03/12/2021.03.08.21253075/F5) Supplementary Figure 1. Temporal distribution of mutated samples coloured by region. **a.** Distribution of S protein amino acid replacement D1163Y over the pandemic (N=1874). **b.** Distribution of S protein amino acid replacement G1167V over time (N=1708). ![Supplementary Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/03/12/2021.03.08.21253075/F6.medium.gif) [Supplementary Figure 2.](http://medrxiv.org/content/early/2021/03/12/2021.03.08.21253075/F6) Supplementary Figure 2. Maximum-likelihood phylogenies for different PANGO lineages. **a.** Complete phylogeny coloured by the PANGO lineages. **b.** Phylogeny of B.53 lineage. The circle represents sequences with D1163 amino acid replacement. **c.** Phylogeny of B.53 lineage. The inner circle represents sequences with D1163 amino acid replacements, and the external circle represents sequences with G1167 amino acid replacements. **d.** Phylogeny of A lineage. The circle represents sequences with D1163 amino acid replacements. **e.** Phylogeny of B.1 and derived lineages. The inner circle represents sequences with D1163 amino acid replacements, and the external circle represents sequences with G1167 amino acid replacements. **f.** Phylogeny of B.1.1 and derivative D.1 lineages. The inner circle represents sequences with D1163 amino acid replacements, and the external circle represents sequences with G1167 amino acid replacements. **g.** Phylogeny of B.1.177 and B.1.177.637 lineages. The inner circle represents sequences with D1163 amino acid replacements, and the external circle represents sequences with G1167 amino acid replacements. The scale bar of each indicates the number of nucleotide substitutions per site. The legend of positions 1163 and 1167 is common for all represented panels. ![Supplementary Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/03/12/2021.03.08.21253075/F7.medium.gif) [Supplementary Figure 3.](http://medrxiv.org/content/early/2021/03/12/2021.03.08.21253075/F7) Supplementary Figure 3. Whole genome mutated positions in genotypes with changes 1163Y and 1167V of the S protein. B.1.177.637 is coloured in magenta and cluster 1163.654 is coloured in orange. Other less frequent genotypes (found in at least 20 sequences) that include changes in S position 1163 and/or 1167 are coloured in turquoise. Cluster 1163.654 is coloured in navy blue. Line width is proportional to the frequency of the genotype. Sites 1163 and 1167 in the S protein are indicated by magenta star symbols, and position 484 in S, whose mutations are associated with antigenicity changes, is indicated by a navy blue star. ![Supplementary Figure 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/03/12/2021.03.08.21253075/F8.medium.gif) [Supplementary Figure 4.](http://medrxiv.org/content/early/2021/03/12/2021.03.08.21253075/F8) Supplementary Figure 4. Maximum-likelihood phylogeny of 3,067 genomes belonging to 20I/501Y.V1, rooted with Wuhan reference sequence. Mutated sequences with 1163 and/or 1167 amino acids of the S protein are coloured in the circle. The scale bar indicates the number of nucleotide substitutions per site. The biggest clades are collapsed with isosceles triangles. ![Supplementary Figure 5.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/03/12/2021.03.08.21253075/F9.medium.gif) [Supplementary Figure 5.](http://medrxiv.org/content/early/2021/03/12/2021.03.08.21253075/F9) Supplementary Figure 5. Maximum-likelihood phylogeny of 3,266 SARS-CoV-2 genomes representing 20E clade rooted with Wuhan reference sequence. Sequences from B.1.177.637 are coloured in magenta, sequences not identified as B.1.177.637 are coloured in green, and cluster B.1.177.637.V2 (S protein amino acid replacements: A222V, D614G, E484K, D1163Y, and 141-144Del) is coloured in blue. The scale bar indicates the number of nucleotide substitutions per site. The biggest clades are collapsed, represented as isosceles triangles. ![Supplementary Figure 6.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/03/12/2021.03.08.21253075/F10.medium.gif) [Supplementary Figure 6.](http://medrxiv.org/content/early/2021/03/12/2021.03.08.21253075/F10) Supplementary Figure 6. Neutralization of the different mutated S protein variants by convalescent sera from six individuals infected during the first epidemic wave. The reciprocal titer at which each of the different convalescent sera neutralizes the different variants by 50% is indicated. Plotted are the mean and standard error of (n=3). ## Acknowledgments We want to particularly acknowledge the patients and the Consorcio Hospital General de Valencia Biobank integrated in the Valencian Biobanking Network for their collaboration, as well as the patients and hospital staff at Hospital Universitario y Politécnico La Fe de Valencia. In addition, the authors would like to thank Gert Zimmer (Institute of Virology and Immunology, Mittelhäusern/Switzerland), Stefan Pohlmann, and Markus Hoffmann (German Primate Center, Infection Biology Unit, Goettingen/Germany) for providing the reagents required for the generation of VSV pseudotyped viruses and the codon-optimized S plasmid. Also, we want to acknowledge all the efforts from different laboratories and authorities submitting all possible sequences of SARS-CoV-2 worldwide and making them available on the GISAID platform. * Received March 8, 2021. * Revision received March 8, 2021. * Accepted March 12, 2021. * © 2021, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/) ## References 1. 1.Elbe, S. & Buckland-Merrett, G. Data, disease and diplomacy: GISAID’s innovative contribution to global health. Glob Chall 1, 33–46 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/gch2.1018&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=31565258&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) 2. 2.Hadfield, J. et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 34, 4121–4123 (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10/gdkbqx&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) 3. 3.Lauring, A.S. & Hodcroft, E.B. Genetic Variants of SARS-CoV-2-What Do They Mean? Jama 325, 529–531 (2021). 4. 4.WHO. COVID-19 Weekly Epidemiological Update 2021-02-25 . ([https://www.who.int/docs/default-source/coronaviruse/situation-reports/20210225\_weekly\_epi\_update\_voc-special-edition.pdf](https://www.who.int/docs/default-source/coronaviruse/situation-reports/20210225\_weekly_epi_update_voc-special-edition.pdf), 2021). 5. 5.Volz, E. et al. Evaluating the Effects of SARS-CoV-2 Spike Mutation D614G on Transmissibility and Pathogenicity. Cell 184, 64–75.e11 (2020). 6. 6.Hodcroft, E.B., et al. Emergence and spread of a SARS-CoV-2 variant through Europe in the summer of 2020. medRxiv, 2020.10.25.20219063. Preprint at [https://www.medrxiv.org/content/10.1101/2020.10.25.20219063v2](https://www.medrxiv.org/content/10.1101/2020.10.25.20219063v2) (2020). 7. 7.Oude Munnink, B.B., et al. Transmission of SARS-CoV-2 on mink farms between humans and mink and back to humans. Science 371, 172 (2021). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEyOiIzNzEvNjUyNS8xNzIiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMS8wMy8xMi8yMDIxLjAzLjA4LjIxMjUzMDc1LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 8. 8.Richard, D., Owen, C.J., van Dorp, L. & Balloux, F. No detectable signal for ongoing genetic recombination in SARS-CoV-2. bioRxiv, 2020.12.15.422866. Preprint at [https://www.biorxiv.org/content/10.1101/2020.12.15.422866v1](https://www.biorxiv.org/content/10.1101/2020.12.15.422866v1) (2020). 9. 9.Welkers, M.R.A., Han, A.X., Reusken, C. & Eggink, D. Possible host-adaptation of SARS-CoV-2 due to improved ACE2 receptor binding in mink. Virus Evol 7, veaa094 (2021). 10. 10.Young, B.E. et al. Effects of a major deletion in the SARS-CoV-2 genome on the severity of infection and the inflammatory response: an observational cohort study. Lancet 396, 603–611 (2020). [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) 11. 11.Chand, M., et al. Investigation of novel SARS-COV-2 variant Variant of Concern 202012/01. (Public Health England, [https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment\_data/file/959438/Technical\_Briefing\_VOC\_SH\_NJL2\_SH2.pdf](https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment\_data/file/959438/Technical\_Briefing\_VOC_SH_NJL2_SH2.pdf), 2020). 12. 12.Walls, A.C., et al. Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein. Cell 181, 281–292.e6 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=doi.org/10.1016/j.cell.2020.02.058&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) 13. 13.Salvatori, G. et al. SARS-CoV-2 SPIKE PROTEIN: an optimal immunological target for vaccines. Journal of Translational Medicine 18, 222 (2020). 14. 14.Turoňová, B. et al. In situ structural analysis of SARS-CoV-2 spike reveals flexibility mediated by three hinges. Science 370, 203 (2020). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEyOiIzNzAvNjUxMy8yMDMiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMS8wMy8xMi8yMDIxLjAzLjA4LjIxMjUzMDc1LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 15. 15.Hoffmann, M., Kleine-Weber, H. & Pöhlmann, S. A Multibasic Cleavage Site in the Spike Protein of SARS-CoV-2 Is Essential for Infection of Human Lung Cells. Mol Cell 78, 779–784.e5 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.molcel.2020.04.022&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) 16. 16.Hoffmann, M. et al. SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell 181, 271–280.e8 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.cell.2020.02.052&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) 17. 17.Kielian, M. Mechanisms of Virus Membrane Fusion Proteins. Annu Rev Virol 1, 171–89 (2014). 18. 18.Cai, Y. et al. Distinct conformational states of SARS-CoV-2 spike protein. Science 369, 1586 (2020). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEzOiIzNjkvNjUxMS8xNTg2IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjEvMDMvMTIvMjAyMS4wMy4wOC4yMTI1MzA3NS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 19. 19.Korber, B. et al. Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus. Cell 182, 812–827.e19 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.cell.2020.06.043&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) 20. 20.Hou, Y.J. et al. SARS-CoV-2 D614G variant exhibits efficient replication ex vivo and transmission in vivo. Science 370, 1464–1468 (2020). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEzOiIzNzAvNjUyMy8xNDY0IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjEvMDMvMTIvMjAyMS4wMy4wOC4yMTI1MzA3NS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 21. 21.Plante, J.A., et al. Spike mutation D614G alters SARS-CoV-2 fitness. Nature (2020). 22. 22.Zhou, B., et al. SARS-CoV-2 spike D614G change enhances replication and transmission. Nature (2021). 23. 23.Daniloski, Z. et al. The Spike D614G mutation increases SARS-CoV-2 infection of multiple human cell types. Elife 10(2021). 24. 24.Larsen, H.D. et al. Preliminary report of an outbreak of SARS-CoV-2 in mink and mink farmers associated with community spread, Denmark, June to November 2020. Euro Surveill 26(2021). 25. 25.Rodrigues, J. et al. Insights on cross-species transmission of SARS-CoV-2 from structural modeling. PLoS Comput Biol 16, e1008449 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pcbi.1008449&link_type=DOI) 26. 26.Hodcroft, E.B. CoVariants: SARS-CoV-2 Mutations and Variants of Interest. ([https://covariants.org/variants/](https://covariants.org/variants/), 2021). 27. 27.Leung, K., Shum, M.H., Leung, G.M., Lam, T.T. & Wu, J.T. Early transmissibility assessment of the N501Y mutant strains of SARS-CoV-2 in the United Kingdom, October to November 2020. Euro Surveill 26(2021). 28. 28.Galloway, S., et al. Emergence of SARS-CoV-2 B.1.1.7 Lineage — United States, December 29, 2020–January 12, 2021. MMWR. Morbidity and Mortality Weekly Report 70(2021). 29. 29.Davies, N.G. et al. Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England. Science, eabg3055 (2021). 30. 30.Wang, Z., et al. mRNA vaccine-elicited antibodies to SARS-CoV-2 and circulating variants. Nature (2021). 31. 31.Wibmer, C.K., et al. SARS-CoV-2 501Y.V2 escapes neutralization by South African COVID-19 donor plasma. bioRxiv, 2021.01.18.427166. Preprint at [https://www.biorxiv.org/content/10.1101/2021.01.18.427166v2](https://www.biorxiv.org/content/10.1101/2021.01.18.427166v2) (2021). 32. 32.Muik, A., et al. Neutralization of SARS-CoV-2 lineage B.1.1.7 pseudovirus by BNT162b2 vaccine-elicited human sera. bioRxiv, 2021.01.18.426984. Preprint at [https://www.biorxiv.org/content/10.1101/2021.01.18.426984v1](https://www.biorxiv.org/content/10.1101/2021.01.18.426984v1) (2021). 33. 33.Madhi, S.A., et al. Safety and efficacy of the ChAdOx1 nCoV-19 (AZD1222) Covid-19 vaccine against the B.1.351 variant in South Africa. medRxiv, 2021.02.10.21251247. Preprint at [https://www.medrxiv.org/content/10.1101/2021.02.10.21251247v1](https://www.medrxiv.org/content/10.1101/2021.02.10.21251247v1) (2021). 34. 34.Washington, N.L., et al. Genomic epidemiology identifies emergence and rapid transmission of SARS-CoV-2 B.1.1.7 in the United States. medRxiv, 2021.02.06.21251159. Preprint at [https://www.medrxiv.org/content/10.1101/2021.02.06.21251159v1](https://www.medrxiv.org/content/10.1101/2021.02.06.21251159v1) (2021). 35. 35.Tegally, H., et al. Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa. medRxiv, 2020.12.21.20248640. Preprint at [https://www.medrxiv.org/content/10.1101/2020.12.21.20248640v1](https://www.medrxiv.org/content/10.1101/2020.12.21.20248640v1) (2020). 36. 36.López, M.G., et al. The first wave of the Spanish COVID-19 epidemic was associated with early introductions and fast spread of a dominating genetic variant. medRxiv, 2020.12.21.20248328. Preprint at [https://www.medrxiv.org/content/10.1101/2020.12.21.20248328v1](https://www.medrxiv.org/content/10.1101/2020.12.21.20248328v1) (2020) (2020). 37. 37.Rambaut, A. et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nature Microbiology 5, 1403–1407 (2020). 38. 38.Nelson, G. et al. Molecular dynamic simulation reveals E484K mutation enhances spike RBD-ACE2 affinity and the combination of E484K, K417N and N501Y mutations (501Y.V2 variant) induces conformational change greater than N501Y mutant alone, potentially resulting in an escape mutant. bioRxiv, 2021.01.13.426558. Preprint at [https://www.biorxiv.org/content/10.1101/2021.01.13.426558v1](https://www.biorxiv.org/content/10.1101/2021.01.13.426558v1) (2021). 39. 39.Liu, Z., et al. Landscape analysis of escape variants identifies SARS-CoV-2 spike mutations that attenuate monoclonal and serum antibody neutralization. bioRxiv, 2020.11.06.372037. Preprint at [https://www.biorxiv.org/content/10.1101/2020.11.06.372037v2](https://www.biorxiv.org/content/10.1101/2020.11.06.372037v2) (2021). 40. 40.McCallum, M., et al. N-terminal domain antigenic mapping reveals a site of vulnerability for SARS-CoV-2. bioRxiv, 2021.01.14.426475. Preprint at [https://www.biorxiv.org/content/10.1101/2021.01.14.426475v1](https://www.biorxiv.org/content/10.1101/2021.01.14.426475v1) (2021). 41. 41.Huang, Y., Yang, C., Xu, X.-f., Xu, W. & Liu, S.-w. Structural and functional properties of SARS-CoV-2 spike protein: potential antivirus drug development for COVID-19. Acta Pharmacologica Sinica 41, 1141–1149 (2020). 42. 42.Xia, S., et al. Fusion mechanism of 2019-nCoV and fusion inhibitors targeting HR1 domain in spike protein. Cellular & Molecular Immunology 17, 765–767 (2020). 43. 43.Lu, R., et al. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. The Lancet 395, 565–574 (2020). 44. 44.Xia, S. et al. Peptide-Based Membrane Fusion Inhibitors Targeting HCoV-229E Spike Protein HR1 and HR2 Domains. Int J Mol Sci 19(2018). 45. 45.Fan, X., Cao, D., Kong, L. & Zhang, X. Cryo-EM analysis of the post-fusion structure of the SARS-CoV spike glycoprotein. Nature Communications 11, 3618 (2020). 46. 46.Woo, H. et al. Developing a Fully Glycosylated Full-Length SARS-CoV-2 Spike Protein Model in a Viral Membrane. The Journal of Physical Chemistry B 124, 7128–7137 (2020). [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32559081&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) 47. 47.Zhang, L. et al. SARS-CoV-2 spike-protein D614G mutation increases virion spike density and infectivity. Nature Communications 11, 6013 (2020). 48. 48.Yurkovetskiy, L. et al. Structural and Functional Analysis of the D614G SARS-CoV-2 Spike Protein Variant. Cell 183, 739–751.e8 (2020). 49. 49.Wang, Y., et al. The Infectivity and Antigenicity of Epidemic SARS-CoV-2 Variants in the United Kingdom. Research Square (2021). 50. 50.Berger Rentsch, M. & Zimmer, G. A vesicular stomatitis virus replicon-based bioassay for the rapid and sensitive determination of multi-species type I interferon. PLoS One 6, e25858 (2011). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pone.0025858&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21998709&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) 51. 51.Weissman, D. et al. D614G Spike Mutation Increases SARS CoV-2 Susceptibility to Neutralization. Cell Host & Microbe 29, 23–31.e4 (2021). 52. 52.Li, Y. et al. Linear epitopes of SARS-CoV-2 spike protein elicit neutralizing antibodies in COVID-19 patients. Cellular & Molecular Immunology 17, 1095–1097 (2020). 53. 53.Mateus, J. et al. Selective and cross-reactive SARS-CoV-2 T cell epitopes in unexposed humans. Science 370, 89–94 (2020). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjExOiIzNzAvNjUxMi84OSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIxLzAzLzEyLzIwMjEuMDMuMDguMjEyNTMwNzUuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 54. 54.Yi, Z. et al. Functional mapping of B-cell linear epitopes of SARS-CoV-2 in COVID-19 convalescent population. Emerg Microbes Infect 9, 1988–1996 (2020). 55. 55.Zhao, J. et al. COVID-19: Coronavirus Vaccine Development Updates. Frontiers in Immunology 11, 3435 (2020). 56. 56.Izda, V., Jeffries, M.A. & Sawalha, A.H. COVID-19: A review of therapeutic strategies and vaccine candidates. Clinical Immunology 222, 108634 (2021). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.clim.2020.108634&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=33217545&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) 57. 57.Pond, S. Natural selection analysis of global SARS-CoV-2/COVID-19 enabled by data from GISAID. ([https://observablehq.com/@spond/revised-sars-cov-2-analytics-page?collection=@spond/sars-cov-2](https://observablehq.com/@spond/revised-sars-cov-2-analytics-page?collection=@spond/sars-cov-2)). 58. 58.Wang, S. et al. AXL is a candidate receptor for SARS-CoV-2 that promotes infection of pulmonary and bronchial epithelial cells. Cell Research 31, 126–140 (2021). 59. 59.Vita, R. et al. The immune epitope database (IEDB) 3.0. Nucleic Acids Research 43, D405–D412 (2015). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/nar/gku938&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25300482&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) 60. 60.Tarke, A. et al. Comprehensive analysis of T cell immunodominance and immunoprevalence of SARS-CoV-2 epitopes in COVID-19 cases. Cell Reports Medicine 2, 100204 (2021). 61. 61.Kemp, S.A., et al. SARS-CoV-2 evolution during treatment of chronic infection. Nature (2021). 62. 62.Collier, D.A., et al. SARS-CoV-2 B.1.1.7 sensitivity to mRNA vaccine-elicited, convalescent and monoclonal antibodies. medRxiv, 2021.01.19.21249840. Preprint at [https://www.medrxiv.org/content/10.1101/2021.01.19.21249840v4](https://www.medrxiv.org/content/10.1101/2021.01.19.21249840v4) (2021). 63. 63.Wang, P., et al. Antibody Resistance of SARS-CoV-2 Variants B.1.351 and B.1.1.7. bioRxiv, 2021.01.25.428137. Preprint at [https://www.biorxiv.org/content/10.1101/2021.01.25.428137v2](https://www.biorxiv.org/content/10.1101/2021.01.25.428137v2) (2021). 64. 64.Quick, J. nCoV-2019 sequencing protocol. ([https://www.protocols.io/view/ncov-2019-sequencing-protocol-bbmuik6w.pdf](https://www.protocols.io/view/ncov-2019-sequencing-protocol-bbmuik6w.pdf), 2020). 65. 65.Artic-network/artic-ncov2019. (Artic-network, [https://github.com/artic-network/artic-ncov2019](https://github.com/artic-network/artic-ncov2019), 2020). 66. 66.Grubaugh, N.D. et al. An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar. Genome Biol 20, 8 (2019). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s13059-018-1618-7&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30621750&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) 67. 67.Wood, D.E. & Salzberg, S.L. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 15, R46 (2014). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/gb-2014-15-3-r46&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24580807&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) 68. 68.Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/bty560&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30423086&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) 69. 69.Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–8 (2016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btw354&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27312411&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) 70. 70.Rice, P., Longden, I. & Bleasby, A. EMBOSS: The European molecular biology open software suite. Trends in Genetics 16, 276–277 (2000). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0168-9525(00)02024-2&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=10827456&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000087457200010&link_type=ISI) 71. 71.Kumar, S., Stecher, G., Li, M., Knyaz, C. & Tamura, K. MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Mol Biol Evol 35, 1547–1549 (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/molbev/msy096&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29722887&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) 72. 72.Wu, F. et al. A new coronavirus associated with human respiratory disease in China. Nature 579, 265–269 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-020-2008-3&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) 73. 73.Katoh, K., Misawa, K., Kuma, K. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30, 3059–66 (2002). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/nar/gkf436&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12136088&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000177154300016&link_type=ISI) 74. 74.Page, A.J. et al. SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments. Microb Genom 2, e000056 (2016). 75. 75.Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6, 80-92 (2012). 76. 76.Gu, Z., Gu, L., Eils, R., Schlesner, M. & Brors, B. circlize implements and enhances circular visualization in R. Bioinformatics 30, 2811–2812 (2014). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btu393&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24930139&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000343082900017&link_type=ISI) 77. 77.Lanfear, R. A global phylogeny of SARS-CoV-2 sequences from GISAID. (Zenodo, 2020). 78. 78.Nguyen, L.T., Schmidt, H.A., von Haeseler, A. & Minh, B.Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32, 268–74 (2015). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/molbev/msu300&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25371430&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) 79. 79.Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Research 47, W256–W259 (2019). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=org/10.1093/nar/gkz239&link_type=DOI) 80. 80.Wang, L.G. et al. Treeio: An R Package for Phylogenetic Tree Input and Output with Richly Annotated and Associated Data. Mol Biol Evol 37, 599–603 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/molbev/msz240&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=31633786&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) 81. 81.Xia, S. et al. Inhibition of SARS-CoV-2 (previously 2019-nCoV) infection by a highly potent pan-coronavirus fusion inhibitor targeting its spike protein that harbors a high capacity to mediate membrane fusion. Cell Research 30, 343–355 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41422-020-0305-x&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32231345&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F03%2F12%2F2021.03.08.21253075.atom) 82. 82.Emsley, P., Lohkamp, B., Scott, W.G. & Cowtan, K. Features and development of Coot. Acta Crystallogr D Biol Crystallogr 66, 486–501 (2010). 83. 83.Gozalbo-Rovira, R. et al. SARS-CoV-2 antibodies, serum inflammatory biomarkers and clinical severity of hospitalized COVID-19 patients. J Clin Virol 131, 104611 (2020).