Abstract
COVID-19 vaccination resistance has become a major challenge to prevent global SARS-CoV-2 transmission. Here we report that the vaccination coverage rate is inversely correlated to the mutation frequency of the full genome (R2=0.878) and spike gene (R2=0.829) of SARS-CoV-2 delta variants in 16 countries, suggesting that full vaccination against COVID-19, with other mitigation strategies, is critical to suppress emergent mutations. Neutrality analysis of DH and Zeng’s E tests suggested that directional selection was the major driving force of delta variant evolution. To eliminate the homogenous effects (population expansion, selective sweep etc.), the synonymous (Dsyn) and nonsynonymous (Dnonsyn) polymorphisms of the delta variant spike gene were estimated with Tajima’s D statistic. Both D ratio (Dnonsyn/Dsyn) and ΔD (Dsyn-Dnonsyn) have positive correlation with the full vaccination rate (R2= 0.723 and 0.505, respectively) in 19 countries, indicating that purifying selection pressure of SARS-CoV-2 spike gene increased as the vaccination coverage rate increased. Taken together, our data suggests that vaccination plays an important role in the purifying selection force of spike protein of SARS-CoV-2 delta variants.
Introduction
Legally required vaccination against various infectious diseases is essential to public health policy in many countries. However, resistance to voluntary COVID-19 vaccination has emerged worldwide, including in the United States. In mid-June 2021, 33% of Americans said they were not willing to be vaccinated (Kirzinger et al, 2021). Public distrust has undermined COVID-19 vaccine acceptance, in association with a belief that the vaccine is ineffective, dangerous or compromises individual freedom (Schmelz et al, 2021). Therefore, overcoming COVID-19 vaccination resistance has become a major challenge to prevent global SARS-CoV-2 transmission.
Mutations drive genome variability, generating many different SARS-CoV-2 variants as the virus evolves to escape vaccine-mediated immunity and thereby, develop drug or vaccine resistance (Roy et al., 2020). The delta (lineage B1.617.2) variants were first documented in October 2020 in India and were designated variants of concern by the World Health Organization in May of 2021 due to its increased transmissibility and virulence (https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/). All SARS-CoV-2 mutations result from two major mechanisms: (1) spontaneous substitution/deletion of nucleotides, and (2) RNA recombination (Roy et al, 2020, Yeh and Contreras, 2020, Ignatieva et al, 2021, Yeh and Contreras, 2021). Until now, it remained unclear exactly how human vaccinations affected virus selective pressure.
Is there correlation between mutation frequency and vaccination?
To explore this question, we first analyzed the correlation between the rates of full vaccination and the point mutation frequency (Mf) of COVID-19 delta variants’ genome in 20 countries. Complete SARS-CoV-2 genome sequences with high coverage from June 20 to July 3 2021 in 20 countries: Australia (N=121), France (N=788), Germany (N=955), Indonesia (N=97), India (N=171), Ireland (N=617), Israel (N=333), Italy (N=642), Japan (N=105), Mexico (N=368), Netherland (N=456), Norway (N=142), Portugal (N=782), Singapore (N=131), Spain (N=689), Switzerland (N=131), Sweden (N=786), Turkey (N=428), United States (N=537), and United Kingdom (UK, N=3534) were collected from the Global Initiative on Sharing All Influenza Data (GISAID) (https://www.gisaid.org/). Mf was calculated as Pi / (Ln × Ns). Pi is the number of instances of polymorphism detected within the genome/locus. Ln is the nucleotide length of the genome/locus. Ns is the number of sequenced entities present in the dataset (Roy et al., 2020). The information of fully vaccinated rate on 3 July 2021 in individual counties were obtained in Our World in Data (https://ourworldindata.org/covid-vaccinations).
We found that Mf was logarithmically reduced as the full vaccination rate increased in 16 of 20 countries (R2=0.878, P<10−7, Figure 1A). To our knowledge, this is the first evidence suggesting that vaccinations could successfully suppress viral mutations. Since the spike protein is the target of the vaccination program, we further examined mutations of delta variant spike gene. Likewise, we found that the Mf of the spike gene was also logarithmically reduced as the full vaccination rate increased this time in 17 of the 20 countries (R2=0.829, P<10−6, Figure 1B). With a 10.8% vaccinated rate, Mf was exceptionally low in Australia, likely as a result of the strict state lockdown restrictions. In contrast, the Mfs are higher in Japan, Switzerland, and United States, suggesting that their social mitigation strategies have been less successful. This also suggests that he mutation frequency may be not only affected by vaccination only, but may also reflect overall control of the virus.
How does vaccination affect selection pressure?
SARS-CoV-2 is subject to many powerful selection pressures which may contribute to vaccine escape. Like other viruses, the delta variant appears to have been selected for its increased transmissibility regardless of vaccination. Previously we reported on the mutations and selection pressure of SARS-CoV-2 among passengers in close-quarters quarantine on the Diamond Princess cruise ship between January and March 2020 (Yeh and Contreras, 2021). Until then, it had been unclear whether COVID-19 variants were evolving randomly or under selection pressures generated by vaccination and/or other mitigation efforts.
To understand how selection pressure operates within COVID-19 virus populations, we analyzed the polymorphism of the full SARS-CoV-2 genome using neutrality tests based on the site-frequency spectrum: (1) Tajima’s D test, (2) normalized Wu and Fay’s DH test, and (3) Zeng’s E test, using DNASP6 software with the bat coronavirus RaTG13 as the outgroup sequence (Rozas, et al., 2017). The Tajima D test compares pairwise nucleotide diversity (π, the average number of nucleotide differences per site between two sequences) and total polymorphism to infer selection and/or demographic events (Tajima, 1989). DH test is sensitive to the changes in high-frequency variants and primarily affected by directional selection and no other driving forces, such as demographic expansion (Zeng et al., 2006). Zeng’s E test contrasts high and low frequency regions of frequency spectrum, so this test can detect population growth after a sweep (Zeng et al., 2006). Statistical significance of observed values of all tests was obtained by coalescent simulation with free recombination after 10,000 replicates. The probability for each test statistic was calculated as the frequency of replicates with a value lower than the observed statistic (two-tailed test) by DNASP6 (Rozas, et al., 2017).
We observed that Tajima’s D values were negative and significantly deviated from zero (−2.199 to -2.878, P < 0.05) among 19 countries, indicating an excess of variants of low frequency (Table 1). Because the negative Tajima’s D values could result from demographic expansion and/or purifying/positive selection, we further employed normalized DH (insensitive to population growth) and Zeng’s E test (Zeng et al., 2006). Because fixation causes quick loss of high-frequency alleles by drift, and excess of low-frequency mutations are generated, Zeng’s E test is very sensitive to population growth. Genomic polymorphism of these viral populations revealed significantly negative DH values (−1.177 to -2.791, P<0.05), but E values were not significantly different from zero (−0.04 to 0.482, P>0.05) (Table 1). Similar results were obtained after confining our analysis to the spike gene of SARS-CoV-2 delta variants using Tajima’ D, DH, and Zeng’s E tests (Table 1). Taken together, these data suggest that directional selection was operating in these countries, and demographic expansion was less likely the major driving force.
Selection pressure has been observed in unvaccinated SARS-CoV-2 populations, such as in the aforementioned Diamond Princess shipboard lockdown (Yeh and Contreras, 2021). Our current results lead to the hypothesis that vaccination could contribute to a part of the selection pressures on SARS-CoV-2, especially on the spike protein. It has been shown that, within populations, the frequency distribution of non-synonymous polymorphisms is negatively skewed relative to the distribution of synonymous polymorphisms under purifying selection (Hahn et al, 2002; Hughes, et al. 2005). Therefore, this hypothesis can be examined using a modified Tajima’s D statistics, which takes more negative values for non-synonymous (Dnonsyn) than for synonymous sites (Dsyn) of spike gene under purifying selection (Hahn et al, 2002, Hughes, et al., 2005). The average number of pairwise synonymous differences (kS) and the average number of nonsynonymous pairwise differences (kN), the number of synonymous segregating sites (SS), and the number of nonsynonymous segregating sites (SN) were computed with equation as Austin Hughes described (1) S*S=SS/a1, (2) S*N=SN/a1, where a1 is as (Tajima, 1989, Hughes, et al., 2005). Dsyn was defined as kS − S*S, divided by the standard error of that difference, and Dnonsyn was defined as kN − S*N, divided by the standard error of that difference (Tajima, 1989, Hughes, et al., 2005). The values of Ka/Ks (the ratio of the number of nonsynonymous substitutions per non-synonymous site (Ka), to the number of synonymous substitutions per synonymous site (Ks), in a given period of time), and the neutrality index of the McDonald-Kreitman (MK) test of the spike protein (RaTG13 as the outgroup sequence) were analyzed by DNASP6 software (Rozas, et al., 2017).
Unlike Tajima’s D test, it has been demonstrated that Dnonsyn and Dsyn analysis is independent of sample size, so the value of Dnonsyn or Dsyn can be compared between data sets of different sizes (Hughes et al., 2008). The same excess of low-frequency alleles in non-synonymous polymorphism was also observed (Dnonsyn = -1.482 to −2.714; P < 0.01). In addition, the Ka/Ks values were less than 1 and the neutrality index values of the MK test were significantly more than 1 (P<0.05) in all countries (Table 1). These results confirmed that purifying selection led to constraint on the available neutral mutations at non-synonymous sites of the spike gene of delta variants, consistent to previous reports of SARS-CoV-2 variants in 2020 (Chaw et al, 2020).
Because non-synonymous and synonymous mutations are affected unequally by purifying selection, Dnonsyn is expected to be disproportionally lower than Dsyn (Hahn et al., 2002). It has been proposed that the value of ΔD (Dsyn-Dnonsyn) increased as purifying selection became stronger, and ΔD (Dsyn-Dnonsyn) is also able to rule out the homogenous effects (population expansion, selective sweep etc.) in neutrality tests (Hahn et al, 2002). We observed that ΔD of the spike gene of SARS-CoV-2 delta variants is positively proportional to fully vaccination coverage rate (R2= 0.505, P=0.001). When using D ratio (Dnonsyn/Dsyn), the standard error terms in both equations of Dnonsyn and Dsyn cancel out the sample size effect (Hughes, 2008). The positive correlation of D ratio and fully vaccination rate became even more evident (R2= 0.723, P<10−4) (Figure 2). Therefore, purifying selection pressure of SARS-CoV-2 spike gene increased as the vaccination coverage rate increased. This suggests that vaccination, in combination with other mitigation strategies, plays an important role in the purifying selection force of SARS-CoV-2 spike protein.
Application of neutrality tests and their limitations
Neutrality tests are very powerful tools to study population genetics and viral evolution. However, these tests require careful interpretation due to two known limitations in this study: (1) sampling across countries and the representation of sequences in public databases, (2) the timing of introductions and vaccination campaigns. Previously we have applied various neutrality tests to analyze SARS-CoV-2 transmission and evolution in Diamond Princess cruise (Yeh and Contreras, 2021). In contrast to the shipboard quarantine, where the virus spread in a closed environment, many more complicated factors (mitigation efforts, travel, herd immunity, etc.) could influence viral evolution during pandemics among different countries.
Bhatt et al. have shown that the Tajima D test performs very well with low type I error rate (less than 5%) when the θ value (the substitutions per site of a sequence alignment) is between 0.1 to 10−4 in different RNA viruses (Bhatt et al., 2010). Because θ values in this study ranged between 0.0009 to 0.032, the type I error rate is low. The Tajima D test is also much less influenced by RNA recombination, which we have reported previously (Yeh and Contreras, 2020, Yeh and Contreras, 2021). Tajima D test results depend on the interplay of many parameters: positive and negative selection, population size dynamics, spatial structure and migration, random genetic drift, etc. A key limitation of the Tajima D test is the difficulty to separate the effects of each parameter. However, the selection pressure and demographic expansion can be verified in combination with other neutrality tests. Our results of DH and Zeng’s E test suggests that purifying selection, not demographic expansion, was the acting force. Tajima D may show negative values due to a recent bottleneck event rather than selection, but the bottleneck effect should impact all types of polymorphism equally (Tajima, 1989). Significant differences between Dnonsyn and Dsyn (either ΔD or D ratio (Dnonsyn/Dsyn)), which efficiently eliminated the homogenous effects, strongly indicates that purifying selection was at play and is positively corelated with vaccination coverage rate within these samples.
The inter-species divergence Ka/Ks test has been shown to be poorly equipped to detect positive selection in short time scales for the comparison of young evolutionary lineages like SARS-CoV-2 (Mugal, et al., 2014). Consistent with other analytical attempts, our Ka/Ks data was inconclusive (Table 1). This has been the case with other applications of the Ka/Ks test in SARS-CoV-2 spike gene, which resulted in extreme values and controversial results (Kang et al., 2021), In contrast to Ka/Ks, the MK test cannot be used to locate specific sites under selection, although this test enables us to estimate the adaptive substitution rate of a gene (Smith and Eyre-Walker, 2002). Variants of the MK test have been used to analyze the evolution of RNA viruses (Bhatt et al., 2010). Since the MK test relies on the comparison of the ratio of polymorphism to fixed differences of both synonymous and nonsynonymous mutations, selection of an outgroup sequence is critical for this test. The number of fixed synonymous mutations may be underestimated in a rapidly evolving virus like SARS-CoV-2 when a distant outgroup is applied (Baudry and Depaulis, 2003). Since the Ka/Ks or the MK tests may not be the appropriate tests for SARS-CoV-2 delta variants, we only included them to confirm the observation of purifying selection of SARS-CoV-2 spike gene here as well. We also did not include neutrality tests based on haplotype distribution or linkage disequilibrium, because they are expected to be strongly affected by RNA recombination that commonly occurred in SARS-CoV-2 populations (Yeh and Contreras, 2020, Ignatieva et al, 2021, Yeh and Contreras, 2021).
Perspectives
More virulent strains of SARS-CoV-2 have emerged with enhanced transmissibility and immune evasion properties. For example, multiple variants have escaped neutralizing antibodies developed to target the spike protein receptor-binding or N-terminal domain (Harvey, et al., 2021). What’s more, the case numbers of breakthrough infection caused by the delta variant have increased drastically worldwide (Farinholt et al., 2021).
Of all countries studied, our observations show that higher vaccination rates corresponded with lower mutation frequencies and higher Dnonsyn/Dsyn values for the whole genome and the spike gene. We conclude then, that as the vaccination coverage rate increases, purifying selection forces of nonsynonymous mutations also increase.
Thus, we recommend that: 1) universal vaccination should be administered as soon as possible to suppress the generation of deadly mutations; 2) mitigations strategies such as personal protection equipment, social distancing, etc., should be continued until full vaccination is reached to prevent viral transmission; 3) genomic surveillance should be undertaken to monitor for new mutations; and 4) more sequence summary statistics of RNA virus evolution are required to facilitate understanding new COVID-19 outbreaks.
Data Availability
1.The World Health organization. Tracking SARS-CoV-2 variants. 2. Global Initiative on Sharing All Influenza Data. 3. Coronavirus (COVID-19) Vaccinations. Our World in Data.
https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/
Disclosure statement
No potential conflict of interest was reported by the author(s).
Author contributions
All authors contributed to study concept, rationale, and initial manuscript drafts, interpretation of data, and final manuscript preparation. All authors have read and approved the final version of the manuscript.
Funding
No funding
Ethical approval
None declared.
Footnotes
This paper is dedicated to Professor Austin Hughes (1949-2015), an evolution biologist at University of South Carolina, for his lifetime achievement in neural theory and adaptive molecular evolution.
The Tajima D forecasting in the previous version was replaced by the synonymous (Dsyn) and nonsynonymous (Dnonsyn) polymorphisms of the delta variant spike gene. Both D ratio (Dnonsyn/Dsyn) and deltaD (Dsyn-Dnonsyn) have positive correlation with the full vaccination rate (R2= 0.723 and 0.505, respectively) in 19 countries, indicating that purifying selection pressure of SARS-CoV-2 spike gene increased as the vaccination coverage rate increased.