Evolving Infection Paradox of SARS-CoV-2: Fitness Costs Virulence? ================================================================== * A. S. M. Rubayet Ul Alam * Ovinu Kibria Islam * Md. Shazid Hasan * Mir Raihanul Islam * Shafi Mahmud * Hassan M. AlEmran * Iqbal Kabir Jahid * Keith A. Crandall * M. Anwar Hossain ## ABSTRACT **Background** SARS-CoV-2 is continuously spreading worldwide at an unprecedented scale and evolved into seven clades according to GISAID where four (G, GH, GR and GV) are globally prevalent in 2020. These major predominant clades of SARS-CoV- 2 are continuously increasing COVID-19 cases worldwide; however, after an early rise in 2020, the death-case ratio has been decreasing to a plateau. G clade viruses contain four co- occurring mutations in their genome (C241T+C3037T+C14408T: RdRp.P323L+A23403G:spike.D614G). GR, GH, and GV strains are defined by the presence of these four mutations in addition to the clade-featured mutation in GGG28881- 28883AAC:N. RG203-204KR, G25563T:ORF3a.Q57H, and C22227T:spike.A222V+C28932T-N.A220V+G29645T, respectively. The research works are broadly focused on the spike protein mutations that have direct roles in receptor binding, antigenicity, thus viral transmission and replication fitness. However, mutations in other proteins might also have effects on viral pathogenicity and transmissibility. How the clade- featured mutations are linked with viral evolution in this pandemic through gearing their fitness and virulence is the main question of this study. **Methodology** We thus proposed a hypothetical model, combining a statistical and structural bioinformatics approach, endeavors to explain this infection paradox by describing the epistatic effects of the clade-featured co-occurring mutations on viral fitness and virulence. **Results and Discussion** The G and GR/GV clade strains represent a significant positive and negative association, respectively, with the death-case ratio (incidence rate ratio or IRR = 1.03, p <0.001 and IRR= 0.99/0.97, p < 0.001), whereas GH clade strains showed no association with the Docking analysis showed the higher infectiousness of a spike mutant through more favorable binding of G614 with the elastase-2. RdRp mutation p.P323L significantly increased genome-wide mutations (p<0.0001) since more expandable RdRp (mutant)-NSP8 interaction may accelerate replication. Superior RNA stability and structural variation at NSP3:C241T might impact upon protein or RNA interactions. Another silent 5’UTR:C241T mutation might affect translational efficiency and viral packaging. These G- featured co-occurring mutations might increase the viral load, alter immune responses in host and hence can modulate intra-host genomic plasticity. An additional viroporin ORF3a:p.Q57H mutation, forming GH-clade, prevents ion permeability by cysteine (C81)- histidine (H57) inter-transmembrane-domain interaction mediated tighter constriction of the channel pore and possibly reduces viral release and immune response. GR strains, four G clade mutations and N:p.RG203-204KR, would have stabilized RNA interaction by more flexible and hypo-phosphorylated SR-rich region. GV strains seemingly gained the evolutionary advantage of superspreading event through confounder factors; nevertheless, N:p.A220V might affect RNA binding. **Conclusion** These hypotheses need further retrospective and prospective studies to understand detailed molecular and evolutionary events featuring the fitness and virulence of SARS-CoV-2. **Highlights** * We speculated an association of particular SARS-CoV-2 clade with death rate. * The polymerase mutant virus can speed up replication that corresponds to higher mutations. * The impact on viral epistasis by evolving mutations in SARS-CoV-2. * How the virus changes its genotype and circulate with other types given the overall dynamics of the epidemics? * Human intervention seems to work well to control the viral virulence. This hygiene practice will control the overall severity of the pandemic situation as recommended by the WHO. Our work has given the same message but explain with the dominant co-occurring mutations. Key words * SARS-CoV-2 * COVID-19 * Infection Paradox * Fitness * Virulence * Clades * Co-occurring mutations ## 1. Introduction Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused the COVID- 19 pandemic since the beginning of 2020 1. This highly contagious virus spread in 213 countries infecting over 11.3 million people including half a million deaths within the first six months 2. SARS-CoV-2 has possessed some extraordinary attributes that make it extremely infectious: high replication rate, large burst size, high stability in the environment, strong binding efficiency of S protein receptor-binding domain (RBD) with human angiotensin- converting enzyme 2 (ACE2) receptor, and additional furin cleavage site in S protein 3–5. In addition to those, it has proofreading capability ensuring high-fidelity replication 6. The virus contains four major structural proteins: spike glycoprotein (S), envelope (E), membrane (M), and nucleocapsid (N) proteins along with 16 nonstructural proteins (NSP1 to NSP16) and seven accessory proteins (ORF3a, ORF6, ORF7a, ORF7b, ORF8a, ORF8b, and ORF10) 7, 8. Mutational spectra within the SARS-CoV-2 genome 9, 10, spike protein 11, RdRp 12, ORF3a 13, and N protein 14 are reported where Islam et al. (2020) and Rahman et al. (2020) studied the possible impact of mutations on the virulence of a strain 10. SARS-CoV-2 has been classified into seven major clades, such as G, GH, GR, GV, S, V, and L by GISAID based on the dominant mutations 15. Yin 16 reported the leader sequence mutation 241C > T is co-occurring with three other mutations, 3037C > T (NSP3: C318T), 14408C > T (RdRp: p.P323L), and 23403A > G (S: p.D614G). GISAID referred these co-occurring mutation containing viruses as clade G, a viral strain over the wild-type 17, 18. These clade G or lineage B.1 viruses were dominant in Europe 17 and the east coast of USA 19 during the earlier stages of the pandemic that further spread in Southeast Asia 20, 21 and Oceania 18. Remarkably, this mutation variant is reported to be more transmissible 17 and was speculated to cause high mortality in USA 22. The GR clade or lineage B.1.1 was classified with additional trinucleotide mutations at 28881-28883 (GGG>AAC) creating two consecutive amino acid changes, R203K and G204R, in N protein 18. The GR strains are now the dominant type causing more than one-third of infection cases globally 15. Another derivative of G clade is GH or lineage B.1.3-B.1.66, B.1.351, characterized by an additional ORF3a:p.Q57H mutation frequently found in the USA and Europe 18, 23. The newest variant GV or lineage B.1.177 is responsible for COVID-19 cases in Europe featuring an A222V mutation in the S protein with other mutations of the clade G. N: A220, ORF10: V30L, and three other synonymous mutations, namely T445C, C6286T, and C26801G, are also found in this clade24. The most frequently observed mutation is D614G of S protein 25, which has direct roles in receptor binding, and immunogenicity, thus viral immune-escape, transmission, and replication fitness. The role of other dominant mutations remains largely underestimated. However, the ORF3a:p.Q57H 23 and RdRp:p.P323L 26 have recently been investigated while the effect of NSP3: C318T, 5’UTR: C241T, N:p.RG203-203KR, and N:p.A220V is still being overlooked. Mutations in other proteins could also have effects on viral pathogenicity and transmissibility. How the clade-featured mutations are linked with viral evolution in this pandemic through gearing their fitness and virulence is the main question of this study. Different mutations may work independently or through epistatic interaction, thus it is difficult to spot exactly when a single mutation or co-occurring mutations become dominant in populations through evolutionary fitness. Several co-occurring mutations identified in SARS-CoV-2 may have complementary roles in changing its virulence and the infection dynamics of COVID-19 pandemic. Yet, many questions remain: When are ‘co-occurring mutations’ first observed? Why are particular clades, GV and GR, outcompeting their speculative ancestor G clade? What are the impacts of these mutations on protein structures and what are their functional roles? Do these mutant proteins interact together? Is there any role of the coevolved ‘silent’ mutations? Do these mutations have any impact on viral fitness and virulence? We attempt to answer these questions by *in silico* molecular insights of SARS-CoV-2 mutants and possible interactions of coevolved proteins. This study aims to identify the prevalence of dominant co-occurring mutations of G, GH, GR, and GV strains circulating in all over the world, correlation of the clades with death-case ratio, probable individual and/or synergistic impact of those mutants upon virulence in terms of viral entry and fusion, evasion of host cell lysis, replication rate, ribonucleoprotein stability, protein-protein interactions, translational capacity, and ultimately the combined effect on fitness and virulence. ## 2. Materials and Methods ### 2.1 Retrieval of Sequences and Mutation Analyses This study analyzed 225,526 high-coverage (<1% Ns and <0.05% unique amino acid mutations) and complete (>29,000 nucleotide) genome sequences with specified collection date from a total of 3,16,166 sequences submitted to GISAID until January 03, 2021.We sifted the sequences generated from the non-human host out from the dataset. The Wuhan- Hu-1 (Accession ID- [NC\_045512.2](http://medrxiv.org/lookup/external-ref?link_type=GEN&access_num=NC_045512.2&atom=%2Fmedrxiv%2Fearly%2F2021%2F02%2F23%2F2021.02.21.21252137.atom)) 27 isolate was used as the reference genome. A python script was used to partition a significant part of the dataset into two subsets based on the RdRp: C14408T mutation and estimated the genome-wide variations (single nucleotide changes) for each strain. For the genome-wide mutation analysis, a total of 37,179 sequences (RdRp wild type or ‘C’ variant: 9,815; and mutant or ‘T’ variant: 27,364) were analyzed from our dataset. The frequency of mutations was tested for significance with the Wilcoxon signed-rank test between RdRp ‘C’ variant and ‘T’ variant using IBM SPSS statistics 25. Random effect poisson regression model was performed in STATA v13.0 to identify the association between death-case ratio and different clade strains (G, GH, GR, and GV); both unadjusted and adjusted incidence risk ratio (IRR) were estimated where time was introduced as a panel variable 28. ### 2.2 Epidemiological Data Analysis and Time Plot Generation In this study, we report the prevalence of these dominant clades in 2020, both individually and in combination, with disease progression and deaths allowing us to infer increasing fitness of the SARS-CoV-2. A weekly-based time plot of G, GH, GR, and GV clade frequencies with infection and death-cases was generated from 23 December 2019 until January 3, 2021 that counted a total of 54 weeks (supplementary table S1). The total number of infections and deaths by weeks were extracted from the WHO 2019-nCoV situation reports. The case-death ratio was estimated by dividing the number of deaths of a particular week by the number of cases identified in the earlier week based on the conservative assumption of a one-week interval between diagnosis and death 29. Regional time plots of those clades were also generated monthly (from January to June) with frequencies of new infections, deaths, and death-case ratio based on the available data on WHO situation reports 30. The studied sequences were divided into six regions; Europe (n= 145,254), Americas (n= 48,014), Eastern Mediterranean (n=3,103), Southeast Asia (n= 4,134), West Pacific (n= 16,974), and Africa (n= 2,740). ### 2.3 Stability, Secondary and Three-Dimensional Structure Prediction Analyses of S, RdRp, ORF3a, and N Proteins DynaMut 31 and FoldX 5.0 32, 33 were used to determine the stability of both wild and mutant variants of N, RdRp, S, and ORF3a proteins. PredictProtein 34 was utilized for analyzing and predicting the possible secondary structure and solvent accessibility of both wild and mutant variants of those proteins. The SWISS-MODEL homology modeling webtool 35 was utilized for generating the three-dimensional (3D) structures of the RdRp, S, and ORF3a protein using 7c2k.1.A, 6xr8.1.A, and 6xdc.1.A PDB structure as the template, respectively. Modeller v9.25 36 was also used to generate the structures against the same templates. I-TASSER 37 with default protein modeling mode was employed to construct the N protein 3D structure of wild and mutant type since there was no template structure available for the protein. The built-in structural assessment tools (Ramachandran plot, MolProbity, and Quality estimate) of SWISS-MODEL were used to check the quality of generated structures. ### 2.4 Molecular Docking and Dynamics of RdRp-NSP8 and S protein-Elastase2 Complex Determination of the active sites affected by binding is a pre-requisite for docking analysis. We chose 323 along with the surrounding residues (315-324) of RdRp and the residues 110 to 122 of NSP8 monomer as the active sites based on the previously reported structure 38. The passive residues were defined automatically where all surface residues were selected within the 6.5°A radius around the active residues. The molecular docking of the wild and predicted mutated RdRp with the NSP8 monomer from the PDB structure 7C2K was performed using the HADDOCKv2.4 to evaluate the interaction 39. The binding affinity of the docked RdRp-NSP8 complex was predicted using the PRODIGY 40. The number and specific interfacial contacts (IC) for each of the complexes were identified. The human neutrophil elastase (hNE) or elastase-2 (PDB id: 5A0C) was chosen for docking of the S protein, based on earlier reports 41. Here we employed CPORT 42 to find out the active and passive protein-protein interface residues of hNE. The S protein active sites were chosen based on the target region (594-638) interacting with the elastase-2. The passive residues of S protein were defined automatically as mentioned for RdRp-NSP8 docking analysis. Afterward, we individually docked wild (614D) and mutated (614G) S protein with the hNE using HADDOCK 2.4. The binding affinity of the docked complexes, as well as the number and specific interfacial contacts (IC), were predicted as performed after RdRP-NSP8 docking. We employed HDOCK server 43 with specifying the active binding sites residues for predicting the molecular docking energy. The structural stability of the protein complex and their variations were assessed through YASARA Dynamics software package 44. The AMBER14 force field 45 was used for these four systems, and the cubic simulation cell was created. The TIP3P (at 0.997 g/L-1, 25C, and 1 atm) water solvation model was used, and steepest gradient energy minimization techniques by simulated annealing method was used for initial energy minimization process. The hydrogen bond network system was optimized. The simulation cell was extended 20Å in all cases of the protein to move freely 46. The PME or particle mesh Ewald methods was applied to calculate the long-range electrostatic interaction by a cut off radius of 8Å 47. The Berendsen thermostat was applied to maintain the temperature of the simulation cell. The simulation system was neutralized by the addition of 0.9% NaCl, pH 7.4, and 310K temperature. The time step of the simulation was set as 1.25fs 46. The simulation trajectories were saved after every 100ps. Finally, the molecular dynamics simulation was conducted for 100ns to analyze root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (Rg), solvent accessible surface area (SASA), hydrogen bond 48–51. ### 2.5 Mutational Analysis of Transmembrane Domain 1 of ORF3a and SR-domain of N protein The complete genome of 12 pangolin derived coronavirus strains, as well as 38 bat, civet and human SARS-CoVs, were downloaded from GISAID and NCBI, respectively for the mutational comparison between the SARS-CoV and SARS-CoV-2. We mainly targeted transmembrane domain 1 (TM1), which covers 41 to 63 residues, of ORF3a to find the identical mutation and scan overall variation in TM1. In the case of the N protein, CoVsurver of GISAID was used to study the frequent mutations other than our target ones (RG KR:203-04) to bolster our speculation on the change and subsequent effects of those surrounding variations on the functions of the protein. A generalized comparison between SARS-CoV and SARS-CoV-2 reference sequences was performed to identify mutation in the SR-rich region that will help to postulate on N protein functions of novel coronavirus based on previous related research on SARS-CoV. ### 2.6 Analyzing RNA Folding prediction of 5’UTR and NSP3 The Mfold web server 52 was used with default parameters to check the folding pattern of RNA secondary structure in the mutated 5’UTR, synonymous leader (T445C) and NSP3 regions (C3037T). The structure of complete mutant 5’UTR (variant ‘T’) was compared with the wild type (variant ‘C’) secondary pattern as mentioned in the Huston, et al. 53. Since the wild type (variant ‘C’ at 318th nucleotide) RNA structure of the NSP3 was not available in the literature, we generated the structure of mutant (variant ‘T’) to predict the RNA folding in the Mfold web server. From the Mfold web server, we also estimated freeenergy change (ΔG) for wild and mutant NSP3 RNA fold. ## 3. RESULTS AND DISCUSSION We represented the global scenario of SARS-CoV-2 infection by the G, GH, GR, and GV clade strains and estimated the association between the clade strains and death-case ratio. Afterwards, the possible effects of the nine mutations in S, RdRp, ORF3a, N, 5’UTR, leader protein, and NSP3 were discussed with associated results. Whereas researches on molecular docking of the spike protein 54, 55 and RdRp 56, 57 in search of potential drug targets is a continuous process, our study approached in a unique way to dock spike with elastase-2 and RdRp with NSP8 to satisfy our purpose. The overall epistatic interactions of the mutant proteins and/or RNA was then depicted (Figure 1) with appropriate explanation. Finally, we endeavored to postulate using theoretical evolutionary theory how the virus might be changing virulence and fitness. ![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/02/23/2021.02.21.21252137/F1.medium.gif) [Figure 1.](http://medrxiv.org/content/early/2021/02/23/2021.02.21.21252137/F1) Figure 1. Schematic diagram of SARS-CoV-2 replication in cell showcasing the related to S, N, ORF3a, RdRp, NSP3 and 5’-UTR based epistatic interactions. The replication cycle starts with the ACE2 receptor binding of the spike glycoprotein (S) as cornered at the top-left and finishes with the exocytosis at the top-right. The viruses which do not carry G-, GH- and/or GR-featured mutations in the S, N, ORF3a, RdRp, NSP3 and 5’- UTR are denoted as the wild type where mutants contain those. Throughout the diagram, the red and green color icons such as proteins, genome, and virion represent the wild and mutant type, respectively. For a generalized virion, we used the blue color. Although this theme is not show the co-infection of both types, which might occur in rare occasions, we showed the comparative epistatic effects side-by-side fashion during the whole replication cycle that will make it easy to grasp. Related figure(s) for each protein are shown in the enclosed box. To mean the uncertainty or unknown effects of any mutant proteins/RNA structure, we used the ‘question’ mark in a pathway and on explanatory box. RdRp- RNA dependent RNA polymerase; NSP14- proof-reading enzyme of SARS-CoV-2; ER- endoplasmic reticulum; ERGIC- endoplasmic reticulum (ER) Golgi intermediate compartment. ### 3.1 Global Emergence of Dominant Co-occurring Mutations Analysis of the SARS-CoV-2 genome sequences has indicated that 241C > T, 3037C > T, 14408C > T, and 23403A > G mutations were discretely identified among different viruses in China on 24th January of 2020. These four mutations together in a single virus was first detected in England on 3rd February 2020 (Table 1). Since then, those mutations were found to cooccur with other mutations, thus formed clade G, GH, GR, and GV, and have become the most dominant variants in other regions of the world (Figure 2b), for example, escalating to 85% in May 2020 in Southeast Asia 20. The G clade strain circulated predominantly (30%, n=828) in Africa, whereas the GH, GR, and GV clades have become more recognized in the Americas, Western Pacific and Europe, respectively (Table 1). ![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/02/23/2021.02.21.21252137/F2.medium.gif) [Figure 2.](http://medrxiv.org/content/early/2021/02/23/2021.02.21.21252137/F2) Figure 2. Global epidemiological scenario of G, GH and GR clades. **(a)** Weekly time-plot showing the percentage of G, GHGR and GV clade viruses with worldwide SARS-CoV-2 infections, death cases and death rates from 24 December, 2019 (collection date of first sequenced virus) to 19th October 2020 where the weeks are shown in X axis. As low as 9 (for 2nd week) to 9587 sequences (for 14th Week) were analyzed to determine the ratio of G, GH and GR clade (shown in right Y axis) in the sequenced viruses based on the availability of high coverage genomes submitted in GISAID until 28th of October, 2020. For analyzing the death rate for each week, the number of detected infections and death cases (left Y axis) were collected from WHO situation reports. **(b)** Monthly region wise time-plot showing the percentage of G, GH and GR clades with infections, deaths and death rates from January to October of 2020 in 6 WHO regions. 68,125 sequences were analyzed for Europe, 30,395 for Americas, 14778 for West Pacific, 3050 for Southeast Asia, 1170 for Eastern Mediterranean and 1523 for Africa. Infections and death cases were collected from WHO situation reports. The data labeling and values in the X and Y axis are same to figure 2a. View this table: [Table 1.](http://medrxiv.org/content/early/2021/02/23/2021.02.21.21252137/T1) Table 1. First occurrence, global frequencies and prevalent zones for co-occurring mutation were shown. View this table: [Table 2.](http://medrxiv.org/content/early/2021/02/23/2021.02.21.21252137/T2) Table 2. Incidence risk ratio (IRR) for different clade strains against death-case ratio. Our weekly based time-plot has depicted a gradual increase of G, GH, and GR viruses altogether since the 10th week (24 February – 2 March 2020) recording a sudden jump to 42% in that week from a mere 12% of the previous week. The global COVID-19 cases exponentially increased from the 10th week with only 7,806 cases and infected almost 500,000 people with ca. 35,000 deaths in just seven weeks while the G, GH, and GR strains reached 80% (supplementary table S1). The death rate elevated at its peak (14%) in mid of March (week 12) and gradually decreased to below 2% in early September. Correspondingly, the number of GR strains is increasing from the 10th week to 32nd week with a small fluctuation where the G and GH strains each maintained a static ratio between ∼20-30%. GV clade strains initiated into the population from 30th week and became most dominant strain replacing GR clade strains in just 9 weeks (Figure 2a). The geographical distribution plot of the G, GH, GR, and GV clades with infection and death-cases delineates that the new infections began to rise exponentially with the increase of coevolved mutant variants in all regions except the West Pacific area (Figure 2b). The West Pacific region, which includes East Asian countries as well, identified a very low number of infection cases and deaths per million (Figure 2b). Europe and America have a high rate of infections as well as a high percentage of those variants. The reason may be linked with α1-antitrypsin (AAT) deficiency, which is very rare in East Asia unlike Europe and North America 58. The AAT allele deficiency facilitates entry of the 614G subtype into the host cells and accelerates the spread of G, GH, GR, and GV clades. Our analysis has also found that the proportion of strains containing 23403A > G mutation was 25% in East Asia and >75% in Europe and America in first 6 months of this year (data not shown). The death-case ratio was decreasing globally while the GR mutants were increasing until August (Figure 2a). Since September, GV clade strains were increasing while the death-case ratio remained low (2%). To examine the association between clade strains and death-case ratio, both unadjusted and adjusted incidence risk ratio were estimated. In the adjusted model, G, GR, and GV clades were found to be significantly associated with death-case ratio in both models. If G clade strains increase by one percentage point then death-case ratio would be expected to increase by a factor of 1.03, while holding all other variables in the model constant (IRR = 1.03, 95% CI 1.01 – 1.06). However, GR and GV clades were inversely related with death-case ratio. One percentage point increment in GR clade led to reduction to death-case ratio by a factor of 0.99, while holding all other variables in the model constant (IRR = 0.98, 95% CI 0.97 – 0.99). And then again, If GV clade strains increase by one percentage point then death-case ratio would be expected to decrease by a factor of 0.97, while holding all other variables in the model constant (IRR = 0.97, 95% CI 0.96 – 0.98). Like other SARS-CoV-2 studies 59–61, this statistical analysis also suffers from some limitations in dealing with genomic and calculating death-case ratio data. The death-case ratio is believed to be underestimated because of the inadequate number of tests capacity and asymptomatic SARS-CoV-2 cases in the general population. Moreover, fewer mutation patterns are uploaded from underdeveloped or developing countries like African and Sub- Saharan countries which might lead to spatial biasedness in the analysis. Therefore, the global epidemiological scenario of different clades was explored to mitigate this problem. Regional monthly data depicted a similar increase of GR and GV strains while the death ratio was decreasing in studied regions with some rare exceptions. In Europe, GR strains were predominant from April to August, while GV strains became predominant from September. In the Americas a high abundance (over 50%) of GH clade strains was found from March to December. In Eastern Mediterranean, Africa and Southeast Asian region there was an increased rise of GR strains in May, while in Western Pacific region GR strains became predominant in June. The increase of G strains and the death-case ratio at the same time was observed in most regions, except Eastern Mediterranean and Southeast Asia, where a limited number of sequence data were produced. (Figure 2b). Researchers around the globe are now trying to explore the factors associated with variable mortality rates due to the infection by SARS-CoV2 in different regions of the world. A recent report identified a Neanderthal-derived haplotype in chromosome 3 which is prevalent in South Asia as a substantial contributor to COVID-19 risk for hospitalized patients along with other risk factors 62. Also, a haplotype inherited from Neandertals on chromosome 12 was found at ∼25 to 30% in Eurasia and the Americas, can reduce relative risk of becoming severely ill with COVID-19 by ∼ 22% . But other reports say that mortality associated to COVID-19 in Indian and south Asian subcontinent is lower than in the west 64. In-hospital mortality was also found to be considerably lower in Asia than Europe and America 65. Several factors such as the virulence of the pathogen, host factors like innate immunity, genetic diversity in immune responses, epigenetic factors, genetic polymorphisms of ACE2 receptors, micro RNAs and universal BCG vaccination, along with environmental factors may have contributed in low mortality in Indian region 66. In European countries, mortality rates were observed to vary from region to region 67. In Africa, where the Neanderthal risk haplotype is almost absent, youthful population, and weather conditions might have also contributed to low morbidity and mortality of COVID-19 68. No association between the SARS-CoV-2 variants and mortality rates was observed in the Eastern Mediterranean Region 69. However, another report showed that, GR, GH and L clade viruses are predominant in countries with higher deaths and GR clade showed higher prevalence among severe/deceased patient 70. Besides these host, pathogen and environment associated factors, human and social intervention like political decision making, scientific advice, and health system and public health capacity can also contribute to variable COVID-19 mortality rates in different regions of the world 71. ### 3.2 S Protein D614G Mutation Favors Elastase-2 Binding Korber et al. (2020) linked the S:p.D614G mutation with viral spread or transmission capacity rather than virus infectivity or virulence [17] whereas Becerra Flores et al. (2020) approached to demonstrate a potential link of D614G with virus pathogenicity [22], using same data and method alike Korber et al. (2020). A preview published by Grubaugh et al. (2020) stated those major points as unclear without any concrete interpretations, and questioned on the correlation of D614G mutation with virus spread or transmissibility. Dearlove et al. (2020) acknowledged that this 614G variant may constitute an exception to their conclusion on the arising of other mutations of the virus due to genetic drift 72. Rausch et al. (2020) 73 also stressed upon the possible effect of D614G on viral fitness through revisiting the points raised by Dearlove et al. (2020). Recently, Plante et al. (2020) 74 and Hou et al. (2020) 75 reported that D614G mutation alters virus fitness by enhancing viral load in COVID-19 patients and may increase transmission in human, which was speculated from the increasing transmission of mutant-type viruses in hamsters. S: p.D614G mutation may facilitate the exposure of the cleavage domains of S1-S2 and S2’ to proteases, elastase-2, Furin, or TMPRSS2, as discussed by Eaaswarkhanth, et al. 76. Another explanation of the more efficient cellular entry for D614G variant is the possibly breakage of hydrogen bonds that modulates the interactions between S protein protomers and ACE2 receptor binding 17, 75, 77. Moreover, G614 mutation may increase S protein stability and participate in N-linked glycosylation at N616 41. Several recent experiments have suggested that mutated (G614) protein contains a novel serine protease cleavage site at 615-616 that is cleaved by host elastase-2, a potent neutrophil elastase playing important roles in inflammatory diseases of human, more efficiently than wild protein, D61425, 41, 58, 78. Bhattacharyya et al. (2020) hypothesized that the neutrophil elastase level at the site of infection will facilitate the host cell entry for G614. We explained here, with the aid of structural bioinformatics, how the mutated protein gets the advantage over wild type with a single mutation. This study found interesting structural features of the S protein while comparing and superimposing the wild (D614) over mutant (G614). The secondary structure prediction and surface accessibility analyses showed that there was a slight mismatch at the S1-S2 junction (681PRRAR S686) where serine at 686 (S686) was found covered in G614 and exposed to the surface in D614. However, S686 in both G614 and D614 were exposed to an open-loop region to have possible contact with the proteases (supplementary Figure S1). Further investigation on the aligned 3D structures showed no conformational change at the Furin or TMPRSS2 cleavage site (Figure 3c). We observed no structural variation in the surrounding residues of the protease-targeting S1-S2 site (Figure 3c), which eliminated the assumption of Phan 79. The predictive 3D models and structural assessment of D614 and G614 variants also confirmed that the cleavage site at 815-16 of S2 subunit (812PSKR S816) or S2’ 3, 80 had no structural and surface topological variation (Figure 3d-e). Rather, the superimposed 3D structures suggested a conformational change in the immediate downstream region (618TEVPVAIHADQLTPT632) of the 614th position of G614 that was not observed in D614 variants (Figure 3a-b). ![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/02/23/2021.02.21.21252137/F3.medium.gif) [Figure 3.](http://medrxiv.org/content/early/2021/02/23/2021.02.21.21252137/F3) Figure 3. Different structural and stability comparison of wild and mutant spike protein. Structural superposition of wild and mutant spike proteins **(a-b)**; conformation in the S1-S2 **(c)** and S2’sites **(d-e)**; representation of vibrational entropy energy change on the mutant type structure **(f)**; and interatomic interaction prediction of both wild **(g)** and mutant **(h)** types. For Figure a-e, the gray and yellow color represent the wild and mutant protein, respectively. **(a)** The downstream (617-636) of D614G in wild (green) and mutant (red) S protein was focused. Overlapping of the wild (D614) and mutant (G614) S protein showed conformational change in the 3D structures. **(b)** However, the conformational change are in the loop region (618-632) of the proteins thus may potentially play role in interacting with other proteins or enzymes, such as elastase-2 as we focused in this work. **(c)** No change was found in the S1-S2 cleavage site (685-686), depicted in blue color, of the wild and mutant protein. **(d)** Surface and **(e)** cartoon (2°) structure of the superimposed wild and mutant proteins where the S2’ (pink) is situated in surface region and do not show any change in accessibility in the residual loop region. **(f)** The mutant (G614) protein showed higher flexibility in the G614 (sticks) and its surroundings (red). The intra-molecular interaction determined the overall stability of the **(g)** wild and **(h)** mutant structure where C of D614 β (aspartic acid at 614; green stick modelled) had two hydrophobic interaction with the benzene rings. This intramolecular contacts stabilize the S protein of wild type and missing of this bond destabilize the mutant (G614) protein. The mutant protein has glycine at 614 which has less chance of interacting with other neighboring amino acids due to its shorter and nonpolar R-group. The color code representing the bond type is presented in each **(g)** and **(h)**. ![Figure 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/02/23/2021.02.21.21252137/F4.medium.gif) [Figure 4.](http://medrxiv.org/content/early/2021/02/23/2021.02.21.21252137/F4) Figure 4. The molecular docking of wild and mutant with elastase-2. Both the (upper figure) wild (D614) and (lower figure) mutant (G614) version of S protein was shown in golden color whereas the elastase-2 docked to D614 and G614 in blue and green color, respectively. The enlarged views of the docked site were shown in separate boxes. **(a)** The possible docked residues (stick model) on the wild S protein (warm pink) and elastase-2 (green) are 618(Thr)- 619(Glu)-620(Val) and 198(Cys)-199(Phe)-225:227 (Gly, Gly, Cys), respectively. The aspartic acid at 614 is 17.3°A far away from the valine (101), apparently the nearest amino acid of elastase-2 to the cleavage site (615-616). **(b)** The possible interacting residues (stick model) on the mutant S protein (blue) and elastase-2 (warm pink) are 614(Gly)-618(Thr)- 619(Glu)-620(Val) and 101(Val)-103(Leu)-181(Arg)-222:227(Phe, Val, Arg, Gly, Gly, Cys), and 236 (Ala) respectively. In this case, the glycine at 614 is only 5.4°A far away from the valine (101), the nearest amino acid of elastase-2 to the cleavage site (615-616). The elastase-2 restrictedly cut valine at 615, due to its valine-dependent constriction of catalytic groove 81. The present sequence setting surrounding of G614 (P6- 610VLYQGV NCTEV620-P’5) showed a higher enzymatic activity over the D614, which cannot be completely aligned with previous works on the sequence-based substrate specificity of elastase-2 82. However, the first misaligned residue of the superimposed G614, located at the P’4 position (T618), may also be important for binding with the elastase-2, and further down the threonine (T) at 618, the residues may affect the bonding with the respective amino acids of the protease. This changed conformation at the downstream binding site of G614 may help overcome unfavorable adjacent sequence motifs in the mutated S protein as the elastase-2 substrate. Therefore, the simultaneous and/or sequential processing of the mutated S protein by TMPRSS2/Furin/Cathepsin and elastase-2 facilitates a more efficient SARS- CoV-2 entry into the host cells and cell-cell fusion 41, 58, 78. This study further observed the possible association of the S protein with elastase-2 and found an increased binding affinity in case of G614 (Table 3). Hence, the active sites of the mutated protein interacted efficiently with more amino acids of elastase-2 (Table 4), possibly providing a better catalytic activity as shown by Hu, et al. 41. The mutation may have changed the structural configuration of the elastase-2 cleavage site in a way that the enzyme is facing less challenge to get near to the cutting site of the altered protein (Figure 3a-b and 4). The efficient cleaving of this enzyme, although located in an upstream position of the S1- S2 junction, may assist in releasing S1 from S2 and change the conformation in a way to get later cleaved at the S2’ site, and then help in the fusion process 83, 84. Mutated spike protein and elastase-2 complex was more flexible than the wild one, and the interactions with enzyme was also different as shown in RMSD deviation between the complexes (Figure 5). ![Figure 5.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/02/23/2021.02.21.21252137/F5.medium.gif) [Figure 5.](http://medrxiv.org/content/early/2021/02/23/2021.02.21.21252137/F5) Figure 5. (a) Both the wild and mutated spike protein had lower RMSD profile till 60ns, then it rised and maintained steady state. Although the spike protein had higher degree of deviation in RMSD profile than RdRp but they did not exceed 3.0Å. The RMSD from demonstrated that mutant and wild RdRp protein complex has initial rise of RMSD profile due to flexibility. Therefore, both RdRp complexes stabilized after 30ns and maintained steady peak. The wild type RdRp complex had little bit higher RMSD peak than mutant RdRp which indicates the more flexible nature of the wild type. (b) The spike protein complex had similar SASA profile and did not change its surface volume and maintained similar trend during the whole simulation time. The higher deviation of SASA indicates that mutant and wild type RdRp had straight line but mutant structure had higher SASA profile which indicates the protein complex had enlarged its surface area. Therefore, mutation in RdRp protein leads to more expansion of the surface area than wild types as their average SASA value had significant difference. (c) Mutated spike exhibits little more Rg profile than the wild type which correlates with the comparative labile nature of the mutant. The higher level of Rg value defines the loose packaging system and mobile nature of the protein systems. The mutant RdRp had lower level of fluctuations and maintains its integrity in whole simulation time. The wild type RdRp complexes had higher deviations and more mobility than the mutant complex. (d) Any aberration in hydrogen bond number can lead to a higher flexibility. Therefore, the mutant and wild spike protein exhibit same flexibility in terms of H-bonding. The mutant RdRp protein had more hydrogen bonding than the wild types, but they did not differentiate too much and relatively straight line was observed for the protein. View this table: [Table 3.](http://medrxiv.org/content/early/2021/02/23/2021.02.21.21252137/T3) Table 3. The scores of HADDOCK, PRODIGY (ΔG and Kd (M) at 37.0) for RdRp/NSP8 and Spike-Elastase docked complex. View this table: [Table 4.](http://medrxiv.org/content/early/2021/02/23/2021.02.21.21252137/T4) Table 4. Assess the effect of mutations on structural dynamics of NSP-12/ RDRP, Spike, NS3 and N Protein of SARS CoV-2 using DynaMut. The value of ΔΔG <0 indicates that the mutation causes destabilization and ΔΔG > 0 represents protein stabilization. For ΔΔSVibENCoM, positive and negative value denotes the increase and decrease of molecular flexibility, respectively. Besides, this G614 amino acid replacement may have a destabilization effect on the overall protein structure (Table 4 and Figure 3a-b). The deformed flexible region at or near G614 is the proof of that destabilizing change (Figure 3f and supplementary figure S3). Zhang et al. 85 explained less S1 shedding through more stable hydrogen bonding between Q613 (glutamine) and T859 (threonine) of protomer due to greater backbone flexibility provided by G614. On the other hand, increasing the number of RBD up conformation or increasing the chance of 1-RBD-up conformation due to breakage of both intra-and inter-protomer interactions of the spike trimer and symmetric conformation will give a better chance to bind with ACE2 receptor and can also increase antibody-mediated neutralization 86. The S1 will release from S2 more effectively in G614 protein due to introduction of glycine that will break hydrogen bond in between the D614 and the neighboring protomer T859 amino acid 86, 87. Our analyses have provided the *in silico* proof of this point by showing that the mutated protein was less stable than the wild type protein by missing a hydrophobic interaction with Phe592 (Figure 3g-h). This new result accorded well with the later consensus as explained in Weismann et al. (2020) that G614 will increase its overall flexibility. Therefore, the overall structural change may assist the mutated S protein by providing elastase-2 a better binding space and attachment opportunity onto the cleavage site, thus providing a more stable interaction with the S2 domain that increases the credibility of an efficient infection (Figure 1). ### 3.3 Increased Flexibility of RdRp-NSP8 Complex: Compromise Proof-Reading Efficiency with Replication Speed The binding free energy (ΔG) of the RdRp-NSP8 complexes have been predicted as - 10.6 and -10.5 Kcalmol-1, respectively, in wild (P323) and mutant (L323) type that suggests less strong interaction for mutant protein (Table 4). The number of contacts made at the interface (IC) per property and interacting amino acids increased between L323 and NSP8 (Table 3 and 4). This increasing contacts due to higher hydrogen bonds (Figure 5d) made the complex of mutated RdRp and NSP8 less flexible. Our analyses have identified that proline (P323) or leucine (L323) of RdRp can interact with the aspartic acid (D112), cysteine (C114), valine (V115), and proline (P116) of NSP8 (Figure 6). RdRp binds with NSP8 in its interface domain (residues alanine:A250 to arginine:R365), forming positively charged ‘sliding poles’ for RNA exit and enhance the replication speed probably by extending the RNA-binding surface on NSP8 88, 89. Molecular dynamics of the mutated RdRp-NSP8 complex supported this by showing a more expanded surface area in interacting sites (Figure 5b) and maintained integrity throughout the simulation (Figure 5c). Another study reported that the NSP8 binding sites on RdRp and the RNA exit tunnel were comparatively neutral and conserved 90. Besides, a zinc ion in the conserved metal-binding motif (H295, C301, C306, and C310), close to the 323 residue, is responsible for maintaining the integrity of the RdRp architecture 90, 91. We did not find any interaction of NSP8 with the zinc-binding residues of RdRp proteins (Table 3 and Figure 6). Therefore, the P323L mutation within this conserved site of the RdRp interface domain may only affect the RdRp-NSP8 interaction without changing metal binding affinity. ![Figure 6.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/02/23/2021.02.21.21252137/F6.medium.gif) [Figure 6.](http://medrxiv.org/content/early/2021/02/23/2021.02.21.21252137/F6) Figure 6. The molecular interaction of mutant RdRp with NSP8. The mutant (L323) RdRp (pale green) and NSP8 (light blue) are interacting as shown in center of the lower figure and an enlarged view of the docked site is presented above within a box. The leucine at 323 interacted with the Asp (112), Cys (114), Val (115), and Pro (116). The wild (P323) RdRp has identical docking interactions with NSP8 (table 4), thus is not presented as separate figure here. Recent reports on structural analyses revealed that the RdRp:p.P323L led to the stabilization of the protein structure 92, 93. The results from the six state-of-the-art tools used in this study have suggested that mutant (L323) protein cannot be concluded as ‘stable’ from the ambiguous G values (Table 4), rather the interaction with the adjacent amino acid 94 ΔΔ mainly defined the stability. Although the hydrophobic amino acid (L323) embedded in the L323 protein, our secondary structure analysis indicated the presence of only loop structure in 323 position for both wild and mutant types (Figure S1). However, Chand et al. (2020) reported the turn structures of 323 and 324 shifted into five sheets at positions 321, 322, 323, 324 and 327 to bury leucine. The superimposed 3D structures solved this contradiction about secondary structure by showing that there was no deviation in loop/turn structure of mutant protein (supplementary Figure S4). Overall, the mutation at 323 position, to some extent, stabilized the L323 structure, made the protein more rigid, binds less strongly with the NSP8, and thus expanded the interacting region with NSP8. These variations may together increase the replication speed by exiting the processed RNA genome from the RdRp groove structure more swiftly (Figure 1). The increasing replication speed might be due to the perturbation of interaction between RdRp and NSP8 88, 94, or less possibly, the complex tripartite interactions (RdRp, NSP8, and NSP14) responsible for the speculated decrease of proof-reading efficiency 6. Thus, RdRp mutants might increase the mutation rate by a trade-off between high replication speed and low fidelity of the mutant polymerase 95. Another possibility could be the lower proof-reading efficiency of NSP14 that was not linked to the replication speed 6. Analysis of our study sequences revealed that the frequency of mutation (median=8) in L323 mutants (n=27,364) is significantly higher (p<0.0001) than the frequency (median=6) of wild-type (P323) strains (n=9,815). This increased mutation rate may play a vital role in genetic drifts and provide next generations a better adaptation to adverse environments. ### 3.4 Q57H Substitution in ORF3a Viroporin: the Roles of Decreased Ion Permeability This study has found that the replacement of glutamine (Q57) with positively charged histidine (H57) at 57 position of ORF3a transmembrane region 1 (TM1) does not change secondary transmembrane helical configuration (supplementary Figure S1), and aligned 3D structures have also shown no variation of TM1 in the monomeric state (Figure 7a). The mutant (H57) protein has a non-significant increase in structural stabilization and a minimal decrease in molecular flexibility (Table 4 and supplementary Figure S5). This is because of the weak ionic interaction of Cα with the sulfur atom of cysteine (C81) in TM2 and hydrogen bond of terminal Nζ of lysine (61K) with one of the endocyclic nitrogens of H57 (Figure 7b). ![Figure 7.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/02/23/2021.02.21.21252137/F7.medium.gif) [Figure 7.](http://medrxiv.org/content/early/2021/02/23/2021.02.21.21252137/F7) Figure 7. The effect on transmembrane channel pore of ORF3a viroporin due to p.Q57H mutation. **(a)** The wild (Q57) and mutant (H57) ORF3a protein are presented in light gray and green color, respectively. The structural superposition displays no overall conformation change, however the histidine at 57 position of mutant ORF3a (deep blue) has slightly rotated from glutamine at same position of the wild protein (bright red). This change in rotamer state at 57 residue may influence **(b)** the overall stability of H57 (upper part) over Q57 (lower part) because of ionic interaction of histidine (green; stick model) of transmembrane domain 1 (TM1) with cysteine at 81 (yellow stick) of TM2. The color code defined different bond types is shown in inlet. Selection analysis predicted the accumulation of those mutations as a result of pervasive positive selection with an increasing trend 96. One possibility of this positive selection of histidine over glutamine might be the role of H57 (TM1) in increasing constrictions wherein the diagonal C81 (TM2) may assist through the ionic interaction (Figure 7b). Notably, the Q57 in wild-type protein forms the major hydrophilic constriction within the ORF3a channel pore 97. Thus, further favorable increasing constrictions within the H57 protein channel pore and the replacement of charge-neutral Q with a positive-charged H in the selectivity filter may reduce the passing of positive ions such as Ca2+, Na+, and K+ by either electrostatic repulsion or blocking 98–101. This speculation for ORF3a mutant protein was supported by another study that showed the reduction of ion permeability of Na2+ and Ca2+ through the H57, however, that decrease was not found statistically significant (p>0.05) 97. The decrease intracellular concentration of cytoplasmic Ca2+ ions potentially reduces caspase-dependent apoptosis of the host cell 102, mainly supporting viral spread without affecting replication 23 as shown in Figure 1. Moreover, the ORF3a can drive necrotic cell death 103 wherein the permeated ions 104 and the insertion of ORF3a viroporin into lysosome 105 play vital roles. The H57 mutant may thus decrease pathogenicity and symptoms during the early stages of the infection, i.e., reducing ‘cytokine storm’ in the host 106. Besides, ORF3a was proved to affect inflammasome activation, viral release, and cell death, as shown by Castaño-Rodriguez, et al. 107 that the deletion of ORF3a reduced viral load and morbidity in animal models. Even though similar proteins of ORF3a have been identified in the sarbecovirus lineage infecting bats, pangolins, and humans 108, only one pangolin derived strain from 2017 in Guangxi, China contains H57 residue as shown by our mutation analyses (supplementary Figure S6) and also reported by Kern, et al. 97. However, bat or civet did not contain this mutation in TM1 and the flanking vicinity was not identical; whereas, the TM1 (41LPFGWLIVGVALLAVFQSASKII63) does not have amino acid replacements for the strains of SARS-CoV-2 and pangolin-coronaviruses (Figure S5a-b). The presence of this mutation in pangolin could be an accidental case or might explain its impact on modulating host-specific immune response, which needs functional experimental verification. A possible explanation behind that presence might be the more accustomed nature of the virus towards reverse transmission by being less virulent, i.e., from human to other animals, as observed in recent reports 109, 110. ### 3.5 N Protein Mutation: Augmenting Nucleocapsid Stability and Exerting Miscellaneous Effects Our study has observed that the combined mutation (N: p.RG203-204KR) causes no conformational change in secondary and 3D structures (Figure S1 and Figure 8, respectively) of the conserved SR-rich region of the LKR (supplementary Figure S7), but there is a minor alteration in the degree of buried or exposed site (Figure 3). This result contradicted the prediction of 111 about the change in the length and arrangements of the alpha-helix in the SR- rich region. The superimposed 3D structures showed structural deviation, rather at 231ESKMSGKGQQQQGQTVT247 of the LKR (Figure 9), corresponding to the high destabilization of the KR203-204 protein (Table 3). On the other hand, A220V mutation in the N protein of the GV clade showed a slightly more stable formation of the mutated N protein with no change in the chemical properties (Table 4), that might affect RNA binding affinity112. Impedance to form particular SR-motif due to RG KR mutation might disrupt the phosphorylation catalyzed by host glycogen synthase kinase-3 113. Similar hypo- phosphorylation events could arise due to the conversion of serine to nonpolar or neutral amino acids (L188/194/197, I193, and N202), as represented in supplementary Table S2 and 114. Consequently, the low phosphorylation level after entering into the cell should unwind the viral ribonucleoprotein (RNP) in a slower but more organized fashion that might have an impact upon translation and immune-modulation 115–117. In KR203-204, replacement of a glycine with lysine that may increase the nucleocapsid (N protein- RNA complex) stability by forming stronger electrostatic and ionic interactions due to increased positive charge 118, 119. Besides, the more disordered orientation of the associated LKR 115 and highly destabilizing property of KR203-204 may assist in the packaging of a stable RNP 112, 120. These interactions and impact upon mutations are depicted in Figure 1. ![Figure 8.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/02/23/2021.02.21.21252137/F8.medium.gif) [Figure 8.](http://medrxiv.org/content/early/2021/02/23/2021.02.21.21252137/F8) Figure 8. Structural superposition of wild and mutant N protein. The light grey color represents both wild (RG203-204) and mutant (KR203-204) N protein. The linker region (LKR: 183-247 amino acids) of wild (RG203-204) and mutant (KR203-204) are in pale yellow and warm pink color, respectively. **(a)** The aligned structures showed a highly destabilizing (Table 3) conformational change from 231 to 247 amino acids within LKR. Other regions of the N protein, especially the SR-rich region (184-204 amino acids) where the mutations occur, do not change. **(b)** A more emphasized look into the SR-rich and mutated sites (RG203-204KR) of wild and mutant N protein represent slight deviation in the Ser (197) and Thr (198) while only glycine (green) to arginine (blue) substitution at 204 position shows changing at rotamer state. The enlarged view is shown in the bottom part. N protein also utilizes the dynamic nature of the intrinsically disordered linker region (LKR) that controls its affinity towards M protein, self-monomer, 5’UTR, and cellular proteins 121–123. The phosphorylation at the LKR site may play an essential role to regulate these interactions 118. It was speculated that KR203–204 attained more selective advantage 96 over the other mutations of N protein (Table S2), probably because of stronger RNA binding and synchronized hypo-phosphorylation. ### 3.6 Silent Mutations may not be Silent The C241T of 5’UTR (untranslated region), a single nucleotide change, or ‘silent’ mutation, located at the UUCGU pentaloop part of the stem-loop region (SLR5B) has a potential role in viral packaging 124. This pentaloop of 5’UTR remains unchanged and maintains a particular structure 122, 125. The RNA secondary structural analysis in our study predicted that there is no change in the 241T structure (supplementary Figure S8a). However, the silent mutation in the loop region upstream to the ORF1a start codon (266-268 position) may be involved in differential RNA binding affinity to the ribosome and translational factors126. In the case of multi-domain NSP3 (papain-like protease), we have observed superior stability of the RNA after gaining of the synonymous mutation 3037C