The emergence, spread and vanishing of a French SARS-CoV-2 variant exemplifies the fate of RNA virus epidemics and obeys the Black Queen rule ============================================================================================================================================= * Philippe Colson * Philippe Gautret * Jeremy Delerce * Hervé Chaudet * Pierre Pontarotti * Patrick Forterre * Raphael Tola * Marielle Bedotto * Léa Delorme * Anthony Levasseur * Jean-Christophe Lagier * Matthieu Million * Nouara Yahi * Jacques Fantini * Bernard La Scola * Pierre-Edouard Fournier * Didier Raoult ## Summary The nature and dynamics of mutations associated with the emergence, spread and vanishing of SARS-CoV-2 variants causing successive waves are complex1-5. We determined the kinetics of the most common French variant (“Marseille-4”) for 10 months since its onset in July 20205. Here, we analysed and classified into subvariants and lineages 7,453 genomes obtained by next-generation sequencing. We identified two subvariants, Marseille-4A, which contains 22 different lineages of at least 50 genomes, and Marseille-4B. Their average lifetime was 4.1±1.4 months, during which 4.1±2.6 mutations accumulated. Growth rate was 0.079±0.045, varying from 0.010 to 0.173. All the lineages exhibited a “gamma” distribution. Several beneficial mutations at unpredicted sites initiated a new outbreak, while the accumulation of other mutations resulted in more viral heterogenicity, increased diversity and vanishing of the lineages. Marseille-4B emerged when the other Marseille-4 lineages vanished. Its ORF8 gene was knocked out by a stop codon, as reported in several mink lineages and in the alpha variant. This subvariant was associated with increased hospitalization and death rates, suggesting that ORF8 is a nonvirulence gene. We speculate that the observed heterogenicity of a lineage may predict the end of the outbreak. Keywords * SARS-CoV-2 * variants * epidemic * Marseille-4 * Pangolin B.1.1.160 * ORF8 ## Introduction The shape of epidemic curves of acute infectious diseases is the subject of several hypotheses and interpretations. The occurrence of successive waves of SARS-CoV-2 infections during the current pandemic was linked to the emergence of viral variants1-5, while possible causes of the extinction of epidemics are viral load decrease6, herd immunity7 (as hypothesized for influenza viruses)8, or the implementation of treatment or vaccination9. However, the factors and mechanisms involved in the rise and fall of SARS-CoV-2 variants have not been elucidated. We identified in July 2020 at IHU Méditerranée Infection, Marseille, France (which generated 20% of the genomes deposited in GISAID ([https://www.gisaid.org/](https://www.gisaid.org/))10 by France as on 16/12/2021), a new SARS-CoV-2 variant, named Marseille-4 (later classified as lineage 20A.EU2 and B.1.160 in Nextstrain and Pangolin classifications)11. This variant is characterized by 20 mutations, including 13 specific compared with the Wuhan-Hu-1 isolate5,11. Seven mutations are nonsynonymous, including one in the spike glycoprotein (S477N). We analysed the epidemiological source and features of this variant and accumulated genetic data through extensive SARS-CoV-2 genomic surveillance by next-generation sequencing from its onset until its disappearance 10 months later in April of 2021. Thus, we could study the nature and dynamics of mutations associated with its emergence, spread, and vanishing. ## Results ### Kinetics of the Marseille-4 variant infection The identification of the Marseille-4 variant in late July 202011 (Figure 1a) led to design a specific qPCR to evaluate its incidence and 9,616 cases were identified. It was the third most commonly diagnosed variant at our institution after variants Alpha/B.1.160 (n= 10,139) and Delta/B.1.617.2 (11,060). By contrast, it was rarely observed during this period in the UK or Spain, where the Marseille-2/B.1.177 variant predominated (Figure 1b). The shape of the incidence curve of Marseille-4 differed from that of mutants of the Wuhan-Hu-1 isolate or of alpha and delta variants that show a “gamma” distribution12. The curve of Marseille-4 included several peaks, with a first in September (week 37), a second in October (week 43), and a third in January 2021 (week 2) before the variant vanished in April 2021 (Figure 1a). ![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/01/08/2022.01.04.22268715/F1.medium.gif) [Figure 1.](http://medrxiv.org/content/early/2022/01/08/2022.01.04.22268715/F1) Figure 1. Weekly incidence of SARS-CoV-2 diagnoses at the IHU Méditerranée Infection, Marseille, France and incidence of the SARS-CoV-2 Marseille-4 variant (a) and spread of the Marseille-4 variant in France and three additional European countries (b). Figure 1b is adapted from screenshots from the CoVariants website ([https://covariants.org/](https://covariants.org/))31. ### Marseille-4 subvariants and lineages Next-generation sequencing was performed when cycle threshold values (Ct) of qPCR was <30 and a genome was obtained from 7,453 patients (Figure 1a). Phylogenetic analysis and comparative genomics identified two subvariants, i.e. new variants issued from a circulating variant (Marseille-4A and Marseille-4B). The Marseille-4A subvariant contained 22 different lineages of at least 50 SARS-CoV-2 genomes harbouring one to four hallmark nucleotide changes (Figure 2a; Supplementary Tables S1, S2, S3). Interestingly, the single sequence reported from an infected mink farm in France was a Marseille-4A variant11. The growth rate varied throughout the Marseille-4 epidemic for each subvariant and lineage (Figures 2b-g; Supplementary Figure S1; Supplementary Tables S2, S3). It was 0.079±0.045 on average and varied from 0.010 for the Marseille-4A.17 lineage to 0.173 for the Marseille-4A.15 lineage. Thus, we observed heterogeneous growth rates for Marseille-4 subvariants and lineages, as indicated by a very high ratio of true heterogeneity to the total observed variation (I2 = 99%; p < 0.05; Supplementary Figure S2). ![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/01/08/2022.01.04.22268715/F2.medium.gif) [Figure 2.](http://medrxiv.org/content/early/2022/01/08/2022.01.04.22268715/F2) Figure 2. Phylogenetic tree of SARS-CoV-2 Marseille-4 genome sequences obtained from patients diagnosed with SARS-CoV-2 infection at IHU Méditerranée Infection (a) and growth rate of the Marseille-4 variant overall (b) and of four Marseille-4A lineages (c-f) and of the Marseille-4B lineage (g). b-g: Time series of the number of mutations from the variant root of each subvariant or lineage along with the loess (locally estimated scatterplot smoothing) regression curve and its 95% confidence interval. ### ORF8 gene inactivation in the Marseille-4B subvariant The Marseille-4B subvariant spread from September 2020 to March 2021 with a gamma distribution of cases (Figure 1a; Supplementary Tables S2, S3). It expanded significantly in December 2020 during the vanishing of the other lineages. It exhibited a mean number of 4.1±1.6 lineage-specific mutations (range, 0-15; n= 1,319 genomes). Interestingly, the ORF8 gene was knocked out at the origin of the Marseille-4B variant. This gene may play a key role in immune modulation and increases virus multiplication13. Its inactivation by a stop codon has been reported in several mink lineages and in all genomes of the Alpha variant. The dimeric structure of the ORF8 protein (wild-type and mutant forms) is shown in Figure 3a-c. The dimer is stabilized by a covalent bond (a disulfide bridge) between two cysteine residues in the N-terminal region of each subunit14. Mutation A65S does not induce major structural or electrostatic surface potential alterations (Figure 3b). Both the initial A65 and mutant A65S residues are well exposed to the solvent and occupy approximately the same volume. By contrast, the truncated 18-63 form leads to a different protein, despite its sequence identity with the 18-63 region of the initial ORF8 protein chains (Figure 3c). The electrostatic surface potential of the truncated protein is also significantly affected, with an increase in both neutral and electronegative surface areas. These structural data suggest a total loss of ORF8 function for the truncated 18-63 form. Finally, mutation H17Y affects the C-terminal residue of the signal peptide, so it is not present in the mature form of ORF8, as shown in Figure 3a-c. ![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/01/08/2022.01.04.22268715/F3.medium.gif) [Figure 3.](http://medrxiv.org/content/early/2022/01/08/2022.01.04.22268715/F3) Figure 3. Structure of the SARS-CoV-2 ORF8 protein and its mutated and truncated forms (a-c), and signature mutations in the genomes obtained for each of the Marseille-4 subvariants and lineages (d). a-c: The upper panel (a) shows the structure of dimeric SARS-CoV-2 ORF8 shown as superimposed surface and cartoon representations. The missing amino acids (65-66 in chain A and 66-68 in chain B) were inserted with Swiss-PdbViewer in PDB: 7JTL, and the resulting model was minimized using the Polak-Ribiere algorithm of Hyperchem. Residue A65 in both chains is highlighted in cyan. The structure of the ORF8 mutant A65S (highlighted in green) was modelled using Swiss-PdbViewer and Hyperchem (middle panels) (b). The structure of truncated ORF8 18-63 (bottom panels) was obtained using Hyperchem (c). For all models, the surface potential of the protein is shown in the right panels (blue, positive; red, negative; white, neutral). b: Synonymous nucleotide changes are indicated by a grey background. Non-synonymous nucleotide changes are indicated by a black background. ### Severity of the Marseille-4B subvariant infections We compared the characteristics of the first 181 patients identified as infected with Marseille-4B and 1,647 patients identified as infected with Marseille-4A (Table 1a-b). Patients infected with Marseille-4B were more likely to be female and older than those infected with Marseille-4A. The mean Ct values did not differ between the two groups of patients. Higher hospitalization and death rates were observed in patients infected with Marseille-4B. Multivariate analysis (Table 1b) confirmed that increased hospitalization rate was significantly associated with male sex, older age, a lower viral load (increased Ct value) and Marseille-4B infection. An increased rate of transfer to the intensive care unit was significantly associated with male sex and older age. An increased death rate was significantly associated with male sex, older age and Marseille-4B infection. Thus, we concluded that Marseille-4B was more virulent and suspected that it is related to the knock-out of ORF8. View this table: [Table 1.](http://medrxiv.org/content/early/2022/01/08/2022.01.04.22268715/T1) Table 1. Characteristics of 1,828 patients infected with Marseille-4A or Marseille-4B (a) and risk factor analysis for hospitalization, transfer to the intensive care unit and death in 1647 patients infected with Marseille-4A or Marseille-4B (b). ### Some mutations are associated with expansion of Marseille-4A lineages, other with their vanishing Analysing the kinetics of new variants without enough viral sequences leads to confused interpretation because of the superposition of different lineages in this clade. As for Marseille-4, we identified nucleotide changes among the 22 Marseille-4A lineages. Most of the Marseille-4 subvariants and lineages showed a gamma distribution of cases with a lifetime of 4.1±1.4 months on average (Supplementary Figure S3), during which 4.1±2.6 mutations accumulated in viral genomes. RNA viruses have a high mutation rate15,16 and SARS-CoV-2 accumulates approximately one mutation every 2 weeks17. We found signature mutations in each of these lineages, including in the Nsp1, Nsp2, NSp2, spike, ORF7a, ORF7b, and ORF8 genes (Supplementary Table S1; Figure 3d). Interestingly, the accumulation of mutations resulting in an increasing genetic divergence correlated with a decreased incidence. Thus, the accumulation of nonlethal and nonfavouring mutations leads gradually to a dispersion of the lineages, a decrease in viral fitness and vanishing of the epidemic. ## DISCUSSION Here, we described the complete cycle of emergence, spread and vanishing of the Marseille-4 variant identified in France from mink11 by analysing 7,453 genomes. The viral mutation rate leads to the accumulation of many random mutations, most presumably mildly deleterious, with little effect on fitness. Only 10−8 may be associated with a fitness gain15,18,19. Overall, the RNA virus fitness evolution includes an initial period of rapid multiplication possibly caused by a positive mutation followed by the decline of viral fitness caused by accumulation of unfit mutations, as described for the vesicular stomatitis virus15,20. We observed heterogeneity of the growth rates for the different Marseille-4 subvariants and lineages, making it challenging to generalize the behaviour of one SARS-CoV-2 subvariant or lineage to all of them. All Marseille-4 lineages present a genetic signature with mutations sometimes associated with the inactivation of ORF7a or ORF7b, as described21. None of these mutations, apart from those located in the spike gene, were predicted to be possibly associated with increased transmissibility. Knock-out of the ORF8 and ORF7a/b genes shows that SARS-CoV-2 virulence may be associated with a loss of genes, as shown for bacteria in which the decline in genomic content is often associated with an increased specificity and virulence22. Analysis of the Marseille-4B subvariant revealed the presence of a stop codon in ORF8, which was observed in the alpha variant and viruses infecting minks and pangolins23. Marseille-4B subvariant is more virulent, at a time when the other Marseille-4 lines had vanished or were vanishing. This finding suggests that this gene may be a so-called “nonvirulence gene”24 and that SARS-CoV-2 may follow the “Black Queen” hypothesis, which proposes that the loss of a gene may provide a selective advantage when its function is dispensable25, leading to the uselessness of correcting inactivating mutations on the “Use it or lose it” model26. As expected, structural analysis of the truncated form of ORF8 (ORF8 18-63) confirms that the large deletion induced by the stop codon results in a distinct protein that does not retain any resemblance to the native ORF8 dimer. Interestingly, partial or complete deletions of the ORF8 gene were observed during the middle and late phases of the SARS-CoV epidemic in 2002-200327. Additionally, the knock-out of these genes suggests that SARS-CoV-2 originates from a distinct animal reservoir, explaining why several of its genes are not essential and may be knocked-out in humans, minks and pangolins. Similarly, ORF8 truncation has been hypothesized to have occurred for HCoV-229E in humans after zoonotic transmission from bats or intermediate hosts28. Functionally, ORF8 may help the virus to adapt to new hosts by facilitating immune evasion due to functional mimicry with immunological molecules29. This function may not remain critical as soon as the virus is well adapted to its new hosts. Interestingly, evolutionary changes in virulence observed for myxoma virus after its introduction as a biological control in rabbits of Australia were often associated with losses of gene functions30. Regarding the spread of these Marseille-4 lineages, their average detection duration was approximately 4 months, indicating that the accumulation of mutations beyond 8 was associated with a vanishing. This accumulation of mutations is associated with genetic heterogeneity and increased diversity. This leads to a funnel-shaped evolution of the viral population. Overall, several beneficial mutations at unpredicted sites increase fitness, while the accumulation of other mutations results in decreased fitness, loss of clonality and vanishing of the lineage. We speculate that the observed heterogenicity of a lineage may predict the end of the outbreak. ## Data Availability The dataset generated and analyzed during the current study are available in the GISAID database ([https://www.gisaid.org/](https://www.gisaid.org/)). ## Methods ### Patients Patients included in the present study were those identified as infected with the SARS-CoV-2 Marseille-4 variant1,2. The present study has been approved by the ethics committee of University Hospital Institute (IHU) Méditerranée Infection (N°2020-016-3). Epidemiological and clinical data were retrieved for patients registered in the Assistance Publique-Hôpitaux de Marseille (APHM) information system. Access to the patients’ biological and registry data issued from this system was approved by the data protection committee of APHM and was recorded in the European General Data Protection Regulation registry under number RGPD/APHM 2019-73. Statistical processes were performed using R software version 4.0.2 ([https://cran.r-project.org/](https://cran.r-project.org/)). A p< 0.05 was considered statistically significant. ### SARS-CoV-2 genotyping SARS-CoV-2 genotyping was performed from nasopharyngeal samples tested between 1 July 2020 and 30 April 2021 (10 months) at the IHU Méditerranée Infection Institute ([https://www.mediterranee-infection.com/](https://www.mediterranee-infection.com/)). Next-generation sequencing was performed when the cycle threshold value (Ct) of the qPCR used to diagnose SARS-CoV-2 infection was <30. When the Ct was ≥30 and in any case in the absence of an available genome sequence, the genotype was determined for SARS-CoV-2-positive specimens using variant-specific qPCR, as previously described1,2,3. Viral RNA was extracted from 200 µL of nasopharyngeal swab fluid using the EZ1 Virus Mini Kit v2.0 and the EZ1 Advanced XL instrument (Qiagen, Courtaboeuf, France) or the KingFisher Flex system (Thermo Fisher Scientific, Waltham, MA, USA) following the manufacturer’s instructions. SARS-CoV-2 genome sequences were obtained as follows: by next-generation sequencing using Illumina technology, the Nextera XT paired end strategy and the MiSeq instrument (Illumina Inc., San Diego, CA, USA) since February 2020, as previously described2; using the Illumina COVID-seq protocol and the NovaSeq 6000 instrument (Illumina Inc.) since April 2021; or using Oxford Nanopore technology (ONT) and the GridION instrument (Oxford Nanopore Technologies Ltd., Oxford, UK), as previously described2. Next-generation sequencing with ONT was performed without or with (since March 2021) synthesized cDNA amplification using a multiplex PCR protocol with ARTIC nCoV-2019 v3 Panel primers purchased from Integrated DNA technologies (IDT, Coralville, IA, USA) according to the ARTIC procedure ([https://artic.network/](https://artic.network/)), as previously described. Postextraction, viral RNA was reverse-transcribed using SuperScript IV (Thermo Fisher Scientific) before cDNA second strand synthesis with Klenow Fragment DNA polymerase (New England Biolabs, Beverly, MA, USA) when performing NGS using the Illumina MiSeq instrument (Illumina Inc.)2, LunaScript RT SuperMix kit (New England Biolabs) when performing NGS with the ONT, or according to the COVIDSeq protocol (Illumina Inc.) following the manufacturer’s recommendations. The generated cDNA was purified using Agencourt AMPure XP beads (Beckman Coulter, Villepinte, France) and quantified using a Qubit 2.0 fluorometer (Invitrogen, Carlsbad, CA, USA). ### Assembly and analyses of genome sequences Genome sequences were assembled by mapping on the SARS-CoV-2 genome GenBank accession no. [NC\_045512.2](http://medrxiv.org/lookup/external-ref?link_type=GEN&access_num=NC_045512.2&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F08%2F2022.01.04.22268715.atom) (Wuhan-Hu-1 isolate) using CLC Genomics workbench v.7 (with the following thresholds: 0.8 for coverage and 0.9 for similarity) ([https://digitalinsights.qiagen.com/](https://digitalinsights.qiagen.com/)) as previously described2 or Minimap2 ([https://github.com/lh3/minimap2](https://github.com/lh3/minimap2))4. Samtools ([https://www.htslib.org/](https://www.htslib.org/)) was used for soft clipping of Artic primers ([https://artic.network/](https://artic.network/)) and to remove sequence duplicates5. Consensus genomes were generated using CLC Genomics workbench v.7 and Sam2consensus ([https://github.com/vbsreenu/Sam2Consensus](https://github.com/vbsreenu/Sam2Consensus)) through a first in-house script written in Python language ([https://www.python.org/](https://www.python.org/)). Mutation detection was performed using the Nextclade tool ([https://clades.nextstrain.org/](https://clades.nextstrain.org/)) and freebayes ([https://github.com/freebayes/freebayes](https://github.com/freebayes/freebayes))6 using a mapping quality score of 20 and results filtered by the Python script based on major nucleotide frequencies ≥ 70% and nucleotide depths ≥ 10 (when sequence reads were generated on the NovaSeq Illumina instrument (Illumina Inc.)) or ≥ 5 (when sequence reads were generated on the MiSeq Illumina instrument). SARS-CoV-2 genotyping was performed using a second in-house script written in Python language ([https://www.python.org/](https://www.python.org/)) by comparing mutation patterns with those of our database of SARS-CoV-2 variants. Nextstrain clades and Pangolin lineages provided in the present study were determined using the Nextclade web application ([https://clades.nextstrain.org/](https://clades.nextstrain.org/))7,8 and Pangolin web application ([https://cov-lineages.org/pangolin.html](https://cov-lineages.org/pangolin.html))9, respectively. The sequences described in the present study have been deposited in the GISAID sequence database ([https://www.gisaid.org/](https://www.gisaid.org/))10 and can be retrieved online using the GISAID online search tool with “Marseille” as a keyword, and then selecting sequence names containing “IHU” or “MEPHI”. The sequences have also been deposited in the IHU Marseille Infection website: [https://www.mediterranee-infection.com/tout-sur-le-coronavirus/sequencage-genomique-sars-cov-2/](https://www.mediterranee-infection.com/tout-sur-le-coronavirus/sequencage-genomique-sars-cov-2/). ### Phylogenetic reconstruction and definition and naming of Marseille-4 subvariants and lineages Phylogenetic reconstruction based on SARS-CoV-2 genomes were performed for genome sequences >24,000 nucleotides using the 3extstrain/ncov tool ([https://github.com/nextstrain/ncov](https://github.com/nextstrain/ncov)) and then were visualized using Auspice ([https://docs.nextstrain.org/projects/auspice/en/stable/](https://docs.nextstrain.org/projects/auspice/en/stable/)). ### Structural analysis of the untruncated and truncated ORF8 protein A structural model of the ORF8 protein was generated from pdb file 7JTL11. The gaps in the crystal structure were fixed by incorporating the missing amino acids with the Robetta protein structure prediction tool12, followed by energy minimization with the Polak-Ribière algorithm as previously reported13. Mutant and truncated proteins were then generated with Swiss-PdbViewer14 and submitted to several rounds of energy minimization as described15. ### Evolution of Marseille-4 subvariants and lineages and time dynamics of SARS-CoV-2 mutation accumulation Duration of circulation of the different subvariants and lineages was calculated using the differences between the 5th and 95th percentiles of sampling dates; this allowed considering the time periods during which the subvariants and lineages had a significant incidence. Time dynamics of mutation accumulation were analysed using locally estimated scatterplot smoothing (loess) for regression fitting. Latent structural changes in mutation distributions were retrieved using change point analysis of mean and variance with binary segmentation method and Schwartz information criterion penalty associated with a penalty threshold of 0.0516. We assessed the epidemiological capabilities of subvariants using the early stages of each epidemic curve. During this exponential phase, the size of the susceptible population may be considered as constant, and the cumulated number of cases exponentially increases at an approximately constant rate that was defined as the growth rate. After logarithmic transformation, the cumulated number of cases followed a linear model as follows: ln(It) = ln(I0) + Λt, where It is the cumulated incidence at time t, I0 is the initial number of cases, and Λ is the regression slope and the growth rate. From Λ, the reproduction rate, R, may be easily retrieved using the following equation17: R = (I+ΛD)(1+ΛD’), where D and D’ are the average infectious and pre-infectious periods (according to the SEIR model). Here, we set D at 9.318 and d’ at 3.319. To calculate the growth rate, we used Chow’s F test to determine the inflexion point of the logarithm of cumulated number of cases, which corresponded to the end of the exponential phase. Then, we applied a linear model for this phase to obtain the regression slope (i.e. the growth rate) and its 95% confidence interval. Statistical processes were performed using R software version 4.0.2 ([https://cran.r-project.org/](https://cran.r-project.org/)). All statistical conclusions were made using a 0.05 threshold. ## Author contributions Study conception and design: D.R., P.C., B.L.S., P.E.F, P.G.. Materials, data and analysis tools: P.C., P.G., J.D., H.C., M.B., L.D., A.L., J.C.L., M.M., N.Y., J.F., P.E.F.. Data analyses: D.R., P.C., H.C., J.D., M.B., L.D., N.Y., J.F., B.L.S., P.E.F.. Writing of the first draft of the manuscript: P.C. and D.R.. Critical reviews of and revisions to the manuscript: D.R., P.C., P.G., P.P., P.F., J.F., B.L.S., P.E.F.. All authors approved the final manuscript. Supervision: D.R.. ## Competing interests D.R. is a scientific board member of Eurofins company, a founder of a microbial culture company (Culture Top), and was a consultant for Hitachi High-Technologies Corporation, Tokyo, Japan from 2018 to 2020. All other authors do not have any competing interest to declare. Funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript. ## Funding French Government under the “Investments for the Future” program managed by the National Agency for Research (ANR), no. Méditerranée-Infection 10-IAHU-03, and the “Emergen Consortium” ([https://www.santepubliquefrance.fr/dossiers/coronavirus-covid-19/consortium-emergen](https://www.santepubliquefrance.fr/dossiers/coronavirus-covid-19/consortium-emergen)).; Région Provence Alpes Côte d’Azur and European funding FEDER PRIMMI (Fonds Européen de Développement Régional-Plateformes de Recherche et d’Innovation Mutualisées Méditerranée Infection), no. FEDER PA 0000320 PRIMMI. ## Extended data figure/table legends See Supplementary Material. ## Supplementary material ### Supplementary figure legends ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/01/08/2022.01.04.22268715/F4/graphic-5.medium.gif) [](http://medrxiv.org/content/early/2022/01/08/2022.01.04.22268715/F4/graphic-5) ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/01/08/2022.01.04.22268715/F4/graphic-6.medium.gif) [](http://medrxiv.org/content/early/2022/01/08/2022.01.04.22268715/F4/graphic-6) ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/01/08/2022.01.04.22268715/F4/graphic-7.medium.gif) [](http://medrxiv.org/content/early/2022/01/08/2022.01.04.22268715/F4/graphic-7) ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/01/08/2022.01.04.22268715/F4/graphic-8.medium.gif) [](http://medrxiv.org/content/early/2022/01/08/2022.01.04.22268715/F4/graphic-8) ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/01/08/2022.01.04.22268715/F4/graphic-9.medium.gif) [](http://medrxiv.org/content/early/2022/01/08/2022.01.04.22268715/F4/graphic-9) ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/01/08/2022.01.04.22268715/F4/graphic-10.medium.gif) [](http://medrxiv.org/content/early/2022/01/08/2022.01.04.22268715/F4/graphic-10) Supplementary Figure S1a-f. Time series for the different Marseille-4 subvariants and lineages of the number of mutations showing both the loess regression curve and the overall Marseille-4 loess regression curve, with their 95% confidence intervals. MRS, Marseille. ![Supplementary Figure S2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/01/08/2022.01.04.22268715/F5.medium.gif) [Supplementary Figure S2.](http://medrxiv.org/content/early/2022/01/08/2022.01.04.22268715/F5) Supplementary Figure S2. Forest plot like representation of the growth rates of subvariants and lineages with their 95% confidence intervals. ![Supplementary Figure S3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/01/08/2022.01.04.22268715/F6.medium.gif) [Supplementary Figure S3.](http://medrxiv.org/content/early/2022/01/08/2022.01.04.22268715/F6) Supplementary Figure S3. Temporal distribution of the genome sequences of SARS-CoV-2 Marseille-4 subvariants and lineages obtained from patients diagnosed with SARS-CoV-2 infection at IHU Méditerranée Infection. ### Supplementary Tables View this table: [Supplementary Table S1.](http://medrxiv.org/content/early/2022/01/08/2022.01.04.22268715/T2) Supplementary Table S1. List of nucleotide and amino acid changes associated with the onset and expansion of Marseille-4A and Marseille-4B subvariants and lineages. View this table: [Supplementary Table S2.](http://medrxiv.org/content/early/2022/01/08/2022.01.04.22268715/T3) Supplementary Table S2. Time dynamics of Marseille-4 subvariants and lineages. View this table: [Supplementary Table S3.](http://medrxiv.org/content/early/2022/01/08/2022.01.04.22268715/T4) Supplementary Table S3. Number of genomes, number of mutations and time duration for Marseille-4 subvariants and lineages. ## Acknowledgments This work was supported by the French Government under the “Investments for the Future” program managed by the National Agency for Research (ANR), Méditerranée-Infection 10-IAHU-03 and the Emergen Consortium ([https://www.santepubliquefrance.fr/dossiers/coronavirus-covid-19/consortium-emergen](https://www.santepubliquefrance.fr/dossiers/coronavirus-covid-19/consortium-emergen)). We also acknowledge support from the Région Provence Alpes Côte d’Azur and European funding FEDER PRIMMI (Fonds Européen de Développement Régional-Plateformes de Recherche et d’Innovation Mutualisées Méditerranée Infection), FEDER PA 0000320 PRIMMI. We are thankful to Mamadou Beye, Emilie Burel, Elsa Prudent, Céline Gazin, Sofiane Bakour, Laurence Thomas, Séverine Guillon, Véronique Filosa, Ludivine Bréchard, and Claudia Andrieu for their technical help. * Received January 4, 2022. * Revision received January 4, 2022. * Accepted January 8, 2022. * © 2022, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/) ## References 1. 1.Lemey, P. et al. Untangling introductions and persistence in COVID-19 resurgence in Europe, Nature. 595, 713 (2021). 2. 2.Li, J., Lai, S., Gao, G.F., & Shi, W. The emergence, genomic diversity and global spread of SARS-CoV-2. Nature. 600, 408–418 (2021). 3. 3.Vöhringer, H.S. et al. Genomic reconstruction of the SARS-CoV-2 epidemic in England. Nature. 600, 506–511 (2021). 4. 4.Rochman, N. D. et al. Ongoing global and regional adaptive evolution of SARS-CoV-2, Proc. Natl. Acad. Sci. U. S. A. 118, e2104241118 (2021). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxODoiMTE4LzI5L2UyMTA0MjQxMTE4IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjIvMDEvMDgvMjAyMi4wMS4wNC4yMjI2ODcxNS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 5. 5.Colson, P. et al. Analysis of SARS-CoV-2 variants from 24,181 patients exemplifies the role of globalisation and zoonosis in pandemics, Front. Microbiol., Online ahead of print (2022). 6. 6.Hay, J. A. et al. Estimating epidemiologic dynamics from cross-sectional viral load distributions, Science 373 (2021). 7. 7.Omer, S. B., Yildirim, I. & Forman, H. P. Herd Immunity and Implications for SARS-CoV-2 Control, JAMA 324, 2095 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1001/jama.2020.20892&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F08%2F2022.01.04.22268715.atom) 8. 8.Palese, P. & Wang, T. T. Why do influenza virus subtypes die out? A hypothesis, MBio. 2 (2011). 9. 9.Pegu, A. et al. Durability of mRNA-1273 vaccine-induced antibodies against SARS-CoV-2 variants, Science 373, 1372 (2021). 10. 10.Elbe, S., & Buckland-Merrett, G. Data, disease and diplomacy: GISAID’s innovative contribution to global health. Glob Chall 1, 33–46 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/gch2.1018&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=31565258&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F08%2F2022.01.04.22268715.atom) 11. 11.Fournier, P. E. et al. Emergence and outcomes of the SARS-CoV-2 ‘Marseille-4’ variant, Int. J. Infect. Dis. 106, 228 (2021). 12. 12.Moran, P. A. P. Statistical inference with bivariate gamma distributions, Biometrika 56, 627 (1969). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/biomet/56.3.627&link_type=DOI) 13. 13.Prates, E. T. et al. Potential Pathogenicity Determinants Identified from Structural Proteomics of SARS-CoV and SARS-CoV-2, Mol. Biol Evol 38, 702 (2021). 14. 14.Flower, T. G. et al. Structure of SARS-CoV-2 ORF8, a rapidly evolving immune evasion protein, Proc. Natl. Acad. Sci. U. S A 118, e2021785118 (2021). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1073/pnas.2021785118&link_type=DOI) 15. 15.Elena, S. F. et al. The two faces of mutation: extinction and adaptation in RNA viruses, IUBMB Life. 49, 5 (2000). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1080/152165400306296&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=10772334&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F08%2F2022.01.04.22268715.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000085790800002&link_type=ISI) 16. 16.Jensen, J. D. et al. Imposed mutational meltdown as an antiviral strategy, Evolution 74, 2549 (2020). 17. 17.van Dorp, L. et al. No evidence for increased transmissibility from recurrent mutations in SARS-CoV-2, Nat. Commun. 11, 5986 (2020). 18. 18.Miralles, R. et al. Clonal interference and the evolution of RNA viruses, Science 285, 1745 (1999). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEzOiIyODUvNTQzNC8xNzQ1IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjIvMDEvMDgvMjAyMi4wMS4wNC4yMjI2ODcxNS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 19. 19.Muller, H. J. The relation of recombination to mutational advance, Mutat. Res. 106, 2 (1964). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/0027-5107(64)90047-8&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=14195748&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F08%2F2022.01.04.22268715.atom) 20. 20.Elena, S. F. & Moya, A. Rate of deleterious mutation and the distribution of its effects on fitness in vesicular stomatitis virus, J. Evol. Biol. 12, 1078 (1999). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1046/j.1420-9101.1999.00110.x&link_type=DOI) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000083843600019&link_type=ISI) 21. 21.Nemudryi, A. et al. SARS-CoV-2 genomic surveillance identifies naturally occurring truncation of ORF7a that limits immune suppression, Cell Rep. 35, 109197 (2021). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.celrep.2021.109197&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=34043946&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F08%2F2022.01.04.22268715.atom) 22. 22.Georgiades, K. et al. Gene gain and loss events in Rickettsia and Orientia species, Biol Direct. 6, 6 (2011). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/1745-6150-6-6&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21303508&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F08%2F2022.01.04.22268715.atom) 23. 23.Pereira, F. SARS-CoV-2 variants lacking ORF8 occurred in farmed mink and pangolin, Gene 784, 145596 (2021). 24. 24.Bliven, K. A. & Maurelli, A. T. Antivirulence genes: insights into pathogen evolution through gene loss, Infect. Immun. 80, 4061 (2012). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiaWFpIjtzOjU6InJlc2lkIjtzOjEwOiI4MC8xMi80MDYxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjIvMDEvMDgvMjAyMi4wMS4wNC4yMjI2ODcxNS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 25. 25.Morris, J. J., Lenski, R. E. & Zinser, E. R. The Black Queen Hypothesis: evolution of dependencies through adaptive gene loss, MBio. 3, mBio–12 (2012). 26. 26.Moran, N. A. Microbial minimalism: genome reduction in bacterial pathogens, Cell. 108, 583 (2002). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0092-8674(02)00665-7&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=11893328&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F08%2F2022.01.04.22268715.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000174314800001&link_type=ISI) 27. 27.Molecular evolution of the SARS coronavirus during the course of the SARS epidemic in China, Science 303, 1666 (2004). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEzOiIzMDMvNTY2NC8xNjY2IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjIvMDEvMDgvMjAyMi4wMS4wNC4yMjI2ODcxNS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 28. 28.Corman, V. M. et al. Evidence for an Ancestral Association of Human Coronavirus 229E with Bats, J. Virol. 89, 11858 (2015). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoianZpIjtzOjU6InJlc2lkIjtzOjExOiI4OS8yMy8xMTg1OCI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIyLzAxLzA4LzIwMjIuMDEuMDQuMjIyNjg3MTUuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 29. 29.Valcarcel, A. et al. Structural Analysis of SARS-CoV-2 ORF8 Protein: Pathogenic and Therapeutic Implications, Front. Genet. 12, 693227 (2021). 30. 30.Kerr, P. J. et al. Evolutionary history and attenuation of myxoma virus on two continents, PLoS. Pathog. 8, e1002950 (2012). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.ppat.1002950&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23055928&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F08%2F2022.01.04.22268715.atom) 31. 31.Hodcroft, E. CoVariants: SARS-CoV-2 mutations and variants of interest. (2012). Available from: [https://covariants.org/](https://covariants.org/). ## References 1. 1.Fournier, P. E. et al. Emergence and outcomes of the SARS-CoV-2 ‘Marseille-4’ variant, Int. J. Infect. Dis. 106, 228 (2021). 2. 2.Colson, P. et al. Analysis of SARS-CoV-2 variants from 24,181 patients exemplifies the role of globalisation and zoonosis in pandemics, Front. Microbiol., Online ahead of print (2022). 3. 3.Bedotto, M. et al. Implementation of an in-house real-time reverse transcription-PCR assay for the rapid detection of the SARS-CoV-2 Marseille-4 variant, J Clin. Virol 139, 104814 (2021). 4. 4.Li, H. Minimap2: pairwise alignment for nucleotide sequences, Bioinformatics. 34, 3094 (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/bty191&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29750242&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F08%2F2022.01.04.22268715.atom) 5. 5.Li, H. et al. The Sequence Alignment/Map format and SAMtools, Bioinformatics. 25, 2078 (2009). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btp352&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19505943&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F08%2F2022.01.04.22268715.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000268808600014&link_type=ISI) 6. 6.Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. arXiv. org (2012). 7. 7.Hadfield, J. et al. Nextstrain: real-time tracking of pathogen evolution, Bioinformatics. 34, 4121 (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10/gdkbqx&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F08%2F2022.01.04.22268715.atom) 8. 8.Aksamentov, I., Roemer, C., Hodcroft, E. B. & Neher, R. A. Nextclade: clade assignment, mutation calling and quality control for viral genomes. J Open Source Softw 6, 3773 (2021). 9. 9.Rambaut, A. et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology, Nat. Microbiol. 5, 1403–1407 (2020). 10. 10.Elbe, S., & Buckland-Merrett, G. Data, disease and diplomacy: GISAID’s innovative contribution to global health. Glob Chall 1, 33–46 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/gch2.1018&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=31565258&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F08%2F2022.01.04.22268715.atom) 11. 11.Flower, T. G. et al. Structure of SARS-CoV-2 ORF8, a rapidly evolving immune evasion protein, Proc. Natl. Acad. Sci. U. S A 118 (2021). 12. 12.Kim, D. E., Chivian, D. Ȗ Baker, D. Protein structure prediction and analysis using the Robetta server, Nucleic Acids Res 32, W526–W531 (2004). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/nar/gkh468&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15215442&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F08%2F2022.01.04.22268715.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000222273100104&link_type=ISI) 13. 13.Fantini, J. et al. Structural dynamics of SARS-CoV-2 variants: A health monitoring strategy for anticipating Covid-19 outbreaks, J. Infect. 83, 197–206 (2021). 14. 14.Guex, N. & Peitsch, M. C. SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling, Electrophoresis 18, 2714 (1997). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/elps.1150181505&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=9504803&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F01%2F08%2F2022.01.04.22268715.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000071948300004&link_type=ISI) 15. 15.Di Scala. C. & Fantini, J. Hybrid In Silico/In Vitro Approaches for the Identification of Functional Cholesterol-Binding Domains in Membrane Proteins, Methods Mol. Biol 1583, 7 (2017). 16. 16.Chen, J. & Gupta, A. K. Parametric statistical change point analysis with applications to genetics, medicine, and finance, 2nd ed. (Birkhauser, 2012). 17. 17.White, R. & Vynnycky, E. An introduction to infectious disease modelling, Oup Oxford ed. (2016). 18. 18.Zhao, S. et al. Estimating the generation interval and inferring the latent period of COVID-19 from the contact tracing data, Epidemics. 36, 100482 (2021). 19. 19.Byrne, A. W. et al. Inferred duration of infectious period of SARS-CoV-2: rapid scoping review and analysis of available evidence for asymptomatic and symptomatic COVID-19 cases, BMJ Open. 10, e039856 (2020). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiYm1qb3BlbiI7czo1OiJyZXNpZCI7czoxMjoiMTAvOC9lMDM5ODU2IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjIvMDEvMDgvMjAyMi4wMS4wNC4yMjI2ODcxNS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=)