Multiple introductions followed by ongoing community spread of SARS-CoV-2 at one of the largest metropolitan areas in the Northeast of Brazil
=============================================================================================================================================

* Marcelo Henrique Santos Paiva
* Duschinka Ribeiro Duarte Guedes
* Cássia Docena
* Matheus Filgueira Bezerra
* Filipe Zimmer Dezordi
* Laís Ceschini Machado
* Larissa Krokovsky
* Elisama Helvecio
* Alexandre Freitas da Silva
* Luydson Richardson Silva Vasconcelos
* Antonio Mauro Rezende
* Severino Jefferson Ribeiro da Silva
* Kamila Gaudêncio da Silva Sales
* Bruna Santos Lima Figueiredo de Sá
* Derciliano Lopes da Cruz
* Claudio Eduardo Cavalcanti
* Armando de Menezes Neto
* Caroline Targino Alves da Silva
* Renata Pessôa Germano Mendes
* Maria Almerice Lopes da Silva
* Michelle da Silva Barros
* Wheverton Ricardo Correia do Nascimento
* Rodrigo Moraes Loyo Arcoverde
* Luciane Caroline Albuquerque Bezerra
* Sinval Pinto Brandão Filho
* Constância Flávia Junqueira Ayres
* Gabriel Luz Wallau

## ABSTRACT

The emergence of SARS-CoV-2 in the human population has caused a huge pandemic that is still unfolding in many countries around the world. Multiple epicenters of the pandemic have emerged since the first pneumonia cases in Wuhan, first in Italy followed by the USA and Brazil. Up to now, Brazil is the second most affected country; however, genomic sequences of SARS-CoV-2 strains circulating in the country are restricted to some highly impacted states. Although Pernambuco state, located in the Northeast Region, is the sixth most affected Brazilian state and the second considering lethality rate, there is a lack of high-quality genomic sequences from the strains circulating in this region. Here, we sequenced 38 strains of SARS-CoV-2 from patients presenting Covid-19 symptoms.
Phylogenetic reconstructions revealed that three lineages were circulating in the state and that 36 samples belong to the B.1.1 lineage. We detected two introductions from European countries and five clades, corroborating the community spread of the virus between different municipalities of the state. Finally, we detected that all except one strain showed the D614G spike protein amino acid change that may impact virus infectivity in human cells. Our study sheds new light on the spread of SARS-CoV-2 strains in one of the most heavily impacted states of Brazil.

## INTRODUCTION

Severe Acute Respiratory Syndrome (SARS) is caused by several viral pathogens that are able to infect the upper and lower respiratory tract of humans, leading to compromised oxygenation and multiple organ failure [1]. Influenza viruses, rhinoviruses and coronaviruses (CoV) are commonly associated with SARS, although with a low incidence due to the worldwide influenza immunization campaigns and the mild symptoms normally derived from infection by other respiratory viruses [2]. On the other hand, recently emerged respiratory viruses that are able to sustain human-to-human transmission may infect a large proportion of the world's human population, leading to devastating pandemics in the absence of a vaccine and/or preventive measures to control their dissemination. For example, the H1N1 virus infected more than 575 thousand people during the 2009 pandemic [3], and two coronaviruses, SARS-CoV-1, which emerged in 2002, and MERS-CoV, which emerged in 2012, infected thousands of people. These last two viruses were not well adapted to human-to-human transmission and the outbreaks were brought under control before massive human spread [4].

Viruses from the Coronaviridae family have a positive-sense RNA genome ranging from 27 to 32 kb in length and encode 16 non-structural and 4 structural proteins [5].
These highly diverse viruses comprise at least five genera (Alphaletovirus, Alphacoronavirus, Betacoronavirus, Deltacoronavirus and Gammacoronavirus) that are infectious to a myriad of vertebrates [6,7]. Six different coronaviruses from the Alpha and Betacoronavirus genera are known to infect humans, all of them likely derived from wild and domestic animals [5,8]. Studies suggest that four of these viruses are associated with only mild symptoms in humans (HCoV-NL63, HCoV-229E, HCoV-OC43 and HKU1), while SARS-CoV-1 and MERS-CoV show a high morbidity and mortality rate (MASTERS, 2006; SINGHAL, 2020). However, around December 2019 a new pneumonia-like illness was reported in the Wuhan municipality in China (ZHOU et al., 2020). A few weeks later the genome of the etiological agent was sequenced and revealed a new coronavirus named SARS-CoV-2 [9]. Unlike SARS-CoV-1 and MERS-CoV, SARS-CoV-2 is particularly well adapted to sustained human-to-human transmission. It is highly infectious, spreading easily through symptomatic and asymptomatic individuals [10]. Due to widespread human mobility, this virus reached all continents in less than 3 months after the reports from Wuhan ([https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/](https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/)). The pandemic is still unfolding: more than 15 million people have already been infected by the virus and more than 600 thousand have died, highlighting that humankind is facing one of the largest and most challenging pandemics of all time.

Since the initial spread of SARS-CoV-2, there have been two well recognized epicenters of the pandemic: Northern Italy, which reported the peak of the epidemic around late March to mid April 2020 [11,12], followed by the United States of America (USA), which reached its first peak around April 2020, with most cases concentrated in the New York City area [13].
Currently, the USA is the most affected country in the world, accounting for almost five million infected people. Real-time epidemiological monitoring shows that the number of newly infected patients is decreasing in some early affected states while increasing rapidly in later affected ones, giving support to a second wave in the country [14] ([https://coronavirus.jhu.edu/map.html](https://coronavirus.jhu.edu/map.html), [https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/](https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/)). On the other hand, SARS-CoV-2 human infections have been increasing at a rapid pace in South America, led mostly by Brazil - the most populous country of the continent with more than 211.8 million people according to the Brazilian Institute of Geography and Statistics (Instituto Brasileiro de Geografia e Estatística, IBGE - [http://www.ibge.gov.br](http://www.ibge.gov.br)). The first SARS-CoV-2 infection in Brazil was reported on February 25th 2020, and now (July 2020) Brazil is the second most affected country in the world, behind only the US in the number of new cases and deaths [15] ([https://coronavirus.jhu.edu/map.html](https://coronavirus.jhu.edu/map.html) - [https://covid.saude.gov.br/](https://covid.saude.gov.br/)).

Recent publications based on the sequencing and characterization of SARS-CoV-2 genomes from Minas Gerais and São Paulo (southeast region of Brazil) revealed the introduction of different SARS-CoV-2 lineages from European countries, which were associated with the recent travel history of the patients [16-18]. Resende et al 2020 sequenced 95 SARS-CoV-2 genomes from 10 Brazilian states and Candido et al 2020 sequenced 427 genomes from 21 Brazilian states; both identified more than 100 independent introductions of SARS-CoV-2 in Brazil [19,20].
Pernambuco is the seventh most populous state in Brazil and the eighth state regarding the number of infected patients, with 101,395 confirmed cases, but the fourth in terms of deaths (6,828 deaths) and the second considering lethality (6.7%) (last accessed August 8th 2020 - CIEVS PE - [https://www.cievspe.com/](https://www.cievspe.com/)). So far, no genomic epidemiology study [21] has been performed to sequence SARS-CoV-2 viral genomes circulating in the state. In this study, we sequenced 38 SARS-CoV-2 genomes from Pernambuco state, Northeastern Brazil, during an early pandemic phase in order to evaluate the emergence and community spread of this virus in one of the most affected states. We found that two introductions are directly linked with European countries and that substantial community spread contributed to the dispersion of the virus to smaller cities in the state's countryside.

## MATERIAL AND METHODS

### Sampling and Molecular Detection

Samples were obtained from nasopharyngeal and oropharyngeal swabs from symptomatic patients from different ambulatory facilities in Pernambuco (Northeast Brazil). These samples are part of the COVID-19 biorepository of the Aggeu Magalhães Institute (IAM), an Oswaldo Cruz Foundation (FIOCRUZ) unit. All standard operating procedures (SOPs) established by the World Health Organization guidelines were employed (2020c). Rigorous biosafety measures were employed, as all samples were manipulated in a BSL-3 laboratory facility. RNA extractions were performed on a robotic platform using the Maxwell® 16 Viral Total Nucleic Acid Purification Kit (Promega, Wisconsin-USA), following the manufacturer’s protocol. Molecular detection was performed using the Kit Molecular Bio Manguinhos SARS-CoV-2 (E/RP): a single-step reaction for detecting the virus envelope gene (E) and the Ribonuclease P housekeeping control gene (RNAse P) [22].

### Genomic sequencing

Total RNA was used for single-strand cDNA generation using the Platus Transcriber RNase H- cDNA First Strand kit *(Sinapse inc)* following the manufacturer’s instructions. The cDNA generated was subjected to multiplex PCR reactions using Q5 High Fidelity Hot-Start DNA Polymerase (New England Biolabs) and a set of SARS-CoV-2 specific primers described in ([https://www.protocols.io/view/ncov-2019-sequencing-protocol-bbmuik6w](https://www.protocols.io/view/ncov-2019-sequencing-protocol-bbmuik6w)). Cycling conditions were: 98°C for 30 seconds, 98°C for 15 seconds, 62°C for 30 seconds and 65°C for 5 minutes, for 35 cycles. Amplified PCR products were purified using AMPure XP Beads (Beckman Coulter) following the standard protocol and quantified using the Qubit® dsDNA HS Assay Kit (Invitrogen) following the manufacturer’s instructions. Sequencing libraries were prepared with the Nextera XT Library Prep Kit (Illumina, San Diego, CA, USA) using 1.5 ng of PCR products following the manufacturer’s instructions. Sequencing was performed on the MiSeq (Illumina) machine using the MiSeq Reagent kit V3 (150 cycles) employing a paired-end strategy.

### Genome Assembly and Annotation

Low-quality raw sequencing reads and primer sequences were removed using Trimmomatic 0.36 with default parameters. Based on the knowledge that epidemic viruses sampled over short time frames do not accumulate a substantial number of mutations, we performed a reference-based assembly using the first published SARS-CoV-2 genome as reference (NC_045512.2) and the Bowtie2 software [23] with default parameters. Next, we generated a .bed file using samtools 1.5 [24] and genomeCoverageBed from bedtools v 2.15.0 [25], keeping only positions with > 5x coverage. Lastly, we used vcf-annotate (parameters --filter Qual=20/MinDP=100/SnpGap=20) and vcf-consensus from vcftools v 0.1.13 [26] to generate the final consensus sequences.
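
The coverage filter in the assembly step can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which relies on samtools, bedtools and vcftools; the function names `mask_low_coverage` and `coverage_breadth` and the toy depth values are hypothetical. The sketch only shows the idea of replacing every position whose depth is not above a threshold with `N` and then reporting the fraction of resolved positions.

```python
from typing import Sequence


def mask_low_coverage(consensus: str, depth: Sequence[int], min_depth: int = 5) -> str:
    """Return the consensus with positions of depth <= min_depth replaced by 'N'."""
    if len(consensus) != len(depth):
        raise ValueError("consensus and depth must have the same length")
    return "".join(
        base if d > min_depth else "N"
        for base, d in zip(consensus, depth)
    )


def coverage_breadth(masked: str) -> float:
    """Percentage of positions that are resolved (not 'N')."""
    return 100.0 * sum(base != "N" for base in masked) / len(masked)


# Toy example: ten reference positions with made-up per-position depths.
toy_consensus = "ACGTACGTAC"
toy_depth = [120, 80, 5, 3, 0, 60, 75, 6, 200, 4]
masked = mask_low_coverage(toy_consensus, toy_depth)
print(masked, coverage_breadth(masked))
```

In a real run the per-position depths would come from the alignment (for example the output of `samtools depth`), and the threshold would be set to match the filter used in the assembly.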
The N regions and coverage values for each genome were plotted using karyoploteR [27] and the annotation was performed with VAPiD [28], using the NC_045512.2 genome as reference. All in-house scripts used in the following sections are deposited at [https://github.com/dezordi/SARS-CoV-2\_tools](https://github.com/dezordi/SARS-CoV-2_tools).

### Evolutionary Analysis

Nineteen thousand nine hundred and one public genomes of SARS-CoV-2 were retrieved from GISAID ([https://www.gisaid.org/](https://www.gisaid.org/) - accessed on 23 May 2020); this number represents only SARS-CoV-2 genomes identified in human samples, tagged as complete and sequenced with high coverage. Sequences shorter than 29400 bp were removed with fasta_cleaner.py, resulting in 16,645 genomes. These genomes and the 38 sequenced in our study were aligned with the reference genome NC_045512.2 using MAFFT v7.310 [29] in --add mode with the --keeplength parameter. The resulting alignment was edited in different ways to generate three datasets: I - the 3’ and 5’ ends were removed according to [30]; II - the gaps and homoplasic sites were masked according to ([https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473](https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473)) with algn_mask.py; and III - only the gaps and nucleotide positions with degenerate IUPAC bases were masked. Each alignment was submitted to a clustering step using cd-hit-est [31] to remove redundancy with cluster_gisaid.py, where sequences from the same country with 100% identity at the nucleotide level were clustered and only one representative genome was selected for further analyses. Such a subsampling strategy was necessary because the sequencing effort has been much more intense in some countries than in others. Still, after this first redundancy removal more than 3000 genomes remained for the USA and England.
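
As a rough illustration of the length filter and the first redundancy-removal round, the sketch below drops short sequences and keeps one representative per country among identical sequences. It is a simplification under stated assumptions: exact string identity stands in for the 100% identity clustering done with cd-hit-est, and the function name and record layout are hypothetical rather than taken from fasta_cleaner.py or cluster_gisaid.py.

```python
def filter_and_deduplicate(records, min_length=29400):
    """Length-filter and collapse identical sequences within each country.

    records: iterable of (name, country, sequence) tuples.
    Returns a dict mapping each representative name to the list of
    names it stands for, so clustered members can be added back later.
    """
    clusters = {}
    for name, country, seq in records:
        if len(seq) < min_length:
            continue  # sequence too short, discarded
        key = (country, seq.upper())
        clusters.setdefault(key, []).append(name)
    # The first sequence seen in each cluster acts as its representative.
    return {names[0]: names[1:] for names in clusters.values()}


# Toy usage with a tiny length threshold so the example stays readable.
toy = [
    ("a", "Brazil", "ACGT"),
    ("b", "Brazil", "acgt"),  # identical to "a" within the same country
    ("c", "Italy", "ACGT"),   # same sequence, different country
    ("d", "Brazil", "AC"),    # removed by the length filter
]
print(filter_and_deduplicate(toy, min_length=3))
```

Keeping the member list for each representative mirrors the bookkeeping needed to restore clustered sequences when one of them turns out to be informative for a clade of interest.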
Then a second round of redundancy removal was applied using a lower identity threshold (99.7%) only for some countries (see Supplementary Material 1 - Sheet 1) in order to sample a number of genomes similar to that available for the majority of the remaining countries (ranging from 153 to 194). It is important to note that we kept all the information about clustered sequences and added them back to the alignment if some of our sequences grouped with high branch support with one of the representative sequences from the clusters, in order to reduce any bias in the final phylogenetic reconstruction. Finally, each alignment dataset was visualized and checked with Aliview [32] and these 3 alignments were submitted to the following steps. The SARS-CoV-2 lineages were assigned with pangolin ([https://pangolin.cog-uk.io/](https://pangolin.cog-uk.io/)) and the phylogenetic trees were reconstructed with IQ-TREE [33] using 1000 replicates of the ultrafast bootstrap method [34]. The evolutionary models were selected with ModelFinder [35] for each dataset. The best substitution model and log-likelihood values for each dataset are available in Supplementary Material 1 - Sheet 2. iTOL [36] was used to annotate the phylogenetic trees; the annotation files were generated with itol_annot.py and pangolin_annot.py. Tree branches were colored by continent and tip color ranges were colored by pangolin lineage; the root was set between lineages A and B, as proposed by [30]. After each tree reconstruction and annotation, the dataset that generated the tree with the best log-likelihood value and a topology that converged with the pangolin annotation was selected, and subsets of each clade containing IAM genomes were created to rerun phylogenetic trees for each clade (Supplementary Material 1 - Sheet 2) with the same parameters as the previous tree.
After tree reconstructions, genomes from GISAID that are present in highly supported clades (branch support ≥ 90) with IAM genomes were separated to evaluate the SNPs supporting each cluster. Information about collection date and travel history used in subsequent analyses can be found in Supplementary Material 1 - Sheet 3.

For the Bayesian analysis, datasets with SARS-CoV-2 genomes were sampled by week with gisaid_sampler.py and only sequences showing complete collection dates (year-month-day) were kept. One dataset was constructed comprising the lineages B + B.1 from pangolin, including all SARS-CoV-2 genomes sequenced in this study. After the ML reconstruction with IQ-TREE, the tree was evaluated in TempEst 1.5.3 [37] to check the root-to-tip temporal signal. Outlier sequences were removed before the phylodynamic analysis performed in BEAST 1.10.4 [38]. Bayesian phylodynamic analyses were performed using either a strict clock with a fixed mean clock rate of 0.8×10⁻³, as used in other studies [13,39], or an uncorrelated lognormal relaxed clock model. The analyses were performed using a constant, coalescent exponential growth or Skyline demographic model as tree prior and GTR+F+I as the nucleotide substitution model. The molecular clock and demographic models were evaluated based on likelihood using Path Sampling/Stepping-stone sampling. The final analysis was based on three independent runs of 100 million MCMC generations, sampling every 10,000 steps, using the uncorrelated lognormal relaxed clock and the coalescent exponential growth demographic model. Run convergence was evaluated with Tracer 1.7 and all parameters showed ESS values > 200. The generated trees were combined using LogCombiner applying a burn-in of 25% and the maximum clade credibility tree was obtained using TreeAnnotator. The time-scaled trees were visualized in FigTree 1.4.4.
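
The weekly subsampling with complete collection dates can be sketched as follows. This is an illustrative reconstruction, not the contents of gisaid_sampler.py: the function name, the `per_week` cap and the fixed random seed are assumptions, and ISO weeks (starting on Monday) are used as a stand-in for epidemiological weeks.

```python
import random
from collections import defaultdict
from datetime import datetime


def sample_by_week(records, per_week=2, seed=42):
    """Keep sequences with complete dates and sample up to per_week per ISO week.

    records: iterable of (name, date_string) tuples.
    Returns a dict mapping (iso_year, iso_week) to a sorted list of names.
    """
    by_week = defaultdict(list)
    for name, date_str in records:
        try:
            date = datetime.strptime(date_str, "%Y-%m-%d")
        except ValueError:
            continue  # incomplete dates such as "2020-04" or "2020" are dropped
        iso_year, iso_week, _weekday = date.isocalendar()
        by_week[(iso_year, iso_week)].append(name)

    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    sampled = {}
    for week in sorted(by_week):
        names = by_week[week]
        sampled[week] = sorted(rng.sample(names, min(per_week, len(names))))
    return sampled


toy = [
    ("s1", "2020-04-06"),
    ("s2", "2020-04-07"),
    ("s3", "2020-04-08"),
    ("s4", "2020-04"),      # incomplete date, excluded
    ("s5", "2020-04-14"),
]
print(sample_by_week(toy))
```

Capping the number of genomes per week limits the weight of densely sampled periods, which is the same motivation given for the country-level redundancy removal.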

### Single Nucleotide Polymorphism analysis

Single nucleotide polymorphisms (SNPs) were evaluated with the snp-sites tool [40], using as input the alignments containing the genomes generated in this study and the NC_045512.2 genome as reference. The positions with SNPs were retrieved with BCFtools [41]. The SNP positions were crossed with the depth of the genome assembly and the nucleotide diversity per position assessed with the bam-readcount tool ([https://github.com/genome/bam-readcount](https://github.com/genome/bam-readcount)); the outputs from bcftools and bam-readcount were crossed with snp_div.py to assess the metrics and nucleotide diversity of SNPs by genomic region (Supplementary Material 4 and 5). Owing to the importance of the Spike protein in SARS-CoV-2 biology, Single Amino acid Polymorphisms (SAPs) and regions where deletions were found in other SARS-CoV-2 genomes [42] were carefully analyzed using Aliview and karyoploteR ([http://bioconductor.org/packages/release/bioc/html/karyoploteR.html](http://bioconductor.org/packages/release/bioc/html/karyoploteR.html)).

### Epidemiological data

Confirmed SARS-CoV-2 cases and deaths from Brazil and Pernambuco state were obtained from the coronavirus website of the Brazilian Ministry of Health ([https://covid.saude.gov.br/](https://covid.saude.gov.br/)). We collected the number of cases and deaths per day and per epidemiological week since the first case reported in the state (March 12th, 2020). Plots were generated using the ggplot2 package of the R statistical language ([https://www.r-project.org/](https://www.r-project.org/)).

## RESULTS AND DISCUSSION

### Epidemiological data from Pernambuco state

The thirty-eight samples processed for whole genome amplification and sequencing were obtained from eight municipalities of Pernambuco state (Figure 1A).
The first confirmed SARS-CoV-2 infection in Pernambuco was reported in the second week of March (12th March 2020), in the 11th epidemiological week (Figure 1B), only sixteen days after the first confirmed case in Brazil (25th February 2020) (Figure 1B - top-left panel). In the 21st epidemiological week, Pernambuco state reached the peak of the epidemic, reporting 8298 new cases and averaging 1185 new cases per day (Figure 1 - top-right panel). An increasing number of deaths followed, with the largest number occurring in the twenty-first epidemiological week, with 683 deaths and an average of 97.57 deaths per day (Figure 1 - lower panel). The thirty-eight samples selected for SARS-CoV-2 whole genome sequencing were collected in the 15th (32 samples) and 16th (six samples) epidemiological weeks (first and second weeks of April - blue bars in Figure 1B - lower panel), representing the beginning of the SARS-CoV-2 spread in Pernambuco state.

![Figure 1](https://www.medrxiv.org/content/medrxiv/early/2020/08/31/2020.08.25.20171595/F1.medium.gif)

[Figure 1](http://medrxiv.org/content/early/2020/08/31/2020.08.25.20171595/F1)

Figure 1 Sample distribution over the Pernambuco state municipalities and epidemiological curves of SARS-CoV-2 confirmed patients from Brazil and Pernambuco per epidemiological week. A - Map showing the sample distribution by Pernambuco municipality. B - SARS-CoV-2 epidemiological data from Brazil - Top-left panel and Pernambuco state - Top-right and lower panels. Number of confirmed SARS-CoV-2 cases (grey) and deaths (red) per epidemiological week. Number of genomes sequenced (blue bars) in this study. Epidemiological week 10 refers to the first week of March 2020 and the first official case reported in Pernambuco on March 12th 2020.

### Genome sequencing and variability

Thirty-eight SARS-CoV-2 genomes were obtained with a coverage breadth ranging from 93.92% to 99.92%.
All genomes were considered of high quality following the GISAID criteria (showing coverage higher than 95%) except for sample IAM138 (Table 1). We also obtained a very high average coverage depth of 2354x with a standard deviation of 868x (Table 1).

[Table 1](http://medrxiv.org/content/early/2020/08/31/2020.08.25.20171595/T1)

Table 1 Information of human samples processed for SARS-CoV-2 genome sequencing

Coverage depth was mostly uniform, with a few amplicons showing a much higher depth and a few systematic gaps between the 7th and 8th kb positions of ORF1ab, the ORF that generates all non-structural proteins of SARS-CoV-2, and between 20-21 kb of the Spike protein ORF (S), the outermost protein in the coronavirus crown, responsible for cell surface receptor binding [43] (Figure 2 and Supplementary Material 2). These systematic gaps are probably related to primer competition during the PCR reaction, which probably led to a higher abundance of some amplicons to the detriment of others. This can be clearly seen in the coverage depth plot for each genome in Supplementary Material 2. On the other hand, at least for the Spike protein, Liu et al 2020 [42] found recurrent deletions in the coding region that may restrict late-phase viral replication both in clinical and in vitro isolated viral strains. In order to investigate whether these naturally occurring deletions could be responsible for the gap patterns seen in the spike protein of the genomes sequenced here, we manually checked the alignment of all genomes from Pernambuco and probed our raw paired-end sequencing data. We could not detect any evidence of deletion in the Spike protein regions (NSPRRAR) or (QTQTN) (Supplementary Material 3) identified by Liu et al. 2020, suggesting that the gap regions found in the genomes sequenced here were likely a result of primer competition and the low number of sequenced amplicons corresponding to those regions.
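
A programmatic counterpart of the manual motif check could look like the sketch below. It is only an illustration: the function names are hypothetical, the input is assumed to be an in-frame Spike coding sequence, and the absence of a motif does not by itself prove a deletion, since a substitution or an N-masked region would also make the motif disappear.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt: str) -> str:
    """Translate an in-frame nucleotide sequence; unknown codons become 'X'."""
    nt = nt.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(nt[i:i + 3], "X")
        for i in range(0, len(nt) - 2, 3)
    )


def motif_status(spike_nt: str, motifs=("NSPRRAR", "QTQTN")) -> dict:
    """Report whether each amino acid motif is present in the translated sequence."""
    protein = translate(spike_nt)
    return {motif: motif in protein for motif in motifs}


# Toy fragment encoding QTQTN followed by NSPRRAR (not a real Spike sequence).
intact = "CAAACTCAAACTAAT" + "AATTCTCCTCGTCGTGCTCGT"
# Same fragment with the codons for P, R, R and A removed.
deleted = "CAAACTCAAACTAAT" + "AATTCT" + "CGT"
print(motif_status(intact))
print(motif_status(deleted))
```

For real data, a missing motif would still need to be confirmed against the raw reads, as done in the study, before being interpreted as a genuine deletion.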

![Figure 2](https://www.medrxiv.org/content/medrxiv/early/2020/08/31/2020.08.25.20171595/F2.medium.gif)

[Figure 2](http://medrxiv.org/content/early/2020/08/31/2020.08.25.20171595/F2)

Figure 2 SARS-CoV-2 genomic map of the reference strain Wuhan-1 (MN908947) and the genomes obtained in this study. Thirty-eight SARS-CoV-2 genomes sequenced using whole genome amplification and a paired-end sequencing strategy (2×75 bp). Red rectangles above the Wuhan-1 reference genome are the overlapping amplicons obtained with the ARTIC Network V3 primer set ([https://artic.network/resources/ncov/ncov-amplicon-v3.pdf](https://artic.network/resources/ncov/ncov-amplicon-v3.pdf)) while red spaces in the sequenced genomes correspond to non-sequenced regions.

Regarding amino acid mutations, new evidence recently emerged showing amino acid changes in the Spike protein. One amino acid change (D614G) is of particular interest, since G614 lineages have consistently replaced well established D614 strains and the change could confer a fitness advantage to the former strains, leading to a higher viral load in infected patients and a higher mortality rate [44-46], although it is not yet clear whether this mutation has any impact on virus transmissibility and Covid-19 progression and outcome [47]. Besides, this amino acid change is almost always accompanied by three other mutations: a non-coding nucleotide change (C-to-T) in the 5'UTR, a synonymous nucleotide mutation (C-to-T) at position 3,037 and a non-synonymous mutation (C-to-T) at position 14,408 that generates an amino acid change in the RNA-dependent RNA polymerase (RdRp P323L). All genomes sequenced from Pernambuco state showed the G614 amino acid change and the three accompanying nucleotide mutations described above, except the IAM138 strain, which has all four positions identical to the Wuhan-1 reference sequence (Figure 3, Supplementary Material 4).
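
The four co-occurring changes can be screened in a genome that is already aligned to the reference coordinates, as in this minimal sketch. Positions 3,037 and 14,408 come from the text above; positions 241 (5'UTR change) and 23,403 (the A-to-G change underlying D614G) are assumptions based on the coordinates commonly reported relative to NC_045512.2, and the function name is hypothetical.

```python
# 1-based reference positions mapped to (reference base, derived base).
# 241 and 23403 are assumed coordinates, not stated in the text.
SIGNATURE = {
    241: ("C", "T"),    # 5'UTR, non-coding
    3037: ("C", "T"),   # synonymous change
    14408: ("C", "T"),  # RdRp P323L
    23403: ("A", "G"),  # Spike D614G
}


def signature_calls(aligned_genome: str, signature=SIGNATURE) -> dict:
    """Classify each signature position as 'reference', 'derived' or 'other'.

    aligned_genome must share the reference coordinate system, for example
    a row taken from an alignment built so that reference length is kept.
    """
    calls = {}
    for position, (ref_base, derived_base) in signature.items():
        base = aligned_genome[position - 1].upper()
        if base == derived_base:
            calls[position] = "derived"
        elif base == ref_base:
            calls[position] = "reference"
        else:
            calls[position] = "other"  # N, gap or an unexpected base
    return calls


# Toy genome: all N except the four positions, which carry the derived bases.
toy = ["N"] * 23403
for position, (_ref, derived) in SIGNATURE.items():
    toy[position - 1] = derived
print(signature_calls("".join(toy)))
```

A genome such as IAM138 would be expected to return 'reference' at all four positions, whereas the remaining genomes would return 'derived', provided those positions are covered.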
Recent experimental evidence, in different human cell lines, confirmed that D614G is a key amino acid change that increases SARS-CoV-2 infectivity and, when associated with I472V, may increase the resistance to neutralizing antibodies [44]. Interestingly, two hundred and two SARS-CoV-2 genomes available from Brazil have been screened for such mutations (up to 30th July 2020) and 190 of those contain the D614G mutation ([https://cov.lanl.gov/apps/covid-19/map/](https://cov.lanl.gov/apps/covid-19/map/)). Such data, together with the results from our study, suggest either that G614 replaced D614 strains in Brazil in a similar way as in other countries or that its prevalence is the result of a founder effect based on the importation of primarily G614 variants to Brazil. However, additional genomic data will be necessary to evaluate those two hypotheses. The full set of nucleotide changes found in the genomes sequenced in this study can be found in Supplementary Material 1 - Sheets 4 and 5.

![Figure 3](https://www.medrxiv.org/content/medrxiv/early/2020/08/31/2020.08.25.20171595/F3.medium.gif)

[Figure 3](http://medrxiv.org/content/early/2020/08/31/2020.08.25.20171595/F3)

Figure 3 Spike protein amino acid change D614G found in the genomes sequenced in this study. Codon alignment of the SARS-CoV-2 Wuhan-1 Spike coding region showing the reference D amino acid in the IAM138 genome and the G amino acid in four other representative genomes sequenced in this study. The x axis represents Wuhan-1 genomic coordinates and the y axis the coverage depth at each position.

### Phylogenetic clustering of Pernambuco SARS-CoV-2 strains

Comprehensive analyses with thousands of genomes have identified two main SARS-CoV-2 lineages, called A and B, which emerged during the beginning of the pandemic in the Hubei province - China [30]. Besides, a subdivision of these lineages was also proposed, with 5 and 9 sublineages belonging to the A and B lineages, respectively.
Both A and B major lineages spread worldwide, but ongoing studies have shown that the B lineage spread and replaced lineage A in several different countries [45]. One clear example was the B.1 lineage, which was responsible for the Italian outbreak [48,49] and later spread to other European countries and several countries in the Americas, including Brazil [20]. Up to now, three studies have investigated the SARS-CoV-2 lineages imported to Brazil and their further spread in the country. The largest study sequenced 427 genomes and analyzed 490 genomes from 21 Brazilian states, detecting that only 5 strains belong to lineage A while the remaining 485 belong to the B lineage [20]. These authors estimated that more than 100 international introductions of the virus occurred in Brazil. Resende et al 2020 sequenced 95 genomes from 10 Brazilian states, characterizing six SARS-CoV-2 lineages (A.2, B.1, B.1.1, B.2.1, B.2.2 and B.6). In agreement with the above, the majority of the strains were classified as clade B.1 (95%) and 92% of those belong to the sub-clade B.1.1 [19]. Lastly, Xavier et al 2020 sequenced the genomes of 40 strains of SARS-CoV-2 from Minas Gerais state, where 85% belong to the B lineage, most of which fall within B.1.1, and only one genome belongs to the A lineage [18].

In order to evaluate to which lineages the genomes sequenced in this study belong, we performed the lineage and sublineage assignment with Pangolin following the dynamic nomenclature for SARS-CoV-2 of Rambaut et al 2020 and plotted it onto an ML phylogenetic tree reconstructed with 5297 SARS-CoV-2 genomes (Figure 4). All thirty-eight viral genomes obtained from Pernambuco belong to the B lineage; one strain (IAM138) was basal to all B lineages but with low branch support (Figure 4A).
This same genome presented the lowest genome coverage breadth (89%) (Table 1), and such clustering could be related to the lower number of sequenced synapomorphic single nucleotide polymorphisms (SNPs). However, we found several nucleotide and amino acid polymorphisms that are characteristic of early diverging SARS-CoV-2 lineages such as the reference Wuhan-1 genome (see **Genome sequencing and variability** section above and **Figure 3**), suggesting that this lineage is likely a representative of the early divergent SARS-CoV-2 lineages that emerged from China. A second strain (IAM19) was assigned to the B.1 lineage and grouped with high branch support with two SARS-CoV-2 genomes from France (Figure 4 B). The remaining thirty-six strains grouped in the B.1.1 lineage (Table 1 and Figure 4 C), but 10 were placed directly on a polytomic branch showing no specific clustering with other samples. The proofreading activity associated with the SARS-CoV-2 RNA polymerase results in a low mutation rate compared to other RNA viruses [50], and such low variability of densely sequenced genomes from a short period of time likely hindered the positioning of those samples, since there is a low number of informative SNPs available for confident phylogenetic clustering in the early pandemic phase. In addition, the IAM311 sample clustered with a genome from Belgium with a posterior probability of 0.83 (**Figure 4C**). On the other hand, at least twenty-three strains grouped together in clades containing only Pernambuco samples with high posterior probability branch support (over 70). This clearly reveals virus transmission chains between neighborhoods and municipalities of Pernambuco state, corroborating the rapid community transmission within the state (**Figure 4 C**).
In addition, two samples (IAM67 and IAM356) grouped with high branch support with a sample from Alagoas, a southern neighbouring state (**Figure 4 C**), and IAM307 grouped with several genomes from South America, having as a sister group SARS-CoV-2 genomes from European countries, both highly supported (**Figure 4 C**).

Time-resolved Bayesian tree reconstructions confirmed all clades (Figure 4) and the polytomic positioning of several of the genomes investigated here, as shown in the ML tree, including the basal positioning of the IAM138 sample, although with no branch support (Figure 5). Besides, the estimated times to the most recent common ancestor (tMRCA) of specific clades are in agreement with the emergence of the virus around December 2019. The IAM19 genome clustered with two samples from France with an estimated tMRCA between late February and mid March (Table 2), suggesting that this viral strain corresponds to one of the first importation cases to the state, in agreement with the first official case reported (March 12). Moreover, all other sequenced strains belonging to the B.1.1 lineage that were clustered in highly supported clades showed overlapping tMRCA between mid March and mid April (**Table 2**), supporting that these lineages, which emerged slightly later, were the most successful ones spreading in the state (**Figure 5**). Overall, our results showed two new international introduction events of SARS-CoV-2 from Europe, spread of the virus between Brazilian states and community transmission in Pernambuco state. Besides, the lineage assignment further supports that the B lineage is prevailing through community spread in Brazil, in line with other SARS-CoV-2 studies.

![Figure 4](https://www.medrxiv.org/content/medrxiv/early/2020/08/31/2020.08.25.20171595/F4.medium.gif)

[Figure 4](http://medrxiv.org/content/early/2020/08/31/2020.08.25.20171595/F4)

Figure 4 Maximum likelihood phylogenetic tree using 5259 genomes available from GISAID plus 38 genomes generated in this study (red stars), rooted between groups A and B. Pangolin lineage assignments are denoted by the tip colors: reddish are A lineages and bluish are B lineages. Branch colors follow the continent of sample origin: Red - Asia; Orange - Africa; Yellow - Australia and Zealandia; Green - Europe; Pink - North America; Blue - Central America; Purple - South America. The full interactive tree can be found at [https://itol.embl.de/tree/45462241443551591453841](https://itol.embl.de/tree/45462241443551591453841).

![Figure 5](https://www.medrxiv.org/content/medrxiv/early/2020/08/31/2020.08.25.20171595/F5.medium.gif)

[Figure 5](http://medrxiv.org/content/early/2020/08/31/2020.08.25.20171595/F5)

Figure 5 Time-resolved tree of the B lineage reconstructed with a Bayesian framework. Numbers above the branches are specific clades showing posterior probability higher than 80 depicted in Table 2 and the horizontal bar represents the HPD95% credible interval of the estimated tMRCA. Branch colors follow the continent order as in Figure 4, but only highly supported clades containing the sequences obtained in this study were colored. Red stars show the sequenced IAM genomes.

[Table 2.](http://medrxiv.org/content/early/2020/08/31/2020.08.25.20171595/T2)

Table 2. Predicted origin intervals of clades from phylodynamic analysis.
## CONCLUSION

The genomic analysis of 38 SARS-CoV-2 genomes from the beginning of the epidemic in Pernambuco state, Brazil, revealed at least two independent international importation events of SARS-CoV-2 strains from Europe, one occurring from late February to mid March and a more recent one from late March to mid April, which is in line with studies focusing on other Brazilian states. Moreover, we also found evidence of community spread of SARS-CoV-2 among Recife city neighborhoods and other municipalities. Interestingly, all genomes belong to the B lineage, with a higher prevalence of the B1.1 lineage, which was shown to be the most prevalent lineage currently circulating in Brazil. All except one genome carry the G614 amino acid change, which has been suggested to increase viral fitness, allowing the virus to reach a higher viral load in an experimental setting and likely in human patients. G614 strains have replaced D614 strains in several regions around the globe where the latter were first established. The high prevalence of G614 strains in our dataset suggests two possible explanations: (i) most strains that entered Brazil were G614 strains, or (ii) D614 and G614 strains were equally imported into Brazil but G614 became prevalent, as occurred in European and North American countries. Continued genomic surveillance of SARS-CoV-2 in a much broader set of samples is needed in order to assess the viral mutational spectrum and provide data to tease apart these hypotheses, which will likely impact the control measures set to curb the epidemic.

## DATA AVAILABILITY

All genomes generated in this study are deposited in the GISAID databank (https://www.gisaid.org/) under the accessions EPI\_ISL\_500460-500486 and EPI\_ISL\_500865-500875.

## AUTHOR CONTRIBUTIONS

LCAB, SPBF, CFJA and GLW conceived and planned the study. RHS, PMSL, BASA, DGAC, SCGA, JJFM, MJCO, LVBL and PROH conducted swab collections and sample processing.
MHSP, DRDG, CD, MFB, FZD, LCM, LK, EH, AFS, SJRS, KGSS, BSLFS, DLC, CEC, AMN, CTAS, RPGM, MALS, MSB, WRCN and RMLA performed laboratory experiments. MHSP, DRDG, CD, MFB, FZD, AFS, LCM, LRSV, AMR, CFJA and GLW performed data analysis. MHSP, FZD and GLW wrote the manuscript. All authors reviewed the final manuscript.

## CONFLICTS OF INTEREST

The authors declare no conflict of interest.

**KARYOPLOTER**: karyoploteR. [http://bioconductor.org/packages/release/bioc/html/karyoploteR.html](http://bioconductor.org/packages/release/bioc/html/karyoploteR.html) (Accessed on 03 June 2020).

**SARS-COV-TOOLS**: SARS-CoV-2 tools. [https://github.com/dezordi/SARS-CoV-2\_tools](https://github.com/dezordi/SARS-CoV-2_tools) (Accessed on 03 June 2020).

**VIROLOGICAL**: Virological.org. [https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473](https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473) (Accessed on 03 June 2020).

**BCF-TOOLS**: BCFtools. [https://github.com/samtools/bcftools](https://github.com/samtools/bcftools) (Accessed on 03 June 2020).

**BAM-READCOUNT**: bam-readcount. [https://github.com/genome/bam-readcount](https://github.com/genome/bam-readcount) (Accessed on 03 June 2020).

## ACKNOWLEDGMENT

We thank the many brave researchers around the globe for their effort to generate and make readily available the SARS-CoV-2 genomic data in GISAID and other databases. We also thank the whole LACEN-PE team for providing the samples used to sequence the SARS-CoV-2 genomes, and the Technological Platform Core and the Bioinformatic Core of the Aggeu Magalhaes Institute for the support with their research facilities. This project was supported by the National Council for Scientific and Technological Development through the productivity research fellowship level 2 awarded to Wallau GL (303902/2019-1).

* Received August 25, 2020.
* Revision received August 25, 2020.
* Accepted August 31, 2020.
* © 2020, Posted by Cold Spring Harbor Laboratory. This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)