SARS-CoV-2 outbreak in a tri-national urban area is dominated by a B.1 lineage variant linked to mass gathering events ====================================================================================================================== * Madlen Stange * Alfredo Mari * Tim Roloff * Helena MB Seth-Smith * Michael Schweitzer * Myrta Brunner * Karoline Leuzinger * Kirstine K. Søgaard * Alexander Gensch * Sarah Tschudin-Sutter * Simon Fuchs * Julia Bielicki * Hans Pargger * Martin Siegemund * Christian H Nickel * Roland Bingisser * Michael Osthoff * Stefano Bassetti * Rita Schneider-Sliwa * Manuel Battegay * Hans H Hirsch * Adrian Egli ## Abstract **Background** The first case of SARS-CoV-2 in Basel, Switzerland was detected on February 26th 2020. We present a phylogenetic study to explore viral introduction and evolution during the exponential early phase of the local COVID-19 outbreak from February 26th until March 23rd. **Methods** We sequenced SARS-CoV-2 naso-oropharyngeal swabs from positive 746 tests that were performed at the University Hospital Basel in the timeframe of our study. We successfully generated 468 high quality genomes from unique patients and called variants with our COVID-19 Pipeline (COVGAP). We analysed viral genetic diversity using PANGOLIN taxonomic lineages. To identify introduction and dissemination events we incorporated global SARS-CoV-2 genomes and inferred a time-calibrated phylogeny. We used epidemiological data to aid interpretation of phylogenetic patterns. **Findings** The early outbreak in Basel was dominated by lineage B.1 (83·6%), detected from March 2nd, although the first lineage identified was B.1.1. Within B.1, a clade defined by the SNP C15324T contains 68·2% of our samples (‘Basel cluster’), including 157 identical sequences at the root of the ‘Basel cluster’, suggesting local spreading events. We infer the origin of the ‘Basel cluster’ defining mutation to mid-February in our tri-national region. The remaining genomes map broadly over the global phylogenetic tree, evidencing several events of introduction from and/or dissemination to other regions of the world via travellers. We also observe family transmission events. **Interpretation** A single lineage variant dominated the outbreak in the City of Basel while other lineages such as the first (B1.1) did not propagate. We identify mass gathering events and less so travel returners and family transmission as causes for the local outbreak. We highlight the importance of adding specific questions to the epidemiological questionnaires that are collected, to obtain data on attendance of large gathering events and locations as well as travel history to effectively identify routes of transmissions in up-coming outbreaks. This phylogenetic analysis enriches epidemiological and contact tracing data, allowing, even retrospectively, connection of seemingly unconnected events, and can inform public health interventions. ## Introduction The COVID-19 pandemic has rapidly spread around the globe during the first six months of 2020. The causative coronavirus, SARS-CoV-2, is the subject of many studies using genomic analysis providing key insights into viral diversity across cities1, provinces2-5, countries6-11, and globally12. SARS-CoV-2 has an estimated mutation rate of 0.71-1.40×10−3 13, which translates to 21-42 mutations per year. Due to the accumulation of mutations, phylogenetic analysis of SARS-CoV-2 is becoming more granular over time14, providing increasing resolution of transmission dynamics and events. Comparisons of single nucleotide polymorphisms (SNPs) allows us to explore transmission events with highest resolution across communities. The identification of transmission routes is important, especially with various public health measures being introduced, such as lockdown policies, which have been implemented on country or regional levels to limit viral transmission. The impact of public health measures can be monitored through phylogenies3. Genomic data can also deliver insights into mutations and whether they alter virulence, or aid adaptation to novel hosts. The spike protein D614G mutation, for example, has been implicated in more effective transmission15, although the actual impact may be through fixation in an expanding lineage rather than conferring increased transmissibility per se16. The scope of this study is to provide a more granular picture of the phylogenetic diversification and propagation of the early-stage SARS-CoV-2 pandemic on a local scale. The City of Basel has a population of 175,350 inhabitants (median over the past five years) with half a million people in the Basel area. Situated in North-Western Switzerland, directly bordering both Germany and France, Basel has almost 34,000 workers commuting daily across the international borders17. Given this large exchange of people in this tri-national region, the fact that the neighbouring region Alsace, France, was already experiencing an intense epidemic18, and a low threshold testing strategy implemented weeks before the first case, we aim to explore the early stage of SARS-CoV-2 transmission dynamics in Basel and the surrounding area from the first case to one week post border closure. ## Results ### Characteristics of the longitudinal study This cohort study includes all patient samples from Basel-City and the surrounding area during the initial 26 days of the local outbreak, between February 26th and March 23rd. Only single, non-repeated tests per patient were considered eligible for phylogenetic analysis. This timeframe covers the first two positively tested cases in Basel on February 26th, which we were able to capture via early implementation of PCR-based detection by routine diagnostics, until the date of border closure plus seven days (March 23rd). From the first case on February 26th 2020 until March 23rd we had performed 6,943 PCR tests. Of these, 746 samples (10.7%) were SARS-CoV-2 positive (**Figure 1**). March 23rd had the maximum number of positive tests, with 66 cases. Of all PCR tested patients and positively tested patients during the study period, a majority were female with a median age of 42 and 49 years, respectively (**Table 1**). Of the PCR-confirmed cases, only 17 (2.3%) were in patients younger than 18 years. (**Table 1, Figure S1**). 418 (56%) were living in the canton of Basel-City (City of Basel, Riehen, Bettingen) and 328 (44%) were from the surrounding area. View this table: [Table 1.](http://medrxiv.org/content/early/2020/11/04/2020.09.01.20186155/T1) Table 1. Number and age summary of all tested patients, positively tested patients, and patients with successfully sequenced SARS-CoV-2 genomes, by sex. ![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/04/2020.09.01.20186155/F1.medium.gif) [Figure 1.](http://medrxiv.org/content/early/2020/11/04/2020.09.01.20186155/F1) Figure 1. Epidemiological curve of the first COVID-19 wave in the city of Basel and hinterland, Switzerland. Positive (red line, dark grey area) and negative (light grey area) SARS-CoV-2 PCR tests are depicted from the beginning of the outbreak in February to March 23, 2020. Major events and imposed restrictions are marked by horizontal lines. First confirmed cases in Switzerland and Basel were on February 25th and February 26th, respectively. ### COVGAP pipeline validation No false positive variants were called (**Figure S4, Table 2, Table S1**). Ambiguous mapping was responsible for the failure to call the three indels, resulting in insufficient (<70%) coverage to be reliably called as a variant. This validation allowed us to determine the specificity (100%), sensitivity (94.2%), and accuracy (100%) of COVGAP, thus confirming its accuracy in SNP detection, as well as its ability to call the majority of indels from short read data. View this table: [Table 2.](http://medrxiv.org/content/early/2020/11/04/2020.09.01.20186155/T2) Table 2. Sensitivity, specificity, and accuracy of COVGAP. The COVGAP pipeline produced 468 (63%) high quality genomes for subsequent analysis, of the 746 samples taken. The remaining samples were either not available (N = 57), did not pass sequencing quality control (N = 156), or were duplicates from the same patient (N = 65). These 468 samples are subsequently referred to as the Basel area cohort. Of these, 240 (51.9%) were from female patients (**Table 1, Figure S1**), and 12 (2.6%) were from patients younger than 18 years. ### Phylogenetic lineages observed over time in the Basel area cohort Over the 26-day study period, 13 out of 91 globally circulating phylogenetic lineages were recorded in the Basel area; only one additional lineage was recorded in all Swiss sequences. Lineage B.1 dominated the cases during the initial phase of the outbreak (**Figure 2**), with 83.6% (N = 391) of sequenced samples (**Table 3**), being recorded for the first time in Basel on March 2nd. The first patient diagnosed at our hospital, on February 26th, had a virus belonging to lineage B.1.1. This lineage is seen sporadically through the outbreak with a maximum of six sequenced cases from March 16th. Lineage B.1 is associated with the Italian outbreak14, yet both B.1.1 (35.7%) and B.1 (51.0%) were the most prevalent lineages in Italy during this time span (**Figure 3**). From March 13th, rarer lineages are seen in the Basel area, such as B.1.1.6, a lineage that is associated with an Austrian origin14. Only two cases from the A lineage and sub-lineages were sequenced in the Basel area cohort (**Table 3**). View this table: [Table 3.](http://medrxiv.org/content/early/2020/11/04/2020.09.01.20186155/T3) Table 3. Number of cases harbouring the S-D614G mutation in spike protein encoding gene in each phylogenetic lineage (PANGOLIN definition ver. May 19) and total count, in Basel area cohort by March 23rd 2020. ![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/04/2020.09.01.20186155/F2.medium.gif) [Figure 2.](http://medrxiv.org/content/early/2020/11/04/2020.09.01.20186155/F2) Figure 2. Detection of SARS-CoV-2 lineages found in the Basel area cohort from the first detected case on February 26th to March 23rd 2020. Major events and imposed restrictions are marked by horizontal lines. Low abundant lineages increase after the end of winter school vacation (March 8th) and are introduced by travel returners. ![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/04/2020.09.01.20186155/F3.medium.gif) [Figure 3.](http://medrxiv.org/content/early/2020/11/04/2020.09.01.20186155/F3) Figure 3. SARS-CoV-2 lineage diversity in neighbouring countries to Switzerland from first detected case until March 23rd, 2020. Number of lineages (L) and total number of genomes (N) per country in brackets, values within charts represent percentages. France was the first country in Europe that had confirmed COVID-19 cases on January 24th, followed by Germany on January 27th, Italy on January 31st, and some weeks later Austria and Switzerland followed on February 25th. Simpson diversity based on the available genomes and PANGOLIN lineage assignments is largest in Germany (3.87) and Austria (3.73), followed by Italy (2.52), it is smallest in Switzerland (1.62) and France (1.23). Basel-City mirrors the lineage proportions of Switzerland, while contributing half of Switzerland’s sequence data. ### Lineage diversity in Basel-City, Switzerland, and neighbouring countries Comparison with lineages found in neighbouring countries over this period shows that lineage B.1 dominates in all, but the numbers of lineages identified, and the proportions vary (**Figure 3**). Switzerland (77.1%) has a similarly large proportion of B.1 lineage to France (90.0%). We describe the viral diversity based on abundance of lineages (retrieved from GISAID, for details see **Supplementary material)** using a range of diversity indices (**Table S5**). Simpson diversity, which accounts for differences in sample abundance between countries, was highest in Germany (3.87) and Austria (3.73), followed by Italy (2.52); it was lowest in Switzerland (1.62) and France (1.23). The share of our samples that originates from Basel-City residents (376 samples) excluding sequences that were obtained from commuters mirrors the lineage proportions of Switzerland (**Figure 3**), while contributing 56% of all available sequencing data for Switzerland within this timeframe (GISAID database as of June 22nd,19,20). ### Basel samples in global phylogenetic context In order to better contextualize our findings, we analysed our virus genomes phylogenetically with a subset of global publicly available sequences (see **Supplementary material**; **Figure 4**). While phylogenetic lineages may show some geographical signal, lineages do not exclusively correspond to continents (**Figure 4A**), illustrating the degree of global interconnectivity and speed of spreading. The phylogenetic lineages recorded in the Basel area are distributed across the global phylogenetic tree (**Figure 4B**). Mismatch of ‘taxonomic’ assignment and phylogenetic origin of genomes assigned to B.1 can be seen, with several as yet unnamed sub-lineages apparent in the phylogeny (**Figure 4B**). We can identify a major clade, within lineage B.1, comprising 68.2% of our samples (319/468 samples with 264 (82.8%) from patients from cantons Basel-City and Basel-Landschaft; **Figure 5A**). The remaining Basel area sequences (31.8%) (**Figure 5 B-C, Figure S6 B-C**) are spread throughout the phylogeny and cluster with global genomes. Introductions and features of some of the clades are analysed in the following sections and in **Supplementary material**. ![Figure 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/04/2020.09.01.20186155/F4.medium.gif) [Figure 4.](http://medrxiv.org/content/early/2020/11/04/2020.09.01.20186155/F4) Figure 4. SARS-CoV-2 phylogeny of Basel area samples and genetic lineages (PANGOLIN) in a global context. **A**. Time tree of SARS-CoV-2 genomes from the Basel area cohort as well as subsampled global genomes (30 genomes per country and month), coloured by continent of origin.Amino acid mutations at internal nodes representing clade defining mutations are shown. **B**. Mirrored time tree coloured by genetic lineages sensu PANGOLIN v.May19 ([https://github.com/cov-lineages/](https://github.com/cov-lineages/)). Each tip with a circle represents a genome from the Basel area cohort, branches without circled tips represent global genomes, included to confer the global context of the Basel genomes. ![Figure 5.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/04/2020.09.01.20186155/F5.medium.gif) [Figure 5.](http://medrxiv.org/content/early/2020/11/04/2020.09.01.20186155/F5) Figure 5. Divergence trees plotting nucleotide divergence between 468 genomes and expanded clusters of genomes in selected phylogenetic lineages. **A**. Genomes from Basel area cohort in global context. Tree composition is identical to the time tree from Figure 4. Branches with circles at the tip represent genomes from the present study; branches without circles represent global genomes from GISAID. The major Basel cluster contains samples with up to five mutations. **B**. Zoom into a mixed cluster derived from B.1.1 with seven to nine mutations difference to the root. A single genomes assigned to lineage B.1.1.6 has an assumed origin in Austria; two genomes (B.1.1.10) most likely originate from the UK. **C**. Potential ski-holiday related cluster (C1059T) with seven samples having a confirmed association with skiing destinations. Note: Divergence can be translated to number of mutations difference to the root (Wuhan-Hu-1) by multiplication by with the SARS-CoV-2 genome size (29903 bases). ### The first identified introduction of SARS-CoV-2 to Basel The first two positively diagnosed patients with SARS-CoV-2 in Basel and first identified cases of COVID-19, *Patient 1* and *Patient 2*, travelled together to Italy. In our analysis, both *Patient 1* and *Patient 2* carried viruses from the B.1.1 lineage, the second most prevalent lineage in Italy at that time (**Figure 3**). Interestingly, the two virus genomes from *Patient 1* and *Patient 2* are separated by two SNPs, suggesting two independent infections (**Figure 5B**). Moreover, we did not identify minor alleles within these samples that would hint at double infection of the patients. In this case, the epidemiological cluster of *Patient 1* and *Patient 2* is not congruent with the phylogenetic inference. The virus genome of *Patient 1* carried a synonymous mutation at C313T in *ORF1ab*, which is found in samples from Israel, Hungary, Japan, USA, Argentina, Greece, India, Brazil, Morocco, and Netherlands among others (nextstrain.org), all sharing an unsampled common ancestor that emerged around February 25th (CI February 23-26th). This SNP is also found in ten other Basel area cohort genomes: eight of these were from a family and social friends cluster unrelated to *Patient 1* that were diagnosed between March 13th and March 22nd. The virus genome of *Patient 2* carried a synonymous mutation at T19839C in *ORF1ab* and not C313T, and clusters together with eight identical virus samples sampled between February 26th and March 23rd. Two of these are from family members of *Patient 2*, who tested positive two and six days later, suggesting a family route of infection. The epidemiological data of the other six patients suggests no direct transmission via *Patient 2*. One of the six returned from a Swiss ski resort three days prior to onset of symptoms. Two additional samples forming a family cluster (*Family 1*, **Figure 5B**) diagnosed on March 3rd, carried the T19839C plus non-synonymous mutation G28179A leading to amino acid change ORF8-G96S. One member of *Family 1* travelled with *Patients 1* and *2* to Italy and possibly got infected there with a yet different virus variant. ### Introduction of the Basel cluster The clade within lineage B.1, into which 68.2% (N = 319) of our Basel area cohort sequences fall, is characterized by a synonymous SNP C15324T in *ORF1ab*, henceforth referred to as the “Basel cluster”. *Patient 1* and *Patient 2* are not linked to this Basel cluster. The first sample within the Basel cluster, from March 2nd, was from a patient residing in Central Switzerland, who was transferred to a care facility in Basel where six further people tested positive between March 5th and March 17th with identical viral genomes. The source of the first infection is unknown. The second sample within the Basel cluster was from a patient diagnosed on March 3rd, who had attended a religious event in Alsace, France, that took place between February 17th and 21st. One other patient, who tested positive on March 9th and had symptoms 14 days prior to testing, also attended this event. ### Description of the Basel cluster The 319 genomes in the Basel cluster (Figure 5A) show a divergence of up to five SNPs up until March 23rd, although 157 genomes are identical and located at the root of the clade. This infers that the common ancestor originated between February 9th and February 17th. That the clade defining SNP C15324T (GISAID emerging clades label 20A/15324T) was registered globally for the first time on March 2nd simultaneously in the Basel area sample 42173111 and GISAID sample Germany/FrankfurtFFM7/2020 suggests unsampled circulation of this variant from mid-February. By searching all genomes available on GISAID (N = 80,189; as of August 12th, 2020), filtering for genomes belonging emerging clade 20A/15324 and sampled until March 23rd (N = 2,856), we find that subsequently the C15324T mutation has also been observed in other countries but remains most prevalent in Switzerland (NGISAID =57/213, 26.8%; NGISAID+this study = 386/675, 57.2%; first genome 42173111 from March 2nd) (**Table S3**). Of the 57 Swiss genomes with this mutation that were already deposited on GISAID, 26 also originate from the cantons Basel-City and Basel-Landschaft (Table S4). The mutation is also found in higher proportions in genomes from France (N = 69/369, 18.7%, first from March 3rd sample France/HF1870/2020), Luxembourg (N = 24/116, 20.7%, from March 8th sample Luxembourg/LNS2614631/2020), and Belgium (N = 40/268, 14.9%, from March 6th sample Belgium/NKR-030645/2020), but date later than the first recorded occurrence from Switzerland and Germany (**Table S3**). Subsequently, as of March 9th, descendants of this variant were recorded outside Europe (**Table S4**) in smaller proportions than in Basel (**Table S3**), which suggests dissemination from the Basel area. ### Potential ski-holiday related cluster (C1059T) Previous reports have identified viruses in lineage B.1 carrying SNP C1059T (amino acid change ORF1a-T265I) in travel returners from ski holidays in Ischgl, Austria9,21. We detected 21 (4.5%) viral samples within our cohort with these features (**Figure 5C**), divergent by up to two SNPs from the common ancestor. Samples date from March 11th to March 23rd with an inferred internal node age of February 21st (CI: February 20th-February 21st, 2020), fitting with possible infections in Ischgl from end of February to March 13th 22. Epidemiological data confirms that five patients from that cluster had returned from Austria, and three specifically from Ischgl, the fourth from Tyrol, before testing positive. Two additional patients returned from skiing in Swiss ski resorts. ### Spike protein mutation prevalent in Basel patients The spike protein S-D614G mutation is associated with the B.1 lineage and all those derived from this (**Figure 4**). As such, it occurs in 448 of the 468 (95.7%) samples from the Basel area. The SNP responsible has not been lost once in our sub-sampled dataset, but it is not present in B.2 or B.3 or other sister lineages to B.1 (**Table 3**). Among our samples, we found no significant difference in viral loads between patients with and without the S-D614G mutation (z = −0.881, p = 0.38). However, our cohort is biased to samples with higher viral load, as these were those that were successfully sequenced. ## Discussion We reconstruct the early events focusing on introduction and spread of SARS-CoV-2 in Basel and the surrounding area, Switzerland, from a phylogenetic perspective. We present COVGAP, a new combination of existing tools to effectively and efficiently mine SARS-CoV-2 genomes from Illumina paired end reads. Unlike other currently available tools23, COVGAP shows higher sensitivity levels in SNP calling from raw reads (100%), failing only in ambiguously mapped deletions and insertions. In such cases, it adopts a coverage-conservative approach, needed to reliably call variants in real world scenarios. The majority of genome variants in Basel are similar to those from France, Italy, and Germany. We found the presence of 13 SARS-CoV-2 lineages in our samples, with the beginning of the Basel outbreak being powered by the European B.1 lineage. In particular, a B.1 lineage variant with the C15324T mutation dominated the early phase of the local spread with 70% of samples forming a large Basel cluster. Compared to Victoria, Australia5, the UK24, or Austria (this study) the diversity seen arriving in Switzerland and Basel as determined using Simpson diversity is more limited, reflecting European rather than intercontinental connections. This diversity measure can be used to monitor viral introductions as an effect of travel restrictions in the future. The Basel cluster virus variant 20A/C15324T was first detected in Europe on March 2nd in Germany and Switzerland simultaneously. We locate its geographic origin to our tri-national region between February 9th and February 17th. Our epidemiologically-informed phylogenetic analysis indicates that the Basel cluster represents a larger transmission chain that was unchecked and spread effectively among unrelated people throughout Basel and eventually outside of Europe. The first recognized case in the large Basel cluster goes back to a patient in a care facility, in which several more infections occurred. The beginning of the COVID-19 outbreak described here coincided with the winter school holidays in Basel, from February 22nd to March 8th. During this time, many residents take the opportunity to travel, in particular to skiing resorts. Viral introductions from ski resorts are known from contract tracing data to have affected Germany25, Denmark21, Iceland9, France, Spain, and UK26, with Ischgl, Austria being a described source of many cases. Our data supports the finding that skiing resorts in Austria and Switzerland served as dissemination hotspots. This school holiday also provided the opportunity for the first identified SARS-CoV-2 introductions to Basel through two jointly returned travellers, notably each with different viral variants. Overall, however travel returners did not drive the outbreak in Basel evidenced by the low diversity of variants and proportion of such variants in our sample. A second likely source represents the many workers travelling daily across borders from France and Germany, particularly from heavily affected areas such as Alsace18. As the B.1 and B.1.1 lineages were dominant in France and Germany, these may be some sources of cases and transmissions. The timing of the epidemic in Basel also coincided with three major events. Firstly, a religious event from February 17th to 21st in Alsace that was described as a super-spreading event in France27. We confirmed that the virus genomes of two patients known to have attended are indeed situated at the root of the clade that constitutes the Basel outbreak. Secondly, carnival in the Basel area is celebrated over several weeks from early January, with numerous events, and thousands of active participants. The culmination is the UNESCO world-heritage ‘Basler Fasnacht’, scheduled this year (2020) for March 2nd-4th, but cancelled due to COVID-19. Notably, the active participants practice the piccolo and drums over weeks in closed rooms as a preparation for their performance during the carnival, and unofficial events are likely to have taken place. Thirdly, Basel hosted three international soccer events at the St. Jakob Stadium on February 15th (20,675 spectators), 23rd (20,265 spectators), and 27th (14,428 spectators). All three major events fell around the inferred date of origin of the Basel mutation C15324T and subsequent dissemination phase. A limitation of the current study is that we are likely to have missed some cases as not all symptomatic people were advised to be tested, especially children younger than 18 years old. Nevertheless, our cohort represents a very high sequencing density per detected case for a city (468 genomes from 746 PCR-confirmed cases in Basel area (62.7%) and from 10,680 PCR-confirmed cases nationwide (4.4%)) for this early phase of the pandemic28. The availability and integration of epidemiological data in the interpretation of phylogenetic clades underlines the validity of instrumentalising those tools for improving the understanding of SARS-CoV-2 outbreak dynamics. Utilizing the clades as the backbone for targeted epidemiological analysis of specific cases helped in grasping how mass gatherings, travel returners, and care facilities may influence an outbreak within a city. The epidemiological data that was collected for the Federal Office of Public Health (FOPH), as requested by law, helped tremendously to verify travel related links; however it was not designed to obtain data on local super-spreading events such as attendance to soccer games, visiting clubs, restaurants, bars, and concerts and future versions could be improved. The classical epidemiological context is very important to further explain molecular epidemiological links especially in a still not very diversified virus. In conclusion, the start of the outbreak of SARS-CoV-2 in the Basel area was characterized by a dominant variant, C15324T, within the B.1 lineage, which we infer to have arisen in mid-February in our tri-national region. Large gatherings (potential super spreading events) could have had profound effects on outbreak dynamics. Improved surveillance measures are needed in the management of an outbreak, including large-scale, active screening in the broader public, including more children to assess their role in transmission. Our analysis shows the potential of molecular epidemiology to support classical contact tracing, even retrospectively, in order to evaluate and improve measures to contain epidemics like COVID-19. ## Materials and Methods ### Patients, samples, and diagnosis Respiratory samples from the University Hospital Basel and the University Children’s Hospital Basel (UKBB) patients were tested for SARS-CoV-2: from January 23rd 2020 testing was based on current case definitions from the Federal Office of Public Health (FOPH); from 27th February additionally, all respiratory samples negative for other respiratory pathogens were tested. Patient samples which tested positive for SARS-CoV-2 29,30 up to and including March 23rd were considered eligible for the present study. In total 6,943 diagnostic tests were performed during the study period. The 746 positively tested cases came predominantly from the administrative unit of Basel-City, Riehen, and Bettingen (418, 58%), while the remaining patients were from Basel-Landschaft and neighbouring cantons and countries. For diagnosis, swabs from the naso- and oropharyngeal sites (NOPS) were taken, and combined into one universal transport medium tube (UTM, Copan). Total nucleic acids (TNAs) were extracted using the MagNA Pure 96 system and the DNA and viral RNA small volume kit (Roche Diagnostics, Rotkreuz, Switzerland) or using the Abbott m2000 Realtime System and the Abbott sample preparation system reagent kit (Abbott, Baar, Switzerland). Aliquots of extractions were sent for diagnosis to Charité, Berlin, Germany from January 23rd - 29th, and to Geneva to the National Reference Centre (NAVI) in Switzerland from January 29th. In-house analysis started February 27th as part of the hospital routine diagnostics as previously described 30. ### Whole genome sequencing (WGS) SARS-CoV-2 genomes were amplified following the amplicon sequencing strategy of the ARTIC protocol ([https://artic.network/ncov-2019](https://artic.network/ncov-2019)) with V.1 or V.3 primers 31. In detail, real-time reverse transcriptase (RT) reactions were run to a total volume of 10 μl extracted total nucleic acid. After some optimization, PCR used 25 cycles for samples with a diagnostic cycle threshold (Ct) value lower than 21 (viral loads higher than 8.2 log10 Geq/ml); 40 cycles for all other samples (lower viral load samples) and repeats. Purified amplicons were converted into Illumina libraries with Nextera Flex DNA library prep kit (Illumina) automated on a Hamilton STAR robot, using 5ng input DNA. 96 libraries were multiplexed and sequenced paired-end 150 nucleotides on an Illumina NextSeq 500 instrument. #### Consensus sequence generation and detection of mutations After demultiplexing using bcl2fastq software version v.2.17 (Illumina), COVGAP (COVid-19 Genome Analysis Pipeline) was used (**Figure S2**). This incorporates: quality filtering using trimmomatic software version v.0.38 32 to remove Illumina adaptors and PCR primer sequences from read ends; removal of reads smaller than 127 bases, and removal of reads with a phred score under 20 (calculated across a 4-base sliding window). Quality filtered reads were mapped to the Wuhan-Hu-1 reference MN908947.3 33 using the BWA aligner 34. Reads flagged as mapping to the reference were retained 35, and are deposited under project PRJEB39887. SNPs and indels with respect to the reference sequence were called using pilon version 1.23 36. Pilon summary metrics ‘alternative allele fraction’ (AF) and ‘depth of valid reads in pileup’ (DP) were used to identify major and minor alleles across all bases, which is not implemented in pilon itself. Major alleles were called if supported by 70% of the reads covering the variant locus (AF) for any locus with a minimum of 50x coverage (DP). Variants were applied to the reference to produce a consensus sequence; any base position with less than 50x coverage was masked with ambiguous characters (Ns) using BCFTools version 1.10.2 37. Consensus sequences were accepted for further analysis when containing up to 10% Ns. Summary statistics, logs, coverage plots, and genome stack plots were generated using R version 3.6.0 and packages Gviz v1.30 38, Sushi v1.23 39, seqinr v3.6.1 40, and ggplot2 v3.11 41. COVGAP also provides per genome quality control visual outputs (**Figure S5**) and is available at [https://github.com/appliedmicrobiologyresearch](https://github.com/appliedmicrobiologyresearch). Quality control statistics such as the relationship between Ct-value and number of mapped reads and coverage are presented in **Figure S3**. In general, we observed a negative trend linking Ct values and percentage of ambiguous bases (Ns) being called as a result of low coverage. Sequences passing the quality filter (n=533) showed a lower Ct value (median: 22.4±5.14) than the ones that failed (n =156; median: 35.75.9±5.75). #### COVGAP Validation We used a set of 15 randomly *in silico* mutated SARS-CoV-2 mock genomes for the validation of the specificity (identification of true negatives) and accuracy (identification of true negatives and true positives) of COVGAP. Additionally, the genome MT339040, which harbours an 81 nucleotide deletion in the ORF7a gene and a further seven SNPs relative to the reference 42 was used. Together, the mock genomes possess 38 mutations including 30 SNPs, six deletions and two insertions across the reference genome MN908947.3. The genomes were then shredded to artificial paired-end 150 nucleotide reads using SAMtools wgsim 37 and processed by COVGAP. For validation purposes, original mock genomes and the COVGAP generated genomes from the shredded reads were aligned using Seaview v4.6 43 and clustalw 44. A phylogeny was built using PhyML within Seaview with default parameters. ### Phylogenetic lineage assignment of Basel samples To assess the phylogenetic diversity of SARS-CoV-2 samples during the early phase of the pandemic we inferred the lineage assignment for each consensus sequence derived from the COVGAP pipeline using PANGOLIN ver. May 19th (Phylogenetic Assignment of Named Global Outbreak LINeages) 14 available at github.com/hCoV-2019/pangolin. Details on lineage summaries, describing which countries lineages have been reported from and where transmission events have been recorded, can be found at [https://github.com/hCoV-2019/lineages](https://github.com/hCoV-2019/lineages). Lineage assignments were used to aid visualization of phylogenetic diversity in Basel in a global context. For global sequences we used the PANGOLIN lineage assignments as provided by GISAID ([https://www.gisaid.org/](https://www.gisaid.org/); 19,20) (details next section), which were used for plotting purposes on phylogenetic trees. To compare the lineage diversity in Switzerland and Basel-City to neighbouring European countries (Austria, France, Germany, and Italy) during the early phase of the pandemic, we visualized relative abundances of lineages using all high-quality, on GISAID (downloaded June 22nd, 2020) available consensus sequences for the time until March 23rd from Austria (N = 188), France (N = 230), Germany (N = 133), and Italy (N = 98). For Switzerland (N = 673), we combined our sequences (N = 468) with other sequence data from Switzerland 48 published on GISAID. To infer the diversity for canton Basel-City (including Bettingen and Riehen) excluding sequences that were obtained from commuters, we used the Basel-City portion (N = 376) of the Basel area cohort excluding samples from patients from cities outside of the administrative district of Basel-City. We calculated Simpson diversity (inverse Simpson concentration) as implemented in the SpadeR package v.0.0.1 45-47, which controls for lineage abundance differences between the countries, which is dependent on available sequence data, and which ranges from 0 (no diversity) to indefinite (large diversity). ### Analysing Basel SARS-CoV-2 genomes in global phylogenetic context High-quality and full-length consensus sequences and corresponding metadata (sample ID, date of sample, geographic location of sampling, PANGOLIN lineage) from Swiss 48 and global viruses were downloaded from GISAID on June 22nd, 2020, making 49,284 individual genome sequences. 43,252 sequences were retained after filtering for genomes with under 10% ambiguous characters (Ns) (author Genivaldo Gueiros Z. Silva) 49. Metadata and consensus sequences of the Basel samples and global data from GISAID were combined for further joint analysis, which were performed using custom R scripts and the nextstrain command line interface analysis pipeline v.2.0.0 (nextstrain.org) and augur v.8.0.0 50. Dates in our study samples correspond to date of sampling. Sequences were filtered by date from December 1st 2019 to March 23rd 2020 using an R custom script in R version 4.0.0 51 and packages tidyr ver.1.1.0 52, dplyr ver. 1.0.0. 53, and readr ver. 1.3.1. 54: 15,973 consensus sequences, including the Basel area sequences, remained. These time-filtered sequences were sub-sampled by geographic location to 30 sequences per country and month. Non-human derived viruses as well as sequences with other ambiguous characters (Us), as well as those from cruise ships, and duplicated sequences defined by the nextstrain team as of June 24th ([https://github.com/nextstrain/ncov/](https://github.com/nextstrain/ncov/)) were excluded using *augur filter* 50 resulting in 2,485 sequences for the final phylogenetic analysis dataset. Consensus sequences were aligned to the NCBI Refseq sequence Wuhan-Hu-1 reference MN908947.3 using mafft v7.467 with method FFT-NS-fragment 55 and options --reorder –keeplength --mapout --kimura 1 -- addfragments --auto. The resulting alignment was end-trimmed to remove low-quality bases (bases 1-55; 29804-29903). We masked homoplasic sites (**Table S2**) that have no phylogenetic signal 56 (deposited at [https://github.com/W-L/ProblematicSites_SARS-CoV2](https://github.com/W-L/ProblematicSites_SARS-CoV2)). Please note, that this list is under constant development as number and diversity of sequence data evolves; we retrieved the data on June 19th, 2020. Masking was done using *augur mask*. The resulting alignment was analysed in IQ-TREE 2 57 for tree inference using *augur tree* with substitution model GTR+G. The tree in Newick format was then subjected, together with the date information of each genome and the initial sequence alignment, to an estimation of the evolutionary rate by a regression of the divergence (number of mutations) against the sampling date using TreeTime 58 implemented in *augur refine*. Genomes or branches that deviated more than four interquartile ranges from the root to the tip versus the time tree were removed as likely outliers. The resulting time-calibrated and divergence trees were re-rooted to MN908947.3 and MT291826.1, the first official cases and published genomes of SARS-CoV-2 from Wuhan, China. Ancestral trait reconstruction of each patient’s viral genome was done for region (continent) and country as well as region and country of exposure using *augur* traits with a sampling bias correction of 2.5. Internal nodes and tips (actual genomes) were annotated regarding their nucleotide and amino acid changes in relation to the reference using *augur ancestral* and *augur translate*, respectively. All data were exported as json files (supplementary files) using *augur export v2* to be visualized in *auspice* v2 50. Identified clades of interest were further inspected for existing epidemiological links using data collected by the University Hospital. ### Identifying genomes belonging to GISAID emerging clade A20/15324T To identify a possible geographic origin of the synonymous C15324T mutation in *ORF1ab*, we performed a search on all available GISAID genomes as of August 12th, 2020. We downloaded all high quality and complete genomes that were assigned do GISAID legacy clade G (corresponds to clade 20A) and PANGOLIN lineage B.1 (all three are mostly congruent 59) with a collection date between December 2019 and March 23rd, 2020 (N = 2,856). We used Nextclade version 0.3.5 ([https://clades.nextstrain.org](https://clades.nextstrain.org)) to infer genomic mutations and filtered for sequences that contained C15324T. This procedure allowed avoidance of homoplasic mutations at this site. Further, we downloaded metadata for all high quality and complete genomes (as of August 12th, 2020) irrespective of clade to calculate summary statistics of number of genomes sequenced per country. ### Identification of S-gene D614G mutation in Basel sequences We screened the early phase Basel sequences for the mutation at nucleotide position 23,403 based on the alignment to the ***Wuhan-Hu-1*** reference sequence MN908947.3. Viral load (Ct-value) of patients that carried lineages with a mutated S-D614G gene (N = 274) were compared to patients that carried the ancestral allele (N = 12) using a Mann-Whitney U test. ## Supporting information SupplementaRY material [[supplements/186155_file02.pdf]](pending:yes) ## Data Availability Sequencing data (viral reads only) was submitted to European Nucleotide Archive (ENA) under accession number PRJEB39887, consensus sequences were submitted to GISAID, bioinformatic pipelines are accessible on Github. [https://www.gisaid.org/](https://www.gisaid.org/) [https://www.ebi.ac.uk/ena/browser/home](https://www.ebi.ac.uk/ena/browser/home) ## Data accessibility Sequencing data (viral reads only) was submitted to European Nucleotide Archive (ENA) under accession number PRJEB39887, consensus sequences were submitted to GISAID, bioinformatic pipelines are accessible on Github. ## Ethics The study was conducted according to good laboratory practice and in accordance with the Declaration of Helsinki and national and institutional standards and was approved by the ethical committee (EKNZ 2020-00769). The clinical trial accession number is NCT04351503 (clinicaltrials.gov). ## Authors contributions AE and HH devised the project. KL and AG collected and prepared samples and associated data. MS performed the phylogenetic analysis and interpretation, and led the writing and revising of the report. AM constructed the COVGAP bioinformatic pipeline and released it on Github. TR and HSS prepared viral RNA for sequencing, directed the phylogenetic analysis and deposited genomic data to GISAID and ENA. MSch and KKS collected clinical and epidemiological data. MyB and RSS provided geographical expertise. JB, STS and SF provided public health and epidemiological expertise. HP, MSi, CHN, RB, MO, SB, and MB provided clinical expertise and valuable discussion on the results. All authors commented on the draft report and contributed to the final version. ## Competing interests statement The authors declare no competing interests. ## Acknowledgements We thank Daniel Gander, Christine Kiessling, Magdalena Schneider, Elisabeth Schultheiss, Clarisse Straub, and Rosa-Maria Vesco (University Hospital Basel) for excellent technical assistance with sequencing. Calculations were performed at sciCORE ([http://scicore.unibas.ch/](http://scicore.unibas.ch/)) scientific computing center at University of Basel, the support from the sciCORE team for the analysis is greatly appreciated. Support for the creation of schematic figures (S2) was provided by BioRender.com. We thank all authors who have shared their genomic data on GISAID especially the Stadler Lab from ETHZ for sharing other Swiss sequences. A full table outlining the originating and submitting labs is included as a supplementary file. No dedicated funding was used for this work. ## Footnotes * Reference added and acknowledgement amended * Received September 1, 2020. * Revision received November 4, 2020. * Accepted November 4, 2020. * © 2020, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), CC BY-NC 4.0, as described at [http://creativecommons.org/licenses/by-nc/4.0/](http://creativecommons.org/licenses/by-nc/4.0/) ## References 1. Tayoun, A. et al. Genomic surveillance and phylogenetic analysis reveal multiple introductions of SARS-CoV-2 into a global travel hub in the Middle East. BioRxiv 2020.05.06.080606 (2020). 2. Banu, S. et al. A distinct phylogenetic cluster of Indian SARS-CoV-2 isolates. bioRxiv, 2020.2005.2031.126136, doi:10.1101/2020.05.31.126136 (2020). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiYmlvcnhpdiI7czo1OiJyZXNpZCI7czoxOToiMjAyMC4wNS4zMS4xMjYxMzZ2MSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIwLzExLzA0LzIwMjAuMDkuMDEuMjAxODYxNTUuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 3. Lu, J. et al. Genomic Epidemiology of SARS-CoV-2 in Guangdong Province, China. Cell 181, 997-1003.e1009, doi:10.1016/j.cell.2020.04.023 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.cell.2020.04.023&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32359424&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) 4. Meredith, L. W. et al. Rapid implementation of SARS-CoV-2 sequencing to investigate cases of health-care associated COVID-19: a prospective genomic surveillance study. Lancet Infect Dis, doi:10.1016/s1473-3099(20)30562-4 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/s1473-3099(20)30562-4&link_type=DOI) 5. Seemann, T. et al. Tracking the COVID-19 pandemic in Australia using genomics. medRxiv, 2020.2005.2012.20099929, doi:10.1101/2020.05.12.20099929 (2020). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoibWVkcnhpdiI7czo1OiJyZXNpZCI7czoyMToiMjAyMC4wNS4xMi4yMDA5OTkyOXYxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMTEvMDQvMjAyMC4wOS4wMS4yMDE4NjE1NS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 6. Candido, D. S. et al. Evolution and epidemic spread of SARS-CoV-2 in Brazil. Science, eabd2161, doi:10.1126/science.abd2161 (2020). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEzOiIzNjkvNjUwOC8xMjU1IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMTEvMDQvMjAyMC4wOS4wMS4yMDE4NjE1NS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 7. Díez-Fuertes, F. et al. Phylodynamics of SARS-CoV-2 transmission in Spain. bioRxiv, 2020.2004.2020.050039, doi:10.1101/2020.04.20.050039 (2020). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiYmlvcnhpdiI7czo1OiJyZXNpZCI7czoxOToiMjAyMC4wNC4yMC4wNTAwMzl2MSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIwLzExLzA0LzIwMjAuMDkuMDEuMjAxODYxNTUuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 8. Gámbaro, F. et al. Introductions and early spread of SARS-CoV-2 in France. bioRxiv, 2020.2004.2024.059576, doi:10.1101/2020.04.24.059576 (2020). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiYmlvcnhpdiI7czo1OiJyZXNpZCI7czoxOToiMjAyMC4wNC4yNC4wNTk1NzZ2MiI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIwLzExLzA0LzIwMjAuMDkuMDEuMjAxODYxNTUuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 9. Gudbjartsson, D. F. et al. Spread of SARS-CoV-2 in the Icelandic Population. N Engl J Med 382, 2302–2315, doi:10.1056/NEJMoa2006100 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMoa2006100&link_type=DOI) 10. Kumar, P. et al. Integrated genomic view of SARS-CoV-2 in India. bioRxiv, 2020.2006.2004.128751, doi:10.1101/2020.06.04.128751 (2020). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiYmlvcnhpdiI7czo1OiJyZXNpZCI7czoxOToiMjAyMC4wNi4wNC4xMjg3NTF2MSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIwLzExLzA0LzIwMjAuMDkuMDEuMjAxODYxNTUuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 11. Zehender, G. et al. Genomic characterization and phylogenetic analysis of SARS-COV-2 in Italy. J Med Virol, doi:10.1002/jmv.25794 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/jmv.25794&link_type=DOI) 12. eam, T. N. Genomic epidemiology of novel coronavirus - Global subsampling, <[https://nextstrain.org/ncov/global](https://nextstrain.org/ncov/global)> (2020). 13. Hill, V. & Rambaut, A. Phylodynamic analysis of SARS-CoV-2 | Update 2020-03-06, <[https://virological.org/t/phylodynamic-analysis-of-sars-cov-2-update-2020-03-06/420](https://virological.org/t/phylodynamic-analysis-of-sars-cov-2-update-2020-03-06/420)> (2020). 14. Rambaut, A. et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol, doi:10.1038/s41564-020-0770-5 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41564-020-0770-5&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32669681&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) 15. Zhang, L. et al. The D614G mutation in the SARS-CoV-2 spike protein reduces S1 shedding and increases infectivity. bioRxiv, doi:10.1101/2020.06.12.148726 (2020). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiYmlvcnhpdiI7czo1OiJyZXNpZCI7czoxOToiMjAyMC4wNi4xMi4xNDg3MjZ2MSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIwLzExLzA0LzIwMjAuMDkuMDEuMjAxODYxNTUuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 16. van Dorp, L. et al. No evidence for increased transmissibility from recurrent mutations in SARS-CoV-2. bioRxiv, 2020.2005.2021.108506, doi:10.1101/2020.05.21.108506 (2020). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiYmlvcnhpdiI7czo1OiJyZXNpZCI7czoxOToiMjAyMC4wNS4yMS4xMDg1MDZ2NSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIwLzExLzA0LzIwMjAuMDkuMDEuMjAxODYxNTUuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 17. Grenzgänger, <[https://www.statistik.bs.ch/haeufig-gefragt/arbeiten/grenzgaenger.html](https://www.statistik.bs.ch/haeufig-gefragt/arbeiten/grenzgaenger.html)( 18. Swissinfo.ch. Swiss hospitals to take French coronavirus patients, <[https://www.swissinfo.ch/eng/cross-border-care\_swiss-hospitals-take-french-coronavirus-patients/45634674](https://www.swissinfo.ch/eng/cross-border-care_swiss-hospitals-take-french-coronavirus-patients/45634674)> ( 19. Elbe, S. & Buckland-Merrett, G. Data, disease and diplomacy: GISAID’s innovative contribution to global health. Glob Chall 1, 33–46, doi:10.1002/gch2.1018 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/gch2.1018&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=31565258&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) 20. Shu, Y. & McCauley, J. GISAID: Global initiative on sharing all influenza data -from vision to reality. Euro Surveill 22, 30494, doi:10.2807/1560-7917.ES.2017.22.13.30494 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2807/1560-7917.ES.2017.22.13.30494&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28382917&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) 21. Bluhm, A. et al. SARS-CoV-2 Transmission Chains from Genetic Data: A Danish Case Study. bioRxiv, 2020.2005.2029.123612, doi:10.1101/2020.05.29.123612 (2020). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiYmlvcnhpdiI7czo1OiJyZXNpZCI7czoxOToiMjAyMC4wNS4yOS4xMjM2MTJ2MiI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIwLzExLzA0LzIwMjAuMDkuMDEuMjAxODYxNTUuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 22. Versteeg, B. et al. Genomic analyses of the Chlamydia trachomatis core genome show an association between chromosomal genome, plasmid type and disease. BMC genomics 19, 130, doi:10.1186/s12864-018-4522-3 (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s12864-018-4522-3&link_type=DOI) 23. Xing, Y., Li, X., Gao, X. & Dong, Q. MicroGMT: A Mutation Tracker for SARS-CoV-2 and Other Microbial Genome Sequences. Front Microbiol 11, 1502, doi:10.3389/fmicb.2020.01502 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3389/fmicb.2020.01502&link_type=DOI) 24. Pybus, O. G. et al. Preliminary analysis of SARS-CoV-2 importation & establishment of UK transmission lineages, <[https://virological.org/t/preliminary-analysis-of-sars-cov-2-importation-establishment-of-uk-transmission-lineages/507](https://virological.org/t/preliminary-analysis-of-sars-cov-2-importation-establishment-of-uk-transmission-lineages/507)> (2020). 25. Felbermayr, G., Hinz, J. & S, C. Après-ski: The Spread of Coronavirus from Ischgl through Germany, <[https://www.ifw-kiel.de/fileadmin/Dateiverwaltung/IfW-Publications/Gabriel\_Felbermayr/Apres-skiThe\_Spread\_of\_Coronavirus\_from\_Ischgl\_through\_Germany/coronavirus\_from\_ischgl.pdf](https://www.ifw-kiel.de/fileadmin/Dateiverwaltung/IfW-Publications/Gabriel\_Felbermayr/Apres-skiThe\_Spread\_of\_Coronavirus\_from\_Ischgl\_through_Germany/coronavirus_from_ischgl.pdf)> (2020). 26. Hodcroft, E. B. Preliminary case report on the SARS-CoV-2 cluster in the UK, France, and Spain. Swiss Med Wkly 150, doi:10.4414/smw.2020.20212 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.4414/smw.2020.20212&link_type=DOI) 27. Zeitung, B. Wir haben in Mulhouse die Epidemie-Phase erreicht, <[https://www.bazonline.ch/basel/region/gottesdienst-in-mulhouse-entwickelt-sich-zu-moeglichem-coronaherd/story/29069253](https://www.bazonline.ch/basel/region/gottesdienst-in-mulhouse-entwickelt-sich-zu-moeglichem-coronaherd/story/29069253)> ( 28. Joseph, S. J., Didelot, X., Gandhi, K., Dean, D. & Read, T. D. Interplay of recombination and selection in the genomes of Chlamydia trachomatis. Biol Direct 6, 28, doi:10.1186/1745-6150-6-28 (2011). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/1745-6150-6-28&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21615910&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) 29. Goldenberger, D. et al. Brief validation of the novel GeneXpert Xpress SARS-CoV-2 PCR assay. J Virol Methods 284, 113925, doi:10.1016/j.jviromet.2020.113925 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.jviromet.2020.113925&link_type=DOI) 30. Leuzinger, K. et al. Epidemiology of SARS-CoV-2 Emergence Amidst Community-Acquired Respiratory Viruses. J Infect Dis, doi:10.1093/infdis/jiaa464 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/infdis/jiaa464&link_type=DOI) 31. Quick, J. nCoV-2019 sequencing protocol. protocols.io <[dx.doi.org/10.17504/protocols.io.bdp7i5rn](http://dx.doi.org/10.17504/protocols.io.bdp7i5rn)> (2020). 32. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120, doi:10.1093/bioinformatics/btu170 (2014). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btu170&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24695404&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000340049100004&link_type=ISI) 33. Wu, F. et al. A new coronavirus associated with human respiratory disease in China. Nature 579, 265–269, doi:10.1038/s41586-020-2008-3 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-020-2008-3&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32015508&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) 34. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760, doi:10.1093/bioinformatics/btp324 (2009). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btp324&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19451168&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000267665900006&link_type=ISI) 35. Wala, J., Zhang, C. Z., Meyerson, M. & Beroukhim, R. VariantBam: filtering and profiling of next-generational sequencing data using region-specific rules. Bioinformatics 32, 2029–2031, doi:10.1093/bioinformatics/btw111 (2016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btw111&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27153727&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) 36. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963, doi:10.1371/journal.pone.0112963 (2014). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pone.0112963&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25409509&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) 37. Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993, doi:10.1093/bioinformatics/btr509 (2011). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btr509&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21903627&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000296099300009&link_type=ISI) 38. Hahne, F. & Ivanek, R. Visualizing Genomic Data Using Gviz and Bioconductor. Methods Mol Biol 1418, 335–351, doi:10.1007/978-1-4939-3578-9_16 (2016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/978-1-4939-3578-9_16&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27008022&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) 39. Phanstiel, D. H., Boyle, A. P., Araya, C. L. & Snyder, M. P. Sushi.R: flexible, quantitative and integrative genomic visualizations for publication-quality multi-panel figures. Bioinformatics 30, 2808–2810, doi:10.1093/bioinformatics/btu379 (2014). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btu379&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24903420&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000343082900016&link_type=ISI) 40. 1. Bastolla U., 2. Porto M., 3. Roman H.E. D.,C. & J.R., L. in Structural Approaches to Sequence Evolution. Biological and Medical Physics, Biomedical Engineering (eds Bastolla U., Porto M., Roman H.E., & Vendruscolo M.) (Springer, 2007). 41. Wickham, H. ggplot2: Elegant Graphics for Data Analysis, <[https://ggplot2-book.org/](https://ggplot2-book.org/)> ( 42. Holland, L.A. et al. An 81-Nucleotide Deletion in SARS-CoV-2 ORF7a Identified from Sentinel Surveillance in Arizona (January to March 2020). J Virol 94,p doi:10.1128/jvi.00711-20 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1128/jvi.00711-20&link_type=DOI) 43. Gouy, M., Guindon, S. & Gascuel, O. SeaView Version 4: A Multiplatform Graphical User Interface for Sequence Alignment and Phylogenetic Tree Building. Molecular Biology and Evolution 27, 221–224, doi:10.1093/molbev/msp259 (2009). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/molbev/msp259&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19854763&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000273704400003&link_type=ISI) 44. Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7, 539, doi:10.1038/msb.2011.75 (2011). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/msb.2011.75&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21988835&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000296652600007&link_type=ISI) 45. Chao, A., Chiu, C. H. & Hsieh, T. C. Proposing a resolution to debates on diversity partitioning. Ecology 93, 2037–2051, doi:10.1890/11-1817.1 (2012). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1890/11-1817.1&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23094376&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000309517200008&link_type=ISI) 46. Chao, A. & Jost, L. Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution 6, 873–882, doi:10.1111/2041-210X.12349 (2015). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/2041-210X.12349&link_type=DOI) 47. Chao, A., Wang, Y. & Jost, L. Entropy and the species accumulation curve: A novel entropy estimator via discovery rates of new species. Methods Ecol Evol 4, 1091–1100, doi:doi.org/10.1111/2041-210X.12108 (2013). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/2041-210X.12108&link_type=DOI) 48. Nadeau, S. et al. Quantifying SARS-CoV-2 spread in Switzerland based on genomic sequencing data. medRxiv, 2020.2010.2014.20212621, doi:10.1101/2020.10.14.20212621 (2020). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoibWVkcnhpdiI7czo1OiJyZXNpZCI7czoyMToiMjAyMC4xMC4xNC4yMDIxMjYyMXYyIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMTEvMDQvMjAyMC4wOS4wMS4yMDE4NjE1NS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 49. Cock, P. J. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423, doi:10.1093/bioinformatics/btp163 (2009). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btp163&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19304878&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000266109500013&link_type=ISI) 50. Hadfield, J. et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 34, 4121–4123, doi:10.1093/bioinformatics/bty407 (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/bty407&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29790939&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) 51. Team, R. C. R: A language and environment for statistical computing, <[https://www.r-project.org/](https://www.r-project.org/)> ( 52. Wickham, H. & Henry, L. tidyr: Tidy Messy Data. R package version 1.1.0., <[https://CRAN.R611project.org/package=tidyr](https://CRAN.R611project.org/package=tidyr)> (2020). 53. Wickham, H., François, R., Henry, L. & Müller, K. dplyr: A Grammar of Data Manipulation. R package version 1.0.0., <[https://CRAN.R-project.org/package=dplyr](https://CRAN.R-project.org/package=dplyr)> (2020). 54. Wickham, H., J, H. & Francois, R. readr: Read Rectangular Text Data. R package version 1.3.1., <[https://CRAN.R-project.org/package=readr](https://CRAN.R-project.org/package=readr)> (2018). 55. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30, 772–780, doi:10.1093/molbev/mst010 (2013). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/molbev/mst010&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23329690&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000317002300004&link_type=ISI) 56. De Maio, N., C, W. & N, G. <[https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473/10](https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473/10)> ( 57. Minh, B. Q. et al. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol Biol Evol 37, 1530–1534, doi:10.1093/molbev/msaa015 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/molbev/msaa015&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32011700&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) 58. Sagulenko, P., Puller, V. & Neher, R. A. TreeTime: Maximum-likelihood phylodynamic analysis. Virus Evol 4, vex042, doi:10.1093/ve/vex042 (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ve/vex042&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29340210&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F04%2F2020.09.01.20186155.atom) 59. GISAID - Clade and lineage nomenclature aids in genomic epidemiology of active hCoV-19 viruses, <[https://www.gisaid.org/references/statements-clarifications/clade-and-lineage-nomenclature-aids-in-genomic-epidemiology-of-active-hcov-19-viruses/](https://www.gisaid.org/references/statements-clarifications/clade-and-lineage-nomenclature-aids-in-genomic-epidemiology-of-active-hcov-19-viruses/)> (