Abstract
The SARS-CoV-2 lineages B.1.1.7 and 501.V2, which were first detected in the United Kingdom and South Africa, respectively, are spreading rapidly in the human population. Thus, there is an increased need for genomic and epidemiological surveillance in order to detect the strains and estimate their abundances. Here, we report a genomic analysis of SARS-CoV-2 in 48 raw wastewater samples collected from three wastewater treatment plants in Switzerland between July 9 and December 21, 2020. We find evidence for the presence of several mutations that define the B.1.1.7 and 501.V2 lineages in some of the samples, including co-occurrences of up to three B.1.1.7 signature mutations on the same amplicon in four samples from Lausanne and one sample from a Swiss ski resort dated December 9 - 21. These findings suggest that the B.1.1.7 strain could be detected by mid December, two weeks before its first verification in a patient sample from Switzerland. We conclude that sequencing SARS-CoV-2 in community wastewater samples may help detect and monitor the circulation of diverse lineages.
Background
Viral RNA of SARS-CoV-2 infected persons can be found in faeces (Gupta et al. 2020) and accordingly, at the community level, in the wastewater collected in wastewater treatment plants (WWTPs) (Kitajima et al. 2020; Medema et al. 2020; Gonzalez et al. 2020). In principle, wastewater samples can provide a snapshot of the circulating viral lineages and their diversity in the community (Izquierdo Lara et al. 2020; Nemudryi et al. 2020; Martin et al. 2020; Crits-Christoph et al. 2020) and serve as an efficient and complementary approach to genomic epidemiology based on individual patient samples (Nadeau et al. 2020).
However, it is challenging to analyze wastewater samples for SARS-CoV-2 genomic composition, because samples are enriched in PCR inhibitors during the virus concentration steps, viral genomes may be fragmented, and sewage contains large amounts of bacterial, human and other viral DNA and RNA genomes (Cantalupo et al. 2011; Fernandez-Cassi et al. 2018; Martínez-Puchol et al. 2020). Even if amplified and sequenced successfully, the read data obtained from the mixture of viral genomes from different infected persons is difficult to interpret, because of amplification biases, sequencing errors, and insufficient phasing information. Detecting a viral lineage that is present only in a small fraction of infected persons, such as, for example, a recently mutated or imported new viral variant into an area with an ongoing local outbreak, is therefore a major challenge, similar to identifying low-frequency variants within single-patient samples (Kuipers et al. 2020; Karamitros et al. 2020; Rose et al. 2020). On the other hand, tracking the genomic composition of viral RNA in the community in multiple samples over time increases our ability to make reliable calls and offers an opportunity to monitor the circulation of diverse SARS-CoV-2 lineages.
The novel SARS-CoV-2 variants B.1.1.7 (or VOC-202012/01), which was first found in the UK (Rambau et al. 2020), and 501.V2 (or S501Y.V2), which was identified in South Africa (Tegally et al. 2020), have recently generated a lot of interest as they may be associated with increased infectivity and hence accelerated spread in the human population (Volz et al. 2020, Vöhringer et al. 2020) (Tegally et al. 2020). Although some of the mutations defining these lineages arose independently before – without any evidence for increased transmissibility so far (van Dorp et al. 2020) – the combination of 17 and 12 genetic alterations, defining B.1.1.7 and 501.V2, respectively, is unique and may have unexpected consequences on the phenotype. In Switzerland, the B.1.1.7 lineage was first detected in two clinical samples on December 24 (Bundesamt für Gesundheit 2020a), and the 501.V2 lineage in another two samples on December 27 (Bundesamt für Gesundheit 2020b). To our knowledge, one of the 501.V2 cases has no known epidemiological links to abroad and several recent B.1.1.7 cases identified in Geneva have no known epidemiological links to abroad indicating local transmission (Département de la sécurité, de l’emploi et de la santé, Republic et Canton de Geneve 2021). However, it remains unknown to what extent the new variants are already circulating in Switzerland. Sequencing community wastewater samples offers an efficient and cost-effective way to address this epidemiologically important question.
Here, we report a preliminary analysis of next-generation sequencing (NGS) data obtained from a total of 48 wastewater samples collected from three WWTPs in Switzerland between July 8 and December 21, 2020. We explore what SARS-CoV-2 wastewater genomic data can reveal about arising lineages and their circulation in the community and compare our findings to about 5000 SARS-CoV-2 patient sequences collected in Switzerland.
Methods
Patient sequences. Per-patient SARS-CoV-2 consensus sequences were downloaded from GISAID (Shu and McCauley 2017) for all samples collected in Switzerland between February 24 and December 24, 2020, and not identified as either B.1.1.7 or 501.V2 (see supplementary material for the list of accession numbers).
Wastewater sample collection and preparation. Raw wastewater samples were collected from three Swiss WWTPs: ARA Werdhölzli, Zurich (population connected: 450,000), STEP de Vidy, Lausanne (population connected: 240,000), and the WWTP of an alpine ski resort. Samples were concentrated and viral RNA was extracted using methods similar to those previously reported (Medema et al. 2020). In brief, 24-hour composite samples (Zurich and Lausanne) or grab samples (ski resort) were collected in 500ml polystyrene or polypropylene plastic bottles, shipped on ice, and stored at 4°C for up to 6 days before processing. Aliquots of 50 mL were clarified by filtration (2 µm glass fiber filter (Millipore) followed by a 0.22 µm filter (Millipore), Zurich samples), or by centrifugation (4,863 xg for 30 minutes, Lausanne and ski resort samples). Clarified samples were then concentrated using centrifugal filter units (Centricon Plus-70 Ultrafilter, 10kDa, Millipore, USA) by centrifugation at 3,000xg for 30 minutes. The resulting concentrate (up to 280 µL) was extracted in its entirety using the QiaAmp Viral RNA MiniKit (Qiagen, USA) according to the manufacturer’s instructions, adapted to the larger volumes, and using 80 µL eluent volume. RNA extracts were stored at −80°C before sequencing.
Sequencing. Frozen cDNA transcripts of RNA extracts from wastewater samples were analyzed using NGS. Amplicons were produced and libraries prepared according to the COVID-19 ARTIC v3 protocol (R&D 2020) and sequenced using the Illumina NovaSeq 6000 platform, resulting in paired-ends reads of length 250 bp each.
Mutation calling. Mutation calls were produced using V-pipe (Posada-Céspedes et al. 2020), a bioinformatics pipeline for end-to-end analysis of viral sequencing reads obtained from mixed samples. Low-frequency mutation calling was based on local haplotype reconstruction with ShoRAH (Zagordi et al. 2010a, 2010b, 2011; McElroy et al. 2013; Posada-Céspedes et al. 2020). Custom scripts were used for detecting nearby co-occurring mutations in the multiple read alignments.
Mutation reporting and testing. Samples from the same location were grouped by sampling date into those collected before October 23, 2020, or afterwards. The cut-off date was chosen to ensure that the earlier samples were (likely) taken before the emergence of the B.1.1.7 and 501.V2 variants. Fisher’s exact test was applied to test the null hypothesis that the total number of signature mutation observations is independent of sampling time. Contingency tables were obtained by counting in each group the number of times the signature mutations were observed to be present versus absent in the two groups. Missing observations were omitted from the test. The analysis was performed separately for the B.1.1.7 and 501.V2 signature mutations.
Results
The B.1.1.7 lineage is defined by 17 signature events (called mutations for simplicity), including 13 point mutations, 1 three-nucleotide change, and 3 deletions of length 3, 6, and 9 nucleotides each. The 501.V2 lineage is defined by 12 signature mutations, all of which are single point mutations (Table 1). The two lineages share one mutation, namely A23063T giving rise to the amino acid change N501Y in the spike protein. In order to interpret the observation of any of these signature mutations in wastewater samples, we first characterize the mutational background distribution in Switzerland by analyzing patient samples not identified as B.1.1.7 nor 501.V2. Next, we identify signature mutations in wastewater samples, first individually and then, if possible, in combination.
Signature mutations in patient samples
We first assessed the prevalence of signature mutations defining the lineages B.1.1.7 and 501.V2 in all 5192 non-B.1.1.7 and non-501.V2 consensus sequences from GISAID obtained from clinical samples collected in Switzerland before December 24 (Table 1). Among the 17 B.1.1.7 signature mutations, seven occurred in patient samples, in total 121 times (2.33%), each in 2 - 90 samples (0.04 - 1.73%). Only two samples displayed co-occurrence of two mutations, namely (C23604A, C23709T), and no other co-occurrence of mutations was displayed. For 501.V2, 9 out of the 12 signature mutations occurred in patient samples, most of them in only 1 - 8 sequences (0.02 - 0.15%), but A10323G occurred 28 times (0.54%), C28887T 37 times (0.52%), C1059T 203 times (3.91%), and G25563T 1174 times (22.61%). Some of these mutations co-occurred in the consensus sequences as pairs or triplets, namely (C1059T, G25563T, C28887T), (C1059T, A10323G, G25563T) and (C23664T, G25563T) each co-occurred once, (G25563T, C28887T) seven times, (A10323G, G25563T) 9 times, and (C1059T, G25563T) 199 times. Overall, in non-B.1.1.7 and non-501.V2 patient samples from Switzerland, several of the signature mutations had not been detected at all before December 24 or only in very few samples and in isolation.
Individual signature mutations in wastewater samples
In order to compare the patient-level findings to the mutations detectable in wastewater samples at the community level, we collected a total of 48 wastewater samples between July 9 and December 21, 2020, from three different WWTPs in Switzerland. In Lausanne, a total of 15 samples were taken between November 5 and December 21, in Zurich, 32 samples were collected between July 9 and December 21, and in the ski resort, one sample was taken on December 21. From all samples, viral RNA was extracted, DNA libraries prepared, and subjected to NGS using 2 x 250bp paired-end reads. After quality control, the median number of reads per sample was 1,269,169. The subsequent read mapping step aligned a median of 1,098,352 reads per sample, which resulted in a median per-sample median coverage of 2634.5 reads per position (range 0 - 28,958). The overall deep coverage allowed for calling low-frequency mutations in most samples down to a frequency of about 1/3000 on average.
We first analyzed the wastewater samples collected in Zurich (Figure 1). The 32 samples from July to late December cover a period before the first documented occurrence of the strains and a period afterwards when they might have migrated into Swiss communities. Combining all 32 samples, we find six of the 17 B.1.1.7 signature mutations, with a total of 19 occurrences: a 6bp spike deletion in seven samples, a 9bp Orf1ab deletion in seven samples, a 3bp N gene substitution in two samples, a 3bp spike deletion in one sample, an Orf1ab point mutation in one sample, and a spike point mutation in one sample. When separating the occurrence of mutations before and after October 23 (assuming that the new lineages were not present in Switzerland before that date), we observe three signature mutations each occurring in one of the 12 samples before October 23 versus 16 occurrences of five different mutations in 20 samples after October 23. Association between sampling time and number of signature mutations was assessed by comparing the number of times we found evidence for and against signature mutations between the early and the late group excluding missing data due to low coverage, and did not reach statistical significance (p = 0.093, Fisher’s exact test). For the 12 501.V2 signature mutations, we find six occurring 37 times in total (in 21, 7, 4, 2, 1, and 1 samples), 10 before and 27 after October 23, and a significant association of their occurrence before versus after October 23 (p = 0.036). The most prevalent mutation found in every sample after October 23 is the Orf3a mutation G25563T, which is highly prevalent in the Swiss population at 22.6 % according to GISAID (Table 1). Stratification of the GISAID data by sampling time, shows that the prevalence of this mutation increased from 12.6% before October 23 to 34.6% after October 23 in the population, which matches the trend observed in the wastewater data. Hence, the increase in frequency in wastewater samples is likely not related to occurrence of the 501.V2 variant, because among patient samples, few other signature mutations were observed in combination with G25563T, most frequently the pair (C1059T, G25563T) in only 199 out of 5192 (3.8%) consensus sequences.
As compared to the data from Zurich, the time series data from Lausanne starts later, with one sample from November 5 and 14 daily samples between December 8 and 21 (Figure 2). Overall, we find a larger number of signature mutations in the Lausanne samples compared to the Zurich samples. We observe twelve of the 17 B.1.1.7 signature mutations with a total of 28 observations across the 15 samples. The four B.1.1.7 mutations observed in Zurich are among them. For 501.V2, four out of twelve signature mutations occur, with the Orf3a mutation G25563T being present in all but one sample from Lausanne. For comparison, the same point mutation was found in Zurich in 2 out of 12 samples before and in all 19 samples after October 23 which had coverage at this position.
Finally, the sample from the ski resort is striking: here, we detected 10 out of the 17 B.1.1.7 mutations and one of the 12 501.V2 mutations (Figure 3). Given the patient background frequencies of individual mutations (Table 1), many of which are zero for the B.1.1.7 variant, and the very few combinations of at most three of them that were observed previously, it seems likely that the B.1.1.7 strain, which was first detected in patient samples from Geneva of December 22, was present in the December 21 wastewater sample.
In summary, the comparative analysis of individual signature mutations in both patient samples and in community wastewater samples showed their consistency and provided some indication for the presence of B.1.1.7 in a sample from the ski resort of December 21.
Co-occurrence of signature mutations in wastewater samples
Some of the signature mutations analyzed above are located in close proximity to each other on the SARS-CoV-2 genome. In this case, in addition to detecting individual mutations independently, co-occurring mutations can be observed directly on the same sequencing read or read pair coming from the same amplicon produced by the ARTIC v3 protocol. Specifically, we analyzed two regions on amplicons 92 and 93 which contain three and two B.1.1.7-defining mutations, respectively, and two 501.V2-specific mutations on amplicon 76 (Table 2). Co-occurrence of mutations provides higher confidence in the presence of the respective strain, as both independent biological generation and technical artifacts are much less likely to produce such mutational patterns. To validate this approach, we first searched for the patterns in two B.1.1.7-positive and two 501.V2-positive control patient samples and indeed detected co-occurrence of the respective B.1.1.7 and 501.V2 mutations (Table 2, last four rows). In addition, all patient and wastewater samples that we analyzed displayed, as expected, on amplicon 77 the A23403G signature mutation of the B.1 lineage, which is the most prevalent lineage (>99%) in Switzerland at the time of sampling (Alm et al. 2020; Stange et al. 2020).
In the wastewater sample from the ski resort of December 21, which, as reported above, shows a high prevalence of individual B.1.1.7 mutations, we found co-occurrence of the three B.1.1.7 signature mutations C27972T, G28048T, and A28111G on amplicon 92 in 514 out of 3689 reads (13.93%), but no reads from amplicon 93 with the two B.1.1.7 mutations A28111G and GAT28280CTA. Nevertheless, the large fraction of triple-mutated amplicon-92 reads indicates the presence of the B.1.1.7 lineage, consistent with reports of British tourists staying in the geographic region around the ski resort covered by the WWTP. We analyzed an additional four samples from Lausanne of December 9, 11, 14, and 21 that displayed some of the individual mutations. We found co-occurrences of the following B.1.1.7 signature mutations: two samples carried the signature combination (C27972T, G28048T, A28111G) on 1.09% and 1.14% of reads from amplicon 92, and two samples displayed the signature pair (A28111G, GAT28280CTA) on 2.28% and 2.74% of reads from amplicon 93.
To put the wastewater co-occurrence observations into the context of the background mutation distribution, in the GISAID patient consensus sequences (Table 1), we did not observe any co-occurrence of these signature mutations neither for the B.1.1.7 nor 501.V2 lineages. However, among the 5192 consensus sequences, mutation C27972T alone occurred in a total of 7 (0.13%) samples, 2 of them from the canton Zurich (1: 2020-08-20, 1: 2020-08-21), 4 from Berne (1: 2020-11-23, 1: 2020-12-06, 2: 2020-12-14) and one from Thurgau (2020-11-30), but never in co-occurrence with G28048T or A28111G. The signature mutation G23012A from the 501.V2 lineage only occurred twice in GISAID patient samples, namely in samples from Basel-Land (1: 2020-03-21) and Vaud (1: 2020-09-30). Mutation A23063T was never observed in any of the 5192 Swiss patients. Hence, the co-occurrences found in the wastewater samples (Table 2) cannot be explained by the background mutation distribution. They may have either been generated recently de novo or be the result of incomplete sampling of the genomes of B.1.1.7 viruses in the wastewater samples. Given that 5 out of 61 samples (all sampled in mid or late December, at the suspected time of B.1.1.7 occurrence) display these co-occurrence patterns and the high prevalence of individual B.1.1.7 signature mutations in the December-21 ski resort sample, it seems more likely that the B.1.1.7 variant was present in the four samples from Lausanne and the one sample from the ski resort.
Conclusion
The preliminary analysis of deep sequencing data from 48 community wastewater samples collected between July and December 2020 in Switzerland has identified several of the signature mutations that define the B.1.1.7 and 501.V2 lineage in several of the samples. In five samples, one dating as early as December 9, we have found up to three co-occurring B.1.1.7 signature mutations. These co-occurrences have not been observed in any clinical samples and hence any known circulating strains until December 24. This observation may be explained either by an unobserved new strain in Switzerland which is not part of B.1.1.7, or by B.1.1.7 having circulated already in the first part of December in Switzerland. Given that B.1.1.7 was identified first in Switzerland in a sample dated December 22 and none of the non-B.1.1.7 clinical samples carry the co-occurring mutations, the latter explanation appears more likely suggesting that the B.1.1.7 lineage was present in Switzerland already in early December.
We have also observed an increase in the number of individual signature mutations in wastewater samples after October in Zurich, consistent with the same trend in clinical samples. When analyzing mutations independently, only the December-21 ski resort finding with 10 out of 17 B.1.1.7 signature mutations provided some indication for the presence of this strain. By contrast, the co-occurrence analysis appears to be more powerful in detecting the new strains, owing to the genomic location of some of the signature mutations on the same amplicon.
Additional work is required to assess the sensitivity and specificity of both approaches in detecting new genomic variants. Although NGS data from mixed samples can only provide indirect evidence for the presence of a lineage and not a definite proof, wastewater-based SARS-CoV-2 genomics may support epidemiology by providing timely, inexpensive, non-invasive, unbiased community-level samples. Close-mesh time series data are particularly informative, not only for detailed monitoring of circulating lineages, but also for increased accuracy in detecting genomic alterations.
Data Availability
Data will be made available upon publication.
Data Availability
Requests can be forwarded to Maurizio Bartolucci maurizio.bartolucci{at}uslcentro.toscana.it or Matteo Benelli matteo.benelli{at}uslcentro.toscana.it
Author Contributions
Conceptualization: NB, TK, TRJ, CO, TS, XFC, IT
Data curation: IT, CC, KPJ
Formal Analysis: KJ, LF, DD
Funding acquisition: NB, TK, TRJ, CO, TS
Investigation: XFC, AK, PG, ES, CB, CA
Methodology: NB, XFC, KJ, DD, IT
Project administration: NB, TK, TRJ, CO, TS
Software: IT, KPJ, LF, DD, KJ
Supervision: NB, TK, TRJ, CO, TS, IT
Validation: NB, IT, KJ, KPJ Visualization: KJ
Writing – original draft: NB, IT, KJ, DD
Writing – review & editing: NB, TK, TRJ, CO, TS, KJ, KPJ, LF, XFC, DD, CA
Funding
This work was supported by the SIB Swiss Institute of Bioinformatics as a Competitive Resource, the Swiss National Science Foundation (project 31CA30_196538), Eawag discretionary funds and an EPFL COVID19 grant. XFC was a fellow of the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska–Curie Grant Agreement No. 754462.
Acknowledgements
The supplementary material contains the detailed acknowledgements of the originating and submitting laboratories of the GISAID data.