Abstract
Wastewater-based genomic surveillance of the SARS-CoV-2 virus shows promise to complement genomic epidemiology efforts. Multiplex tiled PCR is a desirable approach for targeted genome sequencing of SARS-CoV-2 in wastewater due to its low cost and rapid turnaround time. However, it is not clear how different multiplex tiled PCR primer schemes or wastewater sample matrices impact the resulting SARS-CoV-2 genome coverage. The objective of this work was to assess the performance of three different multiplex primer schemes, consisting of 150bp, 400bp, and 1200bp amplicons, as well as two wastewater sample matrices, influent wastewater and primary sludge, for targeted genome sequencing of SARS-CoV-2. Wastewater samples were collected weekly from five municipal wastewater treatment plants (WWTPs) in the Metro Vancouver region of British Columbia, Canada during a period of increased COVID-19 case counts from February to April 2021. RNA extracted from clarified influent wastewater provided significantly higher genome coverage (breadth and median depth) than primary sludge samples across all primer schemes. Shorter amplicons appeared more resilient to sample RNA degradation, but were hindered by greater primer pool complexity in the 150bp scheme. The optimal combination of primer scheme (400bp) and sample matrix (influent) was capable of detecting the emergence of mutations associated with genomic variants of concern, and the daily wastewater loads of these variants correlated significantly with clinical case counts. Taken together, these results provide guidance on best practices for implementing wastewater-based genomic surveillance, and demonstrate its ability to inform epidemiology efforts by detecting genomic variants of concern circulating within a geographic region.
Importance
Monitoring the genomic characteristics of the SARS-CoV-2 virus circulating in a population can provide important insights into epidemiological aspects of the COVID-19 outbreak. Sequencing every clinical patient sample in a highly populous area is a difficult feat, and thus sequencing SARS-CoV-2 RNA in municipal wastewater offers great promise to augment genomic surveillance by characterizing a pooled population sample matrix, particularly during an escalating outbreak. Here, we assess different approaches and sample matrices for rapid targeted genome sequencing of SARS-CoV-2 in municipal wastewater. We demonstrate that the optimal approach is capable of detecting the emergence of SARS-CoV-2 genomic variants of concern, with strong correlations to clinical case data in the province of British Columbia. These results provide guidance on best practices for wastewater genomic surveillance, and further support its application as a tool to augment current genomic epidemiology efforts.
Observation
Genomic surveillance of the SARS-CoV-2 virus plays a critical role in tracking its evolution during the current global COVID-19 pandemic (1–3). Recently, several emerging lineages of SARS-CoV-2, so-called variants of concern (VoCs), have been associated with increased levels of transmission (4), disease severity (5), and/or immune escape (6, 7). These VoCs have originated from various locations globally (4, 8), but are spreading within new geographic regions due to travel-associated and local transmission (9). Providing rapid detection of VoC infections within a population could thus help to inform effective public health outbreak mitigation strategies.
As the SARS-CoV-2 virus is shed in feces during infection (10), viral genome fragments can be detected in municipal wastewater, and have been associated with clinical case numbers within contributing regions (11–14). Previous work has demonstrated the potential to sequence SARS-CoV-2 fragments in municipal wastewater and detect single nucleotide variants (SNVs) that correspond to clinical cases in the contributing sewershed (15–17). As SARS-CoV-2 titers in wastewater are relatively low (11, 13), an enrichment step is typically needed prior to sequencing to improve sensitivity (15). The two main approaches for enriching SARS-CoV-2 RNA in wastewater are oligonucleotide-based capture (15) and multiplex tiled PCR-based targeted amplification (16, 17). The latter approach is promising for wastewater-based viral genomic surveillance due to its lower reagent cost and potential to be deployed rapidly and in remote locations (18). An important consideration for applying multiplex tiled PCR is the average amplicon length, as this can impact assay sensitivity in the case of RNA degradation (19). This could be particularly important for its application to wastewater-based epidemiology, as SARS-CoV-2 particles and free RNA can undergo variable levels of degradation (20, 21), and the extent of degradation may vary with the type of wastewater sample matrix (e.g. influent versus primary sludge) (22). We therefore hypothesized that there may be an optimal tiled PCR amplicon size and wastewater sample matrix type that enables adequate genome coverage of SARS-CoV-2 for the identification of genomic VoCs.
Wastewater sample matrix and multiplex tiled PCR amplicon length impact SARS-CoV-2 genome coverage
We sequenced a total of 96 wastewater samples collected between February 7th and April 18th, 2021 across five municipal WWTPs in Vancouver, Canada using three different primer schemes for multiplex tiled PCR of SARS-CoV-2: Swift Biosciences’ 150bp amplicon scheme (n = 10 total, 3 sludge and 7 influent), ARTIC 400bp amplicon scheme (23) (n = 62 total, 8 sludge and 54 influent), and Freed/midnight 1200bp amplicon scheme (24) (n = 24 total, 4 sludge and 20 influent) (detailed methods in Text S1). Sludge samples failed to produce libraries with over 32% breadth of genome coverage across all primer schemes and sample cycle threshold (Ct) values (Figure 1a-c). Conversely, influent wastewater samples produced libraries that had significantly higher breadth of coverage across all primer schemes (p<0.01, Tukey Test; Figure 1). One possible explanation for this finding could be that the sludge matrix was inhibitory to RT-PCR (11); however, no inhibition of RT-qPCR on sludge RNA extracts was detected using internal controls (Text S1, Table S2). Another potential reason for the lower genome coverage in sludge is that SARS-CoV-2 particles were less intact, or their RNA more degraded, following direct sludge extraction than following ultrafiltration of influent wastewater, as has been previously hypothesized (22). A third potential cause of discrepancies in genome coverage between sludge and influent wastewater samples could be higher off-target amplification in sludge extracts. Correspondingly, the sample type significantly impacted read mapping rates for all schemes after accounting for Ct values (p<0.01, two-way ANCOVA), with the mean mapping rate of sludge samples being over 100-fold lower than that of influent samples (0.01% vs. 11.3%, respectively; Table S1).
Therefore, ultrafiltration of influent wastewater provided more suitable RNA extracts for multiplexed tiled PCR of SARS-CoV-2 than did direct extraction from wastewater sludge, likely due to a combination of greater SARS-CoV-2 RNA degradation and off-target amplification in sludge.
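To make the two coverage metrics used above concrete, the following is a minimal sketch of how breadth of coverage and median depth could be derived from per-position read depths. It is illustrative only and is not the authors' pipeline (see Text S1 for that); the function name `coverage_summary`, the `min_depth` threshold of 10 reads, and the toy depth values are all assumptions introduced here.

```python
from statistics import median


def coverage_summary(depths, min_depth=10):
    """Return (breadth, median_depth) for per-position read depths.

    breadth is the fraction of reference positions covered by at least
    `min_depth` reads. The default of 10 is an illustrative assumption,
    not a threshold taken from the study.
    """
    if not depths:
        raise ValueError("depths must hold one value per reference position")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths), median(depths)


# Toy 10-position "genome" with hypothetical depths; a real SARS-CoV-2
# reference is roughly 29.9 kb long.
toy_depths = [0, 0, 5, 12, 40, 40, 38, 11, 9, 0]
breadth, med_depth = coverage_summary(toy_depths)
print(f"breadth = {breadth:.0%}, median depth = {med_depth}")
```

In practice the per-position depths would come from a read-alignment tool's depth report; the sketch only shows how the two summary numbers relate to that input.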
If the level of RNA degradation within a wastewater sample impacts the resulting SARS-CoV-2 genome coverage, we would expect to see less of a drop-off in coverage at high Ct values for schemes with shorter amplicons. Indeed, we detected a significant effect of amplicon length on the breadth of genome coverage as a function of Ct (p<0.01, two-way ANCOVA). The median genome coverage with the 150bp amplicon scheme spanned one order of magnitude within influent wastewater samples with Ct values ranging from 31 to 37 (Figure 1d), while that from the 400bp and 1200bp schemes spanned 3.2 and 3.0 orders of magnitude, respectively (Figure 1e and 1f). Improvements with the 400bp versus the 1200bp scheme were marginal, yet 83% of paired influent samples with Ct values over 32.5 (10 of 12) showed higher breadth of coverage with the 400bp scheme (Figure S1). Thus, shorter amplicon schemes may be more robust to sample RNA degradation at higher Ct values. However, there was a tradeoff between amplicon length and genome coverage, as the magnitudes of the median genome coverage and breadth of coverage obtained with the 150bp scheme and influent samples were significantly lower than those of the 400bp scheme (p = 0.022 and 5.0e-9, respectively; Tukey Test). The lower breadth of coverage with the 150bp scheme could have been caused by more primer-primer interactions with a larger number of primers (19). Therefore, the 400bp primer scheme appears to strike a balance between resilience to sample RNA degradation and mitigating issues around primer pool complexity and multiplex amplicon balancing.
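The "orders of magnitude" comparison above is simple arithmetic on a log scale. The sketch below shows one way to compute it, assuming the span is defined as log10 of the ratio between the largest and smallest median coverage values in a set of samples; that definition, the function name `orders_of_magnitude`, and the toy values are assumptions for illustration and are not taken from the study's data.

```python
import math


def orders_of_magnitude(values):
    """Return log10(max / min) for a collection of strictly positive values."""
    values = list(values)
    if len(values) < 2:
        raise ValueError("need at least two values to compute a span")
    if min(values) <= 0:
        raise ValueError("values must be strictly positive for a log10 span")
    return math.log10(max(values) / min(values))


# Hypothetical median coverage values for two schemes across a Ct range.
narrow_span = orders_of_magnitude([5, 12, 30, 50])      # 50 / 5 = 10-fold
wide_span = orders_of_magnitude([2, 40, 900, 2000])     # 2000 / 2 = 1000-fold
print(f"{narrow_span:.1f} vs {wide_span:.1f} orders of magnitude")
```

A smaller span across the same Ct range indicates that coverage degrades less steeply as the template becomes scarcer or more fragmented.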
SARS-CoV-2 whole genome sequencing from wastewater captures emergence of genomic variants in a geographic region
The sequence data produced via the 400bp primer scheme and influent wastewater samples were used to measure the frequency of VoC-associated SNVs (Table S3) across the five WWTPs over the study period. SNVs associated with the VoC lineages B.1.1.7 and P.1 both increased to a maximum mean frequency of 60% across all WWTPs (Figure 2a, Figures S2-S4), while that of B.1.351 did not substantially increase (Figure S5). These findings align with the results of clinical screening and sequencing of patient samples over the same period within the province of British Columbia, during which P.1 and B.1.1.7 became the dominant lineages while B.1.351 did not appreciably spread (25) (Figure 2b, Figures S3 and S5). At the time of publishing, VoC frequency data for clinical cases were only available at the provincial level; yet the health service areas corresponding to the five WWTP sewersheds accounted for 74% of total cases in the province during the study period (25). The flow-normalized daily loads of P.1 and B.1.1.7 across all WWTPs (in genome copies/day) were strongly correlated with clinical case counts of those lineages within the province for the corresponding epidemiological weeks (log10-log10 transformed, R² = 0.89 and 0.87, respectively; Figure 2c and Figure S3). The frequency of VoC-associated SNVs within influent wastewater measured with multiplex tiled PCR is therefore suitable to monitor community transmission of genomic variants within a sewershed. The onset of P.1- and B.1.1.7-associated SNVs within influent wastewater followed different patterns for the five WWTPs, providing additional support that wastewater SARS-CoV-2 sequencing can illuminate localized spread of genomic variants on a regional scale (15, 17).
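The load-versus-case comparison described above can be sketched as follows. This is a hedged illustration rather than the authors' method: it assumes a variant's daily load is the product of its SNV frequency, the total SARS-CoV-2 concentration (genome copies/L), and the plant flow (L/day), and it computes R² as the squared Pearson correlation of the log10-transformed values, which equals the R² of a simple linear regression. The function names and all numbers are hypothetical.

```python
import math


def variant_daily_load(snv_frequency, conc_gc_per_l, flow_l_per_day):
    """Variant load in genome copies/day (assumed formula, see lead-in)."""
    return snv_frequency * conc_gc_per_l * flow_l_per_day


def log10_r_squared(xs, ys):
    """R^2 of a simple linear regression of log10(y) on log10(x).

    math.log10 raises ValueError for non-positive inputs, so weeks with a
    zero load or zero cases would need handling before calling this.
    """
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two equal-length series of at least 3 points")
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    return sxy ** 2 / (sxx * syy)


# Hypothetical weekly values for one lineage (not data from the study).
freqs = [0.05, 0.20, 0.45, 0.60]          # mean SNV frequency
concs = [2.0e4, 3.0e4, 4.0e4, 5.0e4]      # genome copies per litre
flows = [4.0e8, 4.0e8, 4.2e8, 4.1e8]      # litres per day
cases = [30, 180, 600, 900]               # weekly clinical cases

loads = [variant_daily_load(f, c, q) for f, c, q in zip(freqs, concs, flows)]
r2 = log10_r_squared(loads, cases)
print(f"log10-log10 R^2 = {r2:.2f}")
```

Working on the log10 scale keeps a few high-load weeks from dominating the fit, which is one reason such comparisons are commonly made after log transformation.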
The rapid turnaround time (∼3 days from sampling to data generation here), low capital cost and high portability of nanopore sequencing combined with highly multiplexed tiled PCR for SARS-CoV-2 sequencing of wastewater show great promise to complement genomic epidemiology efforts during the COVID-19 pandemic by detecting the emergence of VoCs within a pooled population sample.
Data Availability
The raw reads associated with all samples are available in the Sequence Read Archive under BioProject PRJNA731975. The accession numbers for each sample are also provided in Table S1, along with the sample metadata.
Acknowledgements
We would like to thank Farida Bishay, Rob MacArthur, Daisy Espinosa and Alvin Louie, and the entire Metro Vancouver Environmental Management & Quality Control WWTP Laboratory Staff for collecting and delivering wastewater samples for this study and providing sample metadata. We would also like to thank Ziwen Ran for help with method development, and Matthias Krushel for help with processing wastewater samples. We also thank the Molecular and Microbial Genomics and Environmental Microbiology Laboratories at BCCDC Public Health Laboratory for materials and access to testing equipment, and the BCCDC and BC Regional Health Authorities for publicly sharing data on clinical case counts and variants of concern. This work was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC, Alliance Grant, ALLRP 554612-20), BCCDC Foundation, Metro Vancouver, and Innovate BC.