Abstract
Deep sequencing of wastewater to detect SARS-CoV-2 has been used during the COVID-19 pandemic to monitor viral variants as they appear and circulate in communities. SARS-CoV-2 lineages of an unknown source that have not been detected in clinical samples, referred to as cryptic lineages, are sometimes repeatedly detected in specific locations. We have continued to detect one such lineage previously seen in a Missouri site. This cryptic lineage has continued to evolve, indicating continued selective pressure similar to that observed in Omicron lineages.
Author Summary
Monitoring sewage for SARS-CoV-2 has been an important part of understanding the dynamics of the virus’s spread and persistence within and across communities during the pandemic. We and others have also observed variants appearing in wastewater that do not appear in clinical sampling. Many of these variants not only possess genomic changes identical to, or at the same position as, those that have been observed in variants of concern, particularly currently circulating Omicron variants, but often acquire the changes before they have been observed in clinical samples. We report here the continued observation of a variant in Missouri wastewater, but not in clinical sampling, that has continued to evolve, gaining genomic changes that are often the same as, and predate, changes seen in clinical samples. These observations add to our understanding of the selective pressures driving the evolution of SARS-CoV-2.
Introduction
Surveillance of wastewater for SARS-CoV-2 has been used to detect and track community circulating variants1,2. In addition to the variants of concern (VOCs) and other common variants, wastewater surveillance has also detected variants that have not been otherwise observed3,4.
These novel variants, which we call cryptic lineages, often persist in a sewershed for months or years and show signs of continued positive selection. The specific sources of cryptic lineages are unknown, though recent efforts have provided evidence of a human source5. Sequencing of immunocompromised individuals with persistent infection has also detected sequences with some similarities to the cryptic lineages6. However, a non-human source for some of the cryptic lineages observed in wastewater cannot be ruled out. We have previously reported4 on a cryptic lineage found in a Missouri metropolitan area (MO45) in June of 2021. Since the initial observation of this cryptic lineage, it has been sporadically detected with evolving genotypes.
Results and Discussion
We use next-generation sequencing of the SARS-CoV-2 receptor-binding domain (RBD) to monitor variants present in Missouri wastewater. Monitoring of MO45 began in March 2021 and continues to the present with roughly weekly sampling (Fig 1). Initially this sewershed was observed to primarily have the Alpha variant with some ancestral sequences. Beta, Gamma, Delta and Mu/Theta sequences were all observed later, with Delta becoming the only variant detected by August 2021. Delta was then rapidly replaced by Omicron in December 2021. Since then, various Omicron variants have circulated, generally with newer variants displacing older ones, resulting in a mixture of variants co-circulating in late 2022.
In addition to the defined variants, a cryptic lineage has also been sporadically detected, first in June 2021 and last in October 2022 (Fig 2). Initial sequences of this variant had the amino acid changes K417T, T478K, E484A, Q493K, S494P and Q498H relative to the ancestral sequence, with E484A and Q493K only appearing in one of the two first detections. E484A and Q493K were both observed in all subsequent sequences of this cryptic lineage, while S494P was not observed again. K417T and T478K had previously been observed in the Gamma and Delta variants respectively, but the other mutations had not yet appeared in any major VOCs.
Several amino acid changes occurred subsequent to the initial observation and appeared to become fixed in the lineage. On February 2, 2022, N460K was first observed in the cryptic lineage and was thereafter fixed. Likewise, S477N and F486V were first observed in the cryptic lineage on April 5, 2022, and N440K on April 26, 2022, and all three were present in all detections since. N450D was first observed in the cryptic lineage on May 24, 2022. Though the lineage had N450Y on June 16, 2022 instead, the two subsequent detections of the lineage had N450D again. Several other changes were observed in the lineage over time, though none could be concluded to have become fixed. Of note, T547I and T572I were each observed in 3 samples, but were not observed in the most recent detection. At the last detection of the MO45 cryptic lineage in October 2022, the lineage appeared remarkably similar to an Omicron lineage, with 12 amino acid changes in its RBD that were all identical to, or at the same position as, changes found in Omicron lineages.
Most of the residue changes observed in the cryptic lineage predate the changes observed in Omicron. The initial detections of the cryptic lineage, months before the emergence of Omicron, already had two changes that were to be seen ubiquitously in Omicron lineages, T478K and E484A, and three changes at the same residues as changes common in Omicron lineages, K417T, Q493K and Q498H. Likewise, N460K, which appeared in the cryptic lineage in February of 2022, did not become prevalent in an Omicron background until six months later.
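The distinction drawn above between changes that are identical to Omicron changes and changes that merely fall at the same residue can be sketched in a few lines of Python. This is an illustration only, not the analysis used for the figures. The cryptic lineage changes are taken from the text; the Omicron list is an assumption, a subset of RBD substitutions commonly listed for BA.1, and the function names are hypothetical.

```python
import re

# Changes reported in the text for the initial detections of the cryptic lineage.
CRYPTIC_INITIAL = ["K417T", "T478K", "E484A", "Q493K", "Q498H"]

# Assumption: a subset of RBD substitutions commonly listed for Omicron BA.1,
# included for illustration only.
OMICRON_BA1_RBD = ["K417N", "N440K", "S477N", "T478K", "E484A",
                   "Q493R", "Q498R", "N501Y"]


def position(change):
    """Return the residue number of a substitution written like 'K417T'."""
    match = re.fullmatch(r"[A-Z](\d+)[A-Z]", change)
    if match is None:
        raise ValueError(f"unrecognized substitution: {change!r}")
    return int(match.group(1))


def classify(changes, reference_changes):
    """Label each change as 'identical', 'same position', or 'absent'
    relative to a reference list of substitutions."""
    reference_set = set(reference_changes)
    reference_positions = {position(c) for c in reference_changes}
    labels = {}
    for change in changes:
        if change in reference_set:
            labels[change] = "identical"
        elif position(change) in reference_positions:
            labels[change] = "same position"
        else:
            labels[change] = "absent"
    return labels


if __name__ == "__main__":
    for change, label in classify(CRYPTIC_INITIAL, OMICRON_BA1_RBD).items():
        print(change, label)
```

Under these assumptions, T478K and E484A are labeled identical, while K417T, Q493K and Q498H are labeled as falling at the same position, in agreement with the description above.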
The convergence of the cryptic lineage and Omicron variants suggests similar selection pressures. The origin of Omicron and the origin of the MO45 cryptic lineage are unknown. At least in some cases, cryptic lineages appear to be derived from individuals with chronic SARS-CoV-2 infections. However, as the MO45 cryptic lineage has not been traced, a non-human source cannot be ruled out. Since the cryptic lineage in some cases acquired changes prior to Omicron, continued monitoring of wastewater for such cryptic lineages may provide foreknowledge of changes, or at least the position of changes, likely to be selected for in the circulating Omicron variants.
Materials & Methods
Sample collection and RNA extraction
Collection and processing of samples were as previously described. Twenty-four-hour composite samples were collected at the MO45 wastewater treatment facility and maintained at 4 °C until they were delivered to the analysis lab, generally within 24 h of collection. Samples were then centrifuged at 3000 × g for 10 min, followed by filtration through a 0.22 µm polyethersulfone membrane (Millipore, Burlington, MA, USA). Approximately 37.5 mL of wastewater was mixed with 12.5 mL of a solution containing 50% (w/vol) polyethylene glycol 8000 and 1.2 M NaCl and incubated at 4 °C for at least 1 h. Samples were then centrifuged at 12,000 × g for 2 h at 4 °C. Supernatant was decanted and RNA was extracted from the remaining pellet (usually not visible) with the QIAamp Viral RNA Mini Kit (Qiagen, Germantown, MD, USA) using the manufacturer’s instructions. RNA was extracted in a final volume of 60 µL.
MiSeq
Similar to our previous protocol, the primary RBD RT-PCR was performed using the Superscript IV One-Step RT-PCR System (Thermo Fisher Scientific, 12594100). Primary RT-PCR amplification was performed as follows: 25°C (2:00) + 50°C (20:00) + 95°C (2:00) + [95°C (0:15) + 55°C (0:30) + 72°C (1:00)] × 25 cycles using the MiSeq primary PCR primers CTGCTTTACTAATGTCTATGCAGATTC and NCCTGATAAAGAACAGCAACCT.
Secondary PCR (25 μL) was performed on RBD amplifications using 5 μL of the primary PCR as template with MiSeq nested gene specific primers containing 5′ adapter sequences (0.5 μM each) acactctttccctacacgacgctcttccgatctGTRATGAAGTCAGMCAAATYGC and gtgactggagttcagacgtgtgctcttccgatctATGTCAAGAATCTCAAGTGTCTG, dNTPs (100 μM each) (New England Biolabs, N0447L) and Q5 DNA polymerase (New England Biolabs, M0541S).
Secondary PCR amplification was performed as follows: 95°C (2:00) + [95°C (0:15) + 55°C (0:30) + 72°C (1:00)] × 20 cycles.
For Omicron exclusion amplification, the primary RBD RT-PCR was performed using the MiSeq primary PCR primers ATTCTGTCCTATATAATTCCGCAT and CCCTGATAAAGAACAGCAACCT (the first primer was changed to TATATAATTCCGCATCATTTTCCAC starting in May 2022 to adapt to changing Omicron lineages) and secondary PCR used MiSeq nested gene specific primers containing 5′ adapter sequences (0.5 μM each) acactctttccctacacgacgctcttccgatctGTGATGAAGTCAGACAAATCGC and gtgactggagttcagacgtgtgctcttccgatctATGTCAAGAATCTCAAGTGTCTG.
A tertiary PCR (50 μL) was performed to add adapter sequences required for Illumina cluster generation with forward and reverse primers (0.2 μM each), dNTPs (200 μM each) (New England Biolabs, N0447L) and Phusion High-Fidelity (or KAPA HiFi for CA samples) DNA Polymerase (1U) (New England Biolabs, M0530L). PCR amplification was performed as follows: 98°C (3:00) + [98°C (0:15) + 50°C (0:30) + 72°C (0:30)] × 7 cycles + 72°C (7:00). Amplified product (10 μL) from each PCR reaction was combined and thoroughly mixed to make a single pool. Pooled amplicons were purified by addition of Axygen AxyPrep MagPCR Clean-up beads (Axygen, MAG-PCR-CL-50) in a 1.0 ratio. The final amplicon library pool was evaluated using the Agilent Fragment Analyzer automated electrophoresis system, quantified using the Qubit HS dsDNA assay (Invitrogen), and diluted according to Illumina’s standard protocol. The Illumina MiSeq instrument was used to generate paired-end 300 base pair reads. Adapter sequences were trimmed from output sequences using Cutadapt.
Data Availability
All data produced are available online on NCBI SRA.
Computational analysis
Sequencing reads were processed similarly to the previously described approach. Briefly, BBTools (Bushnell B. – http://sourceforge.net/projects/bbmap/) were used to merge paired reads, which were dereplicated with a custom script (https://github.com/degregory/Programs/blob/main/derep.py). Dereplicated sequences from RBD amplicons were mapped to the reference sequence of SARS-CoV-2 (NC_045512.2) spike ORF using Minimap2. Mapped amplicon sequences were then processed with SAM Refiner using the same spike sequence as a reference and the command line parameters “--Alpha 1.6 --foldab 0.6”.
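To illustrate what dereplication does, the following minimal Python sketch collapses identical merged reads into unique sequences with counts. It is not the derep.py script cited above, whose options and output format are not reproduced here; the function name and the case-insensitive comparison are assumptions made for the example.

```python
from collections import Counter


def dereplicate(sequences):
    """Collapse identical reads into (sequence, count) pairs.

    Assumption: reads are compared case-insensitively. Results are ordered
    from most to least abundant; ties keep first-seen order.
    """
    counts = Counter(seq.upper() for seq in sequences)
    return counts.most_common()


if __name__ == "__main__":
    reads = ["ACGT", "acgt", "TTTT"]
    for sequence, count in dereplicate(reads):
        print(sequence, count)
```

Mapping each unique sequence once and carrying its count forward avoids aligning the same amplicon read many times.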
For Fig 1, SAM Refiner covariant deconvolution outputs were matched to defined variants to determine the relative abundance for each sample using a custom script (modified from https://github.com/istaves/covid-variant-counter). For Fig 2, the same outputs of SAM Refiner for MiSeq sequences were collected and processed to determine core haplotypes of the cryptic lineage. First, sequences that contained fewer than 4 polymorphisms relative to the reference Wuhan I sequence or matched officially named variants were discarded. Remaining sequences were then processed to remove polymorphisms that never appeared in a sample at an abundance greater than 0.5%. In-frame deletions bypassed this removal. Condensed sequences that appeared in at least two samples or had a summed abundance of at least 2% across all samples were passed on to further steps. All sequences were rendered into the figures using plotnine.
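The core haplotype filtering described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the script used for Fig 2: haplotypes are represented as frozensets of polymorphism strings, abundances are fractions, the abundance of a polymorphism in a sample is taken to be the summed abundance of the retained sequences carrying it, and in-frame deletions are assumed to be written with a "del" prefix. The function and parameter names are hypothetical.

```python
from collections import defaultdict


def core_haplotypes(samples, named_variants, min_polys=4, poly_cutoff=0.005,
                    min_samples=2, min_total=0.02):
    """Filter and condense haplotypes across samples.

    samples: {sample_name: {frozenset_of_polymorphisms: abundance_fraction}}
    named_variants: set of frozensets matching officially named variants
    Returns {condensed_haplotype: {sample_name: abundance_fraction}}.
    """
    # Step 1: discard sequences with too few polymorphisms or matching named variants.
    kept = {
        name: {hap: ab for hap, ab in haps.items()
               if len(hap) >= min_polys and hap not in named_variants}
        for name, haps in samples.items()
    }

    # Step 2: record the highest per-sample abundance reached by each polymorphism.
    peak = defaultdict(float)
    for haps in kept.values():
        per_sample = defaultdict(float)
        for hap, ab in haps.items():
            for poly in hap:
                per_sample[poly] += ab
        for poly, ab in per_sample.items():
            peak[poly] = max(peak[poly], ab)

    def condense(hap):
        # Assumption: in-frame deletions carry a "del" prefix and bypass the cutoff.
        return frozenset(p for p in hap
                         if p.startswith("del") or peak[p] > poly_cutoff)

    # Step 3: merge sequences that condense to the same haplotype.
    condensed = defaultdict(dict)
    for name, haps in kept.items():
        for hap, ab in haps.items():
            core = condense(hap)
            condensed[core][name] = condensed[core].get(name, 0.0) + ab

    # Step 4: keep haplotypes seen in enough samples or with enough total abundance.
    return {hap: by_sample for hap, by_sample in condensed.items()
            if len(by_sample) >= min_samples
            or sum(by_sample.values()) >= min_total}
```

In this sketch a rare polymorphism that never exceeds 0.5% in any sample is dropped from the sequences carrying it, so those sequences fold into the more abundant haplotype they otherwise match.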
Author Contributions
DAG, MCJ, MR, CL, ES, and JW conceptualized and designed the study. DAG and MCJ performed formal analysis. JW, ES, MR, CR, CL, NN, CD, DAG and MCJ conducted the investigation. CR performed the sequencing. DAG wrote all the code. MCJ, CL, MR, and JW acquired funding. MCJ, DAG, JW, and MR wrote and reviewed the manuscript.
Acknowledgements
We would like to acknowledge the University of Missouri Bioinformatics and Analytics Core for their services with MiSeq sequencing.