Abstract
Environmental surveillance and clinical diagnostics heavily rely on the polymerase chain reaction (PCR) for target detection. A growing list of microbial threats warrants new PCR-based detection methods that are highly sensitive, specific, and multiplexable. Here, we introduce a PCR-based icosaplex (20-plex) assay for detecting 18 enteropathogen and two antimicrobial resistance genes. This multiplexed PCR assay leverages the self-avoiding molecular recognition system (SAMRS) to avoid primer dimer formation, the artificially expanded genetic information system (AEGIS) for amplification specificity, and next-generation sequencing for amplicon identification. We benchmarked this assay using a low-cost, portable sequencing platform (Oxford Nanopore) on wastewater, soil, and human stool samples. When the 20-plex assay was compared against parallelized multi-target TaqMan Array Cards (TAC), there was 74% agreement on positive calls and 97% agreement on negative calls. Additionally, we show how sequencing information from the 20-plex can be used to further classify allelic variants of genes and distinguish subspecies. The strategy presented offers sensitive, affordable, and robust multiplex detection that can be used to support efforts in wastewater-based epidemiology, environmental monitoring, and human/animal diagnostics.
Introduction
Emerging infectious diseases, coupled with rising antibiotic resistance, are a threat to global public health.1 The diversity of pathogens that can cause illness necessitates pathogen detection methods that can identify multiple genetic targets. There is a need for low-cost, multi-target, detection methods, especially in regions where the burden of infectious diseases is high, resources are constrained, and there is a high diversity in the pathogens that are present.
A high burden of diarrheal illness exists in low- and middle-income countries (LMICs).2 The enteric pathogens (bacteria, viruses, protozoa, helminths) that contribute to disease in LMICs are geographically diverse and location specific.3–5 As an example of geographic diversity, previous studies used highly parallelized quantitative polymerase chain reaction (qPCR) to survey a wide range of pathogens around the world.6 In Mozambique, Shigella spp. and Giardia spp. were the most prevalent pathogens whereas Campylobacter spp. and Giardia spp. dominated infections in children across eight other settings in South America, sub-Saharan Africa, and Asia.3,5
The need for multi-target detection assays is not limited to LMICs. In high-income countries (HICs), respiratory illnesses are common and diarrheal illnesses predominate through foodborne outbreaks.7,8 Characteristically, urban areas in HICs have sewered sanitation systems, which allow for active monitoring of the infectious disease burden through wastewater-based epidemiology (WBE).9 As an early example, Poliovirus was isolated from wastewater samples as part of the World Health Organization’s Global Polio Eradication Initiative (GPEI) to monitor for emergence/reemergence of the virus.10–12 Recently, SARS-CoV-2 was monitored in wastewater at ≈14,000 sites in 59 countries in March 2021.13 These latest efforts rely on methods such as qPCR and digital PCR (dPCR) to determine the level of infections in a population.14–16 Looking beyond the COVID-19 pandemic, monitoring programs are eager to expand surveillance to include enteric pathogens, other respiratory viruses, sexually transmitted infections, arboviruses, and antimicrobial resistance.17 However, expanding the range of targets for surveillance is both labor-intensive and costly, underscoring the urgent need for new multi-target capabilities that can efficiently and affordably address this challenge.
Complementing WBE efforts, other environmental surveillance strategies are premised on the environment (e.g., soil, water, air, fomites) serving as an intermediary between infected hosts.18 Environmental detection can facilitate surveying disease burden. For example, a study in Kenya found that positive detection of helminths (Ascaris lumbricoides, Trichuris trichiura, and Necator americanus) in a household’s soil was significantly associated with cases of helminth infections of household members.19 As a result, qPCR-based surveillance of helminths in soil is a promising alternative to more invasive stool-based surveillance.
Environmental detection is also used for determining dominant transmission pathways of pathogens. During the COVID-19 pandemic, studies sampled fomites using qPCR for SARS-CoV-2 RNA and concluded that fomites were unlikely to be a dominant transmission pathway.20,21 Similar methods have been proposed to survey the burden of antimicrobial resistance across the globe by sampling soil and water.22–24 To expand the scale and scope of environmental surveillance, highly multiplexed detection assays that can capture the diversity of possible microbial threats of interest are needed. Despite this pressing need, methods for multi-target detection generally fall short in achieving meaningful reductions in assay cost, necessitating a more selective approach in deciding which targets to prioritize in monitoring, surveillance, and detection efforts.
PCR-based multi-target detection strategies are typically limited by the number of targets that can be simultaneously amplified and identified. In conventional multiplexed PCR amplification, increasing the number of targets increases the likelihood of off-target reactions (e.g., primer dimer formation, non-specific amplification product). These off-target reactions result in reduced sensitivity or assay failure.25 Further, highly multiplexed PCR assays tend to have a narrow tolerance for changing reaction conditions and sample composition. For example, adding primers for new targets to an established multiplexed assay can result in primer cascading failure. Performing an assay in highly heterogeneous sample matrices, such as environmental DNA extracts, can also reduce PCR efficiency and result in false negatives. Additionally, multiplexed fluorescence-based detection assays, such as qPCR, are typically constrained (up to 5 targets) by the limited orthogonality of fluorescent reporter dye spectra.26
Other detection technologies have approached the ‘many-target’ problem by scaling down reaction volumes and parallelizing reactions in microfluidic devices.27 One widely used commercial platform, the TaqMan™ Array Card (TAC), parallelizes qPCR reactions into micro-scale (≈1.5 µL) reactions.28 Even with this compartmentalization, qPCR and dPCR-based platforms still encounter significant challenges in scaling and accessibility due to the high capital costs of equipment, high variable costs for consumables, and the substantial cost and personnel time required for adding additional targets. To achieve highly multiplexed detection, beyond what is accessible with qPCR and dPCR, alternative solutions are needed.
Research in synthetic biology has produced various non-standard nucleotides that can be used to circumvent major obstacles associated with scaling multiplexed PCR amplification to larger ‘n’-plex reactions (Figure 1). The non-standard nucleic acids from the Self-Avoiding Molecular Recognition Systems (SAMRS: A*, T*, G*, and C*) are structurally modified versions of the standard DNA nucleobases (A, T, G, and C).29 Though structurally distinct, SAMRS nucleobases maintain an ability to base pair with standard DNA nucleobases (Figure 1c) but not with their SAMRS complement. For nucleic acid amplification, primers modified with SAMRS components anneal to and amplify natural DNA/RNA. Conversely, formation of SAMRS:SAMRS pairs (T*:A* and C*:G*, Figure 1d) is thermodynamically disfavored. In amplification assays with SAMRS-containing primers, this results in a decrease in off-target primer-primer interactions. Selectively inserting SAMRS bases into primer sequences has been shown to reduce primer dimer formation and increase multiplexed assay sensitivity.30,31
Additionally, non-standard nucleic acids from the Artificially Expanded Genetic Information System (AEGIS) can be used to improve primer binding specificity.32 AEGIS nucleotides (Z:P pair, Figure 1b), such as Z (6-amino-5-nitro-3-(1′-β-d-2′-deoxyribofuranosyl)-2(1H)pyridine) and P (2-amino-8-(1′-β-d-2′-deoxyribofuranosyl), can form highly-specific base pairs orthogonal to the standard, natural set (T:A and C:G).33,34 Since AEGIS bases are not found in nature, primers containing AEGIS nucleotides can be used to amplify template targets containing complementary AEGIS sequences while avoiding off-target amplification.35
Various detection assays have been developed that leverage properties of SAMRS and AEGIS bases for viral pathogen detection, including arboviruses (e.g., Zika, Dengue, Chikungunya),36,37 respiratory viruses (e.g., RSV, MERS-CoV, Influenza A/B, SARS-CoV-2),38,39 human papillomavirus (HPV),40 and norovirus.41 Despite proving the utility of SAMRS and AEGIS nucleobases in assay design, these assays were either non-multiplexed or multiplexed but required an expensive strategy for target readout (Luminex xMAP array detection).
In this work, we develop an ‘icosaplex’ (20-plex) PCR-based sequencing assay able to detect 20 enteric pathogen and antimicrobial resistance gene targets. This 20-plex assay greatly expands on prior work that targeted viruses to new pathogens (bacteria, protozoa, and helminths) and to environmental sample types. The 20-plex assay amalgamates individual primer sets for 20 targets from a previous collection of work and achieves effective multiplexing by incorporating SAMRS-AEGIS nucleotides into primers chosen for biological reasons. To circumvent the target identification limitations of PCR, we leveraged nanopore sequencing (Oxford Nanopore Technologies), a third-generation sequencing method that is inexpensive, portable, and can provide sequencing results in real time. This study is the first to use SAMRS-AEGIS primers for highly multiplexed PCR in combination with nanopore sequencing for microbial surveillance applications. Sequencing information provides additional insight into gene alleles and subspecies that would otherwise be missed through presence/absence methods. The target panel and detection method were chosen for application areas in environmental detection and surveillance efforts in resource constrained settings, such as LMICs. We benchmarked performance of the 20-plex assay in three sample matrices: wastewater, soil, and human feces.
Results and Discussion
Pathogen and Antimicrobial Resistance Gene Target Selection
We developed a multiplexable assay that can detect a broad range of microbial threats relevant to global health. We chose 20 genes of interest that encompass a wide range of enteropathogens (bacteria, protozoa, and one helminth) and two clinically important antimicrobial resistance genes (ARGs) (Table 1). qPCR assays for these 20 targets have previously been reported (Table S1).
Twelve of our targets are genes specific to pathogenic E. coli. These represent five of the major pathogenic E. coli subtypes and Shigella spp. In LMICs, pathogenic E. coli is a leading cause of diarrheal illness.5,42,43 The enterotoxigenic E. coli (ETEC) subtype is associated with moderate-to-severe diarrhea which can lead to additional severe clinical outcomes.43 LMICs also experience a high incidence of soil-transmitted helminth infections.44 It is estimated that 738 million people globally are infected with helminths of the genus Ascaris.45 In HICs like the United States, Campylobacter spp., non-typhoidal Salmonella, Shigella spp., and Giardia intestinalis, are leading causes of reported foodborne illnesses.46 In both LMICs and HICs, pathogenic bacteria pose an even larger threat to human health if they acquire antimicrobial resistance activity. BlaNDM and mcr-1 are globally distributed ARGs that confer resistance to the last line of defense antibiotics reserved for difficult-to-treat infections.47,48 Many of the 20 gene targets chosen in our panel were used by other studies to detect pathogens and ARGs in human feces,3,6 wastewater,49 and environmental samples. 50–52
Design and validation of 20-plex primers
Primer sequences from previously reported PCR and qPCR assays for the 20 gene targets were used as a starting point for 20-plex PCR primer design (Table S1). The initial 40 primer sequences (1 forward, 1 reverse for each gene target) were chosen to accommodate a single annealing temperature (60 °C) during PCR cycling. At this temperature and at a high relative abundance of primer to target, various cross-primer interactions can occur to form primer dimers and off-target amplicons, each reducing assay sensitivity (Figure 2a). To combat primer dimer formation (Figure 2b), we modified all 40 standard DNA primers with SAMRS nucleobases using the PrimerCompare software developed at the Foundation for Applied Molecular Evolution (FfAME). PrimerCompare took standard DNA primer sequences that have proven targets, primer concentrations, salt concentrations, and thermodynamic parameters (maximum ΔG for hairpins and dimers) as inputs to simulate potential primer-primer interactions. These interactions include self-dimerization, cross-primer dimerization, and hairpin structures.
SAMRS-containing primers for the 20 targets (Table 1) were then synthesized and validated in single-target PCR amplification reactions (Figure S1). A qPCR melt curve for each primer set, with and without added synthetic DNA template, showed no primer dimer formation in the no-template controls (NTC, Figure S2). We then tested the effectiveness of the SAMRS primers compared to standard DNA primers at reducing primer dimers in multiplexed reaction conditions (Table S2-S5). 20-plex PCR was performed with and without synthetic template added (Table S6). By agarose gel electrophoresis, primer dimer products were observed with standard DNA primers, but not with SAMRS-containing primers (Figure 2c, Figure S3). While both standard DNA and SAMRS-containing primers showed target amplification in the 20-plex, the identity of the individual amplicons could not be resolved through gel electrophoresis alone.
Incorporation of SAMRS bases helped decrease primer dimers in multiplex PCR reactions. Separately, a 5′-overhang tag added to the primers can be used for downstream barcoding and attachment of sequencing adapters (Table S2). To avoid off-target amplification, 5′-overhang tag sequences should be distinct from sequences that could be present in samples of interest. The generalized design of a 5′-overhang tag, however, is challenging due to the unknown metagenomic composition of many sample matrices (e.g., wastewater, soil, surface water, fomites, feces). We overcame this obstacle by introducing non-standard AEGIS P nucleobases into the 5′-overhang tag. Since AEGIS bases form a highly specific base pair that is orthogonal to the standard DNA bases (Z:P, Figure 1b, Figure 2d), AEGIS-containing primers should solely bind to complementary AEGIS-tagged regions.
Various design choices were made to minimize design complexity and reagents that would be required for performing multiplexed assays that use AEGIS components. First, the AEGIS tag sequences used a 5-letter alphabet composed of the standard DNA bases (A, T, G, C) and one of the AEGIS bases (P). In amplification reactions, end-users would therefore only need access to complementary nucleotide triphosphate, dZTP, rather than both dZTP and dPTP. Second, we chose to use a single AEGIS tag sequence for both forward and reverse primers. We previously observed that using a single tag sequence in multiplexed PCR reactions reduced overall primer dimer formation and increased detection sensitivity (data not shown). The final AEGIS tag sequence (AGCPCTCGPTTC) was selected due to low propensity for hairpin formation, as determined computationally. This AEGIS tag sequence is appended to the 5′-end of the 40 SAMRS-containing primers used in this work (Table S2).
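To illustrate the pairing rule behind this tag design, the short Python sketch below computes the reverse complement of the AEGIS tag over an alphabet in which P pairs with Z. The helper name and lookup table are illustrative assumptions and are not part of the software used in this work.

```python
# Complement table for the standard bases plus the AEGIS Z:P pair.
# Illustrative helper only; not taken from the authors' pipeline.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "P": "Z", "Z": "P"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a sequence that may contain Z or P."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


AEGIS_TAG = "AGCPCTCGPTTC"  # tag sequence reported in the text
tag_complement = reverse_complement(AEGIS_TAG)
print(tag_complement)  # GAAZCGAGZGCT
```

The strand copied from a P-containing tag carries Z opposite each P, which is consistent with the statement above that only dZTP needs to be supplied during amplification.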
To multiplex samples, we then created 10 unique barcoding primers that contained a 24-nt barcode region using sequences from an Oxford Nanopore Technologies barcoding kit. These barcoding primers contained the barcode sequence and a downstream region homologous to the common 5′-tag of the 20-plex SAMRS-AEGIS primers (Table S3). The universal 5′ AEGIS tag thus serves as the priming region for the barcoding primers either in the same PCR reaction (one-pot amplification) or a subsequent PCR reaction (sequential amplification, Figure 2e). Though barcoding primers discussed in this work were designed to be compatible with Oxford Nanopore demultiplexing workflows, a similar design strategy can be used for barcoding applications on other sequencing platforms (Table S3).
Optimization of a sequential ‘two-step’ 20-plex PCR reaction
A unique challenge of multiplexing in complex samples of unknown metagenomic composition is that gene targets are not present in equimolar amounts. For certain sample types, targets in the same sample could be present at gene copy numbers that vary by orders of magnitude. If barcoding and target amplification occur in one reaction, rather than sequentially, higher abundance targets will bias amplification and consume barcoding primers, reducing assay sensitivity for lower abundance targets. We tested this hypothesis by performing both one-pot and sequential amplification of two targets using synthetic templates, stx2 (present at 10 or 10² copies/µL) and aaiC (10⁴ or 10⁵ copies/µL). When compared to one-pot PCR, performing barcoding in a separate PCR reaction (sequential PCR) decreased differences in abundances of stx2 and aaiC amplicons (Figure S4).
Subsequently, PCR optimization was used to identify optimal reaction and cycling conditions. For the optimal number of cycles in each step, we found 40 cycles (as is used for qPCR) during the first round of amplification followed by 15 cycles in the second round for barcoding minimized amplification bias and maximized barcoded targets over other combinations tested (Figure S4, S5). In the first round of PCR, we found a uniform concentration of each primer (0.2 µM of each primer, 8 µM total primer) minimized observed amplification bias (Figure S6). Under these optimized 20-plex PCR reaction and cycling conditions, primer dimers were still observed with standard DNA primers, but not with SAMRS-AEGIS primers (Figure S7).
Finally, we incorporated nanopore sequencing, a low capital cost, portable sequencing platform, as a read-out for detection of the SAMRS-AEGIS 20-plex reaction. Amplification was performed on the 20 synthetic template mixtures at two initial concentrations for each target: 10 and 10⁴ copies/µL. Samples were sequenced on a MinION flow cell, basecalled, and demultiplexed. All 20 targets were detectable by nanopore sequencing at initial template concentrations of 10 and 10⁴ copies/µL (Figure S8, S9, Table S6). In both reaction conditions, fewer reads were observed for three assay targets: stx1, STh, and aatA. Though additional optimization (e.g., adjusting primer concentrations) could be performed to improve relative amplification of these three targets, many factors in environmental samples that cannot be controlled likely play a larger role in determining differential amplification. For example, the absolute and relative abundance of each target in real samples cannot be optimized. For design simplicity, we opted to continue with equimolar SAMRS-AEGIS primer concentrations.
As designed, this SAMRS-AEGIS 20-plex PCR reaction overcomes challenges that must be addressed for sensitive detection of multiple targets in environmental samples. Pathogen and antimicrobial resistance genes can be in low abundance,52–54 necessitating modifications that avoid primer dimerization. Inclusion of 1-3 SAMRS nucleotides in the seed region of the 20-plex PCR primers was effective at eliminating detectable primer dimer formation as seen by both gel electrophoresis and qPCR-based melting curve analysis. Target species in environmental samples are often differentially abundant, in many cases by orders of magnitude.54 We performed two PCR reactions sequentially: Reaction 1 uses 40 cycles to detect low abundance species or amplicons with low amplification efficiency, while Reaction 2 uses AEGIS nucleotides to introduce 24-nt sample barcodes for nanopore sequencing in order to minimize background, non-specific amplification. With equimolar amounts of all SAMRS-AEGIS primers, this workflow was sensitive enough to detect all 20 targets using synthetic templates at 10 copies/µL for each target by nanopore sequencing.
20-plex assay performance in environmental samples
Previously, we hypothesized that AEGIS nucleotides in the primers could help avoid non-specific ‘background’ amplification in environmental samples. To test this hypothesis, we compared the sequencing outputs of the 20-plex assay using SAMRS-AEGIS primers to standard DNA primers, in three sample types: wastewater, soil, and human feces (Figure S10-S14, Table S7-9). Nanopore sequencing reads were demultiplexed and binned into one of four categories: (1) fully map to target; (2) partially map to target; (3) map to primer regions, but not target; (4) unmapped. For all sample types, the SAMRS-AEGIS 20-plex assay had significantly more reads aligning to targets and fewer reads mapping only to primer regions compared to the standard DNA 20-plex assay (Figure 3). Though variations between sample matrices were readily observable, the SAMRS-AEGIS 20-plex assay had 1.8 to 7.5 times more read alignments to the full-length targets compared to reads derived from the standard DNA 20-plex assay. Conversely, the standard DNA 20-plex assay resulted in an average of 2.4 to 4 times more reads aligning only to primers, but not target, compared to reads from the SAMRS-AEGIS 20-plex assay.
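The four-way binning described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in this work: the inputs (the fraction of the target region between the primers that a read covers, and whether the read matches a primer) and the 0.9 cutoff for a full-length match are assumptions introduced here.

```python
from collections import Counter

FULL_COVERAGE = 0.9  # assumed cutoff for "fully map to target"; not from the source


def bin_read(target_coverage: float, hits_primer: bool) -> str:
    """Assign one read to one of the four categories named in the text."""
    if target_coverage >= FULL_COVERAGE:
        return "full_target"
    if target_coverage > 0.0:
        return "partial_target"
    if hits_primer:
        return "primer_only"
    return "unmapped"


# Toy reads as (target_coverage, hits_primer); values are made up.
reads = [(0.98, True), (0.45, True), (0.0, True), (0.0, False), (0.95, True)]
counts = Counter(bin_read(cov, primer) for cov, primer in reads)
print(dict(counts))
```

Comparing the per-category counts between primer chemistries on the same sample would give fold differences of the kind reported above.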
The observed increase in on-target alignment of the SAMRS-AEGIS 20-plex assay highlights the importance of non-standard nucleotides as an indispensable component of this 20-plex assay. Reads aligning to only primers constituted the majority of reads from the standard DNA 20-plex assay. Reads in this category constitute a mixture of off-target products, including non-specific amplicons of environmental DNA and primer dimers. Two purification steps in preparing the nanopore sequencing library partially remove primer dimers that would otherwise have been present. As such, a lower fraction of reads in the ‘map to primer only’ category could be traced to primer dimers. For sequencing-based detection, the sensitivity of this assay is dependent on sequencing depth. Minimizing wasted sequencing effort on off-target amplicons is critical for minimizing assay costs since it allows users to multiplex more samples per sequencing flow cell.
Comparison between 20-plex assay and parallelized detection with TaqMan™ Array Cards
To evaluate the performance of the SAMRS-AEGIS 20-plex assay against an established method (Figure 4a), we compared assay results obtained from the 20-plex assay to those from TaqMan™ Array Cards (TAC). TAC assays are qPCR-based and use a highly parallelized architecture to detect multiple targets. Due to their convenience, sensitivity, and potential for semi-quantitative detection, TAC assays are widely used in diagnostic and environmental surveillance settings.6,51 Unlike the SAMRS-AEGIS 20-plex, TAC assays also require a fluorescent probe for target identification. Both assays allow for sample multiplexing, with TAC assays limited to eight samples per card. For these comparisons, 10 samples from the SAMRS-AEGIS 20-plex reactions were multiplexed in a single MinION nanopore sequencing flow cell.
Comparing target detection between TAC and the SAMRS-AEGIS 20-plex assay, we observed a 74% PPA (positive percent agreement) and a 97% NPA (negative percent agreement) between these two methods (Figure 4b). Among the discrepancies, 13 out of 63 cases involved targets detected by TAC but not by the SAMRS-AEGIS 20-plex assay, while 50 out of 63 cases involved targets detected by the SAMRS-AEGIS 20-plex assay but not by TAC (Figure 4b-e). To confirm that the read-to-target assignments in the SAMRS-AEGIS 20-plex assay were not due to mapping error or improper demultiplexing, reads were aligned against reference sequences and manually inspected for processing errors. All 50 targets identified in the SAMRS-AEGIS 20-plex assay, but not in the TAC assay, could be fully mapped to properly barcoded reads. Additionally, no reads in the no-template controls (NTCs) for the SAMRS-AEGIS 20-plex assay could be mapped to assay targets.
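For readers less familiar with agreement statistics, the sketch below shows how PPA and NPA are conventionally computed from a two-by-two comparison, assuming TAC is treated as the reference method. The function name is hypothetical and the counts are illustrative placeholders, not the data from this study.

```python
def percent_agreement(both_pos, ref_only, test_only, both_neg):
    """Return (PPA, NPA) in percent for a test compared against a reference.

    both_pos:  positive by both methods
    ref_only:  positive by the reference only
    test_only: positive by the test only
    both_neg:  negative by both methods
    """
    ppa = 100.0 * both_pos / (both_pos + ref_only)
    npa = 100.0 * both_neg / (both_neg + test_only)
    return ppa, npa


# Illustrative counts only; these are NOT the study's results.
ppa, npa = percent_agreement(both_pos=30, ref_only=10, test_only=5, both_neg=95)
print(f"PPA = {ppa:.0f}%, NPA = {npa:.0f}%")  # PPA = 75%, NPA = 95%
```

Under this convention, reference-only positives lower PPA and test-only positives lower NPA, so the two kinds of discordant call affect different statistics.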
These results provide evidence that the SAMRS-AEGIS 20-plex has sensitivity in the tested environmental sample matrices similar to TAC; for certain targets, the SAMRS-AEGIS 20-plex may be more sensitive. One possible reason for the observed differences by method is the assayed template input: TAC uses a maximum of 21 ng template per singleplexed well, while the 20-plex assay uses a maximum of 100 ng template in the first round of PCR. Another possible explanation stems from differences in cycling conditions: TAC uses 40 qPCR cycles, while the SAMRS-AEGIS 20-plex assay uses 40 cycles for first round amplification and 15 for second round of tagged amplification (net: 45 cycles accounting for dilution).
SAMRS-AEGIS 20-plex assays reveal additional information about microbial threats
We next asked whether additional information is provided by the sequencing data obtained from the SAMRS-AEGIS 20-plex assay. For each sample and each positive gene target with at least 10 mapped reads, consensus sequences were generated and dereplicated to explore diversity across samples. STh, also known as ST1b, encodes a heat-stable enterotoxin produced by enterotoxigenic E. coli (ETEC).55 Of the 12 samples that were positive for STh, three unique STh variants could be identified (Figure 5a). These variants closely match unnamed STh variants present in databases (Supplementary File 1).
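The filtering and dereplication step can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout (sample, mapped read count, consensus sequence) and the function name are hypothetical, the sequences are made up, and only the 10-read threshold comes from the text.

```python
from collections import defaultdict

MIN_READS = 10  # minimum mapped reads per positive target, as stated in the text


def dereplicate(consensus_records):
    """Group identical consensus sequences and list the samples carrying each.

    consensus_records: iterable of (sample_id, mapped_reads, consensus_sequence).
    """
    variants = defaultdict(list)
    for sample_id, mapped_reads, sequence in consensus_records:
        if mapped_reads >= MIN_READS:
            variants[sequence.upper()].append(sample_id)
    return dict(variants)


# Toy records with made-up sequences; S3 falls below the read threshold.
records = [
    ("S1", 120, "ACGTTA"),
    ("S2", 45, "acgtta"),
    ("S3", 8, "ACGTTT"),
    ("S4", 30, "ACGCTA"),
]
variants = dereplicate(records)
print(len(variants))  # 2
```

Each key in the result is one unique variant, and the attached sample list shows how widely that variant is shared.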
For the antimicrobial resistance genes mcr-1 and blaNDM, we sought to identify different alleles that could be amplified using the SAMRS-AEGIS 20-plex primer set by mapping reads from positive samples to all alleles in the Comprehensive Antibiotic Resistance Database (CARD).56 While reads mapped to more than one allele sequence, the putative alleles are highly similar (e.g., mcr-1.20 and mcr-1.14 differ by 1 bp in the amplicon region) and could not be distinguished from nanopore sequencing error. One strategy to distinguish highly similar alleles within the same sample with nanopore sequencing is incorporating unique molecular identifiers (UMIs) in the primers.57 Alternatively, higher accuracy sequencing platforms such as Illumina or PacBio could be used.
Beyond toxins and ARGs, the SAMRS-AEGIS 20-plex also targeted the 18S rRNA gene to identify protozoan pathogens. The CR18S assay targets the 18S rRNA gene in Cryptosporidium spp.58 Of the 14 samples that were positive for CR18S by the SAMRS-AEGIS 20-plex assay, 12 were also positive by TAC. Consensus sequences revealed six unique 18S alleles from these samples. Three of these alleles mapped at 100% identity to previously observed variants found in sequence databases, including an unnamed Cryptosporidium sp. isolate, a Cryptosporidium meleagridis isolate, and a Cryptosporidium hominis isolate (Figure 5b). The remaining variants were found to map with lower identity (approximately 90% ID) to uncultured alveolates.
Finally, we were able to observe gene variants belonging to two subspecies of Campylobacter jejuni. The hipO gene encodes hippurate hydrolase in C. jejuni.59 From sequencing we were able to observe eight unique variants (Figure 5c). Four of these variants, with nucleotides T33/T37 in the amplicon, closely map to C. jejuni subsp. jejuni while the other four, with nucleotides C33/C37, closely map to C. jejuni subsp. doylei. Two of the C. jejuni subsp. jejuni hipO variants and two of the C. jejuni subsp. doylei hipO variants matched with 100% ID to previously observed hipO genes. Two of the conserved polymorphisms in the hipO doylei variant overlap with the probe binding region for the TAC assay (probe: T48/G47; doylei C38/A47). Only one of eight samples positive for hipO with the SAMRS-AEGIS 20-plex was positive using TAC.
Sequencing results from the SAMRS-AEGIS 20-plex provided additional insight regarding microbial threats that would have otherwise been missed through a presence/absence-based approach. C. jejuni is an important pathogen in LMICs.60 A 2016 study of eight birth cohorts across South America, sub-Saharan Africa, and Asia found that 85% of children are carriers of Campylobacter spp. before the age of one.60 C. jejuni is also an important food-borne pathogen in HICs, primarily transmitted via poultry products.61 More samples were positive for C. jejuni (hipO) than C. coli (GlyA) in both wastewater samples from WA and child fecal samples from Ecuador. Within C. jejuni, two subspecies exist that display differing phenotypic and clinical case presentations. The lesser-known C. jejuni subsp. doylei is more associated with bacteremia and is known to cause gastritis, in addition to enteritis.62 Consensus sequences from our 20-plex assay showed half of the hipO amplicons were more similar to C. jejuni subsp. doylei than C. jejuni subsp. jejuni (Supplementary File 1), though some samples contained a mixture of both subspecies. The ability to distinguish these two subspecies is a notable feature of the 20-plex and provides important information that is useful for both LMIC and HIC settings.
Two assays in this work target eukaryotic 18S rRNA genes: CR18S (Cryptosporidium spp.) and G18S (Giardia spp.). Of positive G18S samples from the 20-plex assay, all consensus sequences mapped with 100% identity to Giardia intestinalis, the causative species of disease in humans.63 We observed six variants in Cryptosporidium 18S rRNA amplicons (Supplementary File 1), three of which mapped to uncultured alveolates at a lower identity (approx. 90% ID). While the alveolate genus and species are unknown, the positive detection in both SAMRS-AEGIS 20-plex and TAC is possibly the result of off-target amplification of related, but non-pathogenic, alveolate organisms. Given the high variation in the observed amplicons for the CR18S genes, more specific assays that target C. hominis and C. parvum,64 the causative species of illness in humans, may be warranted.65 Nonetheless, the sequencing used in the SAMRS-AEGIS 20-plex assay provides a general means to interrogate variants within a sample and distinguish between false positives and true positives.
Though the SAMRS-AEGIS 20-plex assay proved to be sensitive and more information-rich than probe-based amplification strategies, there are some notable limitations. Probe-based strategies, such as the qPCR-based TAC assay and dPCR, are quantitative. As developed, the 20-plex assay is not capable of relative quantification due to the use of sequential rounds of PCR, nor can the assay be used for absolute quantification since it relies on sequencing. Additionally, sensitivity for assay targets when multiplexing is highly dependent on the differential abundance of target species. Optimization of primer concentrations for specific targets, combined with a priori knowledge of expected environmental abundances, may be required for improving sensitivity in certain sample types. Though nanopore sequencing is well-suited for resource-limited settings, the low nominal basecalling accuracy (95%) limits our ability to resolve multiple alleles within the same sample. To distinguish between multiple alleles in a single sample, much higher sequencing coverage with nanopore or other higher accuracy NGS (e.g., Illumina) would be required. Finally, as the 20-plex assay is built around detection of extracted genetic material from samples, it is not suited for detection of viable organisms.
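A back-of-the-envelope calculation helps show why a 95% per-base accuracy makes single-base allele differences hard to call from individual reads. The sketch below assumes independent, uniformly distributed errors and a hypothetical 100 bp amplicon; neither assumption comes from this work.

```python
def expected_errors(length_bp: int, accuracy: float = 0.95) -> float:
    """Expected number of miscalled bases in one read of the given length."""
    return length_bp * (1.0 - accuracy)


def error_free_fraction(length_bp: int, accuracy: float = 0.95) -> float:
    """Probability that a read has no errors, assuming independent errors."""
    return accuracy ** length_bp


AMPLICON_BP = 100  # hypothetical amplicon length, chosen only for illustration
print(f"expected errors per read: {expected_errors(AMPLICON_BP):.1f}")
print(f"error-free read fraction: {error_free_fraction(AMPLICON_BP):.4f}")
```

Under these assumptions a typical read carries several errors and very few reads are error-free, so a true 1 bp difference between alleles only becomes visible in a consensus built from many reads.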
Despite these limitations, the SAMRS-AEGIS 20-plex assay strategy presented is a promising alternative to conventional multi-target PCR detection methods. We estimate that multiplexing 10 samples and 20 targets (20-plex) on a single nanopore MinION flow cell would cost approximately $4.00 per target and $80.00 per sample (Table S10, Supplementary File 2), which is similar to the per-target and per-sample costs of 20 parallelized assays on the TAC platform. Where the SAMRS-AEGIS 20-plex assay design excels is at scale. Assuming a fixed read coverage, sequencing the 20-plex reactions on an Illumina NovaSeq S4 would drop assay costs to approximately $0.55 per target or $11.00 per sample. Setting aside variable costs, using nanopore sequencing as a readout still offers a lower entry barrier in capital costs compared to qPCR, dPCR, and other NGS platforms, making it well-suited for work in resource constrained settings, such as LMICs.
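The per-target figures quoted above follow from dividing the per-sample estimate by the 20 targets in the panel. A minimal sketch of that arithmetic, using only the per-sample estimates given in the text (the function name is illustrative and not part of the study's materials):

```python
def cost_per_target(cost_per_sample: float, n_targets: int = 20) -> float:
    """Spread a per-sample sequencing cost evenly over the targets in the panel."""
    return cost_per_sample / n_targets


# Per-sample estimates quoted in the text (USD).
nanopore = cost_per_target(80.00)  # MinION flow cell, 10 samples multiplexed
novaseq = cost_per_target(11.00)   # Illumina NovaSeq S4, fixed read coverage

print(f"MinION: ${nanopore:.2f} per target; NovaSeq S4: ${novaseq:.2f} per target")
```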
Lastly, we highlight that the strategy presented in this work is not limited to the 20 targets described here. SAMRS-AEGIS primers are premised on orthogonality, offering an element of modularity for target choices that can be adapted to specific geographic contexts or modified to include emerging threats. Coupled with the additional insight gained from sequencing, this approach has the potential to significantly enhance our understanding of pathogens and antibiotic resistance globally, paving the way for more effective public health interventions.
Methods
Sample collection and nucleic acid extraction
10 wastewater samples were obtained from treatment plants in Washington (WA) State. 25-50 mL of wastewater was centrifuged at 5000 × g for 20 minutes at 4 °C. The resulting pellet was resuspended in 200 µL of supernatant. Nucleic acids from 200 µL of resuspended wastewater solids were extracted using the AllPrep PowerViral DNA/RNA Kit (Qiagen, Hilden, Germany), omitting the use of β-mercaptoethanol. Purified wastewater DNA was eluted in RNase-free water to a final volume of 100 µL.
10 soil samples were collected from three dog parks located in Seattle, WA. Nucleic acids from 0.25 g of each sample were extracted using DNeasy PowerSoil Pro Kit (Qiagen) following standard protocol. Purified soil DNA was eluted in Solution C6 (10 mM Tris-HCl buffer) to a final volume of 50 µL.
10 fecal samples from children were obtained from the ECoMiD cohort study in northwest Ecuador.66 The child stool samples were collected at 18 months of age. Nucleic acids were extracted from 0.22 grams of stool samples using a modified QIAamp Fast DNA Stool Mini Kit (Qiagen). Purified fecal DNA was eluted in Buffer ATE (10 mM Tris-HCl, 0.1 mM EDTA, 0.04% NaN3) to a final volume of 200 µL. The ECoMiD study protocol was approved by the institutional review boards of the University of Washington (UW; IRB STUDY00014270), Emory University (IRB00101202), and the Universidad San Francisco de Quito (2018–022M). The study protocol was also reviewed and approved by the Ministry of Health of Ecuador (MSPCURI000253-4).
SAMRS-AEGIS Primer Design
We selected 40 standard DNA primers from 20 qPCR assays reported in previous literature (Table S1). These primers were modified with SAMRS nucleobases to prevent primer dimer formation in a 20-plex PCR assay. SAMRS modifications were designed using an iterative approach. Software developed at FfAME (PrimerCompare) took all 40 standard DNA primers along with primer, salt, and Mg2+ concentrations (200 nM, 60 mM, and 2 mM, respectively) and output potential primer-primer interactions, including self-dimerization and hairpin structures. Using filters in the software, we concentrated on only the most detrimental structures: hairpins and dimers with sufficiently low ΔG values, as well as dimers with 3′ to 3′ overlaps within a short footprint (4-8 nt). These became our primary SAMRS substitution regions. We then identified 1-3 bases for SAMRS substitution in the 3′-overlap region that could destabilize the largest proportion of predicted structures. PrimerCompare incorporates SAMRS nearest-neighbor thermodynamic data, which allowed us to run the SAMRS-modified set as input to evaluate whether further substitutions were required and to check the Tms and ΔGs of the modified primers. This process continued until an optimal set of primers was designed.
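To illustrate the kind of 3′ to 3′ overlap that this screening looks for, the sketch below flags primer pairs whose 3′ ends are perfectly complementary over a 4-8 nt footprint. It is not PrimerCompare: it uses exact string matching on standard bases only, ignores ΔG, hairpins, and SAMRS thermodynamics, and all function names and example sequences are hypothetical.

```python
from itertools import combinations_with_replacement

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a standard-base DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_overlap(a: str, b: str, min_len: int = 4, max_len: int = 8) -> int:
    """Longest footprint k (min_len..max_len) over which the last k bases of
    primer a pair perfectly, antiparallel, with the last k bases of primer b.
    Returns 0 if no such footprint exists."""
    best = 0
    for k in range(min_len, min(max_len, len(a), len(b)) + 1):
        if a[-k:] == reverse_complement(b[-k:]):
            best = k
    return best


def screen(primers: dict) -> list:
    """List (name_a, name_b, footprint) for every pair, including self-pairs,
    with a perfect 3'-to-3' overlap."""
    hits = []
    for name_a, name_b in combinations_with_replacement(sorted(primers), 2):
        k = three_prime_overlap(primers[name_a], primers[name_b])
        if k:
            hits.append((name_a, name_b, k))
    return hits


# Hypothetical primers, not taken from Table S1.
example = {
    "fwd_1": "CCCCCCGATTACA",  # 3' end ATTACA pairs with rev_1
    "rev_1": "GGGGGGTGTAAT",
    "fwd_2": "TTTTTTACGT",     # palindromic 3' end ACGT: self-dimer
}
hits = screen(example)
```

Pairs reported by such a screen would be the candidates for SAMRS substitution near the 3′ end; a thermodynamic model is still needed to rank them.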
Once all 40 standard DNA primers were modified with SAMRS, we added a common AEGIS tag to the 5′-end. The 5′ overhang sequence facilitates the attachment of barcode and sequencing adapters in PCR. The AEGIS tag (AGCPCTCGPTTC) was designed to allow 2 AEGIS bases separated by 3 or more standard DNA bases and to have a Tm of at least 60 °C. The designed 40 SAMRS-AEGIS primers are listed in Table S2. Further, AEGIS barcode sequences used for sample multiplexing were designed by concatenating a barcode from Oxford Nanopore Technologies Native Barcoding Kit (SQK-NBD112.24) with the AEGIS tag sequence, which are listed in Table S3.
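The spacing rule for the tag can be checked mechanically. The sketch below assumes that AEGIS nucleotides are written with the single letters P and Z (P appears in the tag above; Z is inferred from the dZTP used in the PCR) and verifies that a tag carries exactly two AEGIS bases separated by at least three standard bases. The function name is hypothetical, and the Tm criterion is not evaluated because it requires AEGIS thermodynamic parameters.

```python
AEGIS_BASES = {"P", "Z"}  # assumed single-letter codes for AEGIS nucleotides


def aegis_spacing_ok(tag: str, n_aegis: int = 2, min_gap: int = 3) -> bool:
    """True if the tag has exactly n_aegis AEGIS bases and every pair of
    neighbouring AEGIS bases is separated by at least min_gap standard bases."""
    positions = [i for i, base in enumerate(tag) if base in AEGIS_BASES]
    if len(positions) != n_aegis:
        return False
    gaps = [right - left - 1 for left, right in zip(positions, positions[1:])]
    return all(gap >= min_gap for gap in gaps)


tag_ok = aegis_spacing_ok("AGCPCTCGPTTC")  # tag sequence given in the text
```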
SAMRS-AEGIS Primer Synthesis
SAMRS or AEGIS containing oligonucleotides were synthesized on Mermade 12 instruments, using standard phosphoramidite methods with minor changes to the coupling time of AEGIS phosphoramidites (2 min for AEGIS, 1 min for standard DNA bases and SAMRS). Solid support was a Mermade style column packed with controlled pore glass (CPG) at 1000 Å pore size. Oligonucleotides were synthesized as either DMT-on or DMT-off, followed by diethylamine wash (10% in ACN) at the end of the synthesis. DMT-off oligonucleotides were deprotected in aqueous ammonium hydroxide (28-33% NH3 in water) at either 65 °C for 3 hours or 55 °C overnight, purified by ion-exchange HPLC (Dionex DNAPac PA-100, 22×250 column), and desalted over SepPak C18 cartridges (Waters Corp., Milford, MA). Oligonucleotides synthesized as DMT-on were deprotected using the same method, followed by purification on Glen-Pak cartridges (GlenResearch, Sterling, VA). The purity of each oligonucleotide was analyzed by analytical ion-exchange HPLC (Dionex DNAPac PA-100, 2×250 column). The oligonucleotides were sent out for ESI mass spectrometry (Novatia LLC, Newtown, PA) to confirm their molecular weights.
Sequential Multiplex PCR reaction and cycling conditions
Unless otherwise specified, first round PCR was performed at 20 µL scale and contained 1X Quantitect Multiplex PCR NoROX master mix (Qiagen), 0.2 µM final concentration of each primer (40 primers listed in Tables S2 and S4 for SAMRS-AEGIS and standard DNA assays, respectively), and nucleic acid template. For reactions that used synthetic templates, 10 to 10^5 gene copies/µL of synthetic templates (IDT, Table S6) were added as specified. Synthetic templates were ordered as IDT gBlock Gene Fragments, except LT, ipaH, G18S, and ITS1, which were ordered as two single-stranded oligos. Oligos for LT, ipaH, G18S, and ITS1 were annealed by combining 20 µM of each oligo in 100 mM NaCl and 10 mM Tris-EDTA (pH 8.0) buffer, incubating at 90 °C for 3 minutes, then cooling at 0.1 °C/s until reaching 20 °C. For reactions using environmental or fecal DNA extracts, up to 100 ng of DNA extract or 5 µL of volume was added (Table S7). First round PCR reactions with SAMRS-AEGIS primers also contained a 0.05 mM final concentration of dZTP (Firebird Biomolecular Sciences, Alachua, FL). No template control (NTC) reactions were run in parallel with samples, with the template volume replaced by nuclease-free water.
With the exception of experiments where cycling conditions were explicitly varied, first round amplification in sequential PCR used the following cycling conditions: initial denaturation at 95 °C for 15 min; 40 cycles of (1) 95 °C for 30 s and (2) 60 °C for 60 s; a final extension at 72 °C for 5 min; and a holding step at 12 °C.
1 µL of the PCR product was then used as the template for a second PCR reaction. The second PCR reaction contained template, 1X Quantitect Multiplex PCR NoROX master mix (Qiagen), 2 µM of 24-mer barcoding primer (Table S3, S5) in 30 µL of volume, or 20 µL when specified. For reactions that contained SAMRS-AEGIS primers, 0.05 mM of dZTP was added. No template control reactions for each barcoding primer were run in parallel with the samples, with the template volume replaced by nuclease-free water.
The second round PCR reactions were amplified using the following cycling conditions: 95 °C for 15 min, followed by 15 cycles of (1) 95 °C for 30 s and (2) 60 °C for 60 s, followed by 72 °C for 5 min and a final holding step at 12 °C. After each round of PCR, amplicons were analyzed by gel electrophoresis on a 3% (w/v) agarose gel stained with GelGreen, and visualized using a blue light transilluminator.
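For planning purposes, the two cycling programs can be written down as data and their programmed hold times summed. The sketch below encodes only the steps listed above; the class and field names are illustrative, and ramp times and the final 12 °C hold are deliberately excluded, so actual instrument run times will be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    initial_denaturation_s: int  # 95 °C hold
    cycles: int
    denature_s: int              # 95 °C step within each cycle
    anneal_extend_s: int         # 60 °C step within each cycle
    final_extension_s: int       # 72 °C hold

    def programmed_minutes(self) -> float:
        """Sum of programmed hold times, excluding ramping and the 12 °C hold."""
        cycling = self.cycles * (self.denature_s + self.anneal_extend_s)
        total_s = self.initial_denaturation_s + cycling + self.final_extension_s
        return total_s / 60


FIRST_ROUND = CyclingProgram(15 * 60, 40, 30, 60, 5 * 60)
SECOND_ROUND = CyclingProgram(15 * 60, 15, 30, 60, 5 * 60)

first_min = FIRST_ROUND.programmed_minutes()
second_min = SECOND_ROUND.programmed_minutes()
```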
Nanopore library preparation and data acquisition
Prior to library preparation, all barcoded samples were purified using magnetic DNA-binding beads (Sergi Lab Supplies, Seattle, WA) with a 2:1 bead-to-sample ratio (v/v). Samples were washed twice with 70% ethanol, and eluted in nuclease-free water to a final volume of 12 µL. Purified DNA was quantified on a DeNovix Fluorometer, and barcoded samples were pooled equally by weight. A subset of SAMRS-AEGIS and standard DNA NTCs were also sequenced. Nanopore sample preparation followed standard MinION Genomic DNA by Ligation protocol using the SQK-LSK114 kit with the two following modifications: 1) During the DNA repair and end prep step, the NEBNext FFPE Repair Mix was omitted to avoid potential SAMRS-AEGIS removal by repair enzymes. The volume of the repair mix was replaced by nuclease-free water. 2) To preserve short fragments, the magnetic DNA-binding bead-to-sample ratio was increased to 2:1. Up to 1.3 pmol of pooled samples were loaded into the flowcell. MinION flow cells used in this work were from the R10.4.1 series. Nanopore flow cells were used once per sample without washing, and data collection proceeded for 72 h. A summary of nanopore sequencing runs is shown in Table S8.
Nanopore data collection, basecalling, and processing
Nanopore data acquisition was performed using MinKNOW version 23.07.12. Data was collected in FAST5 format for experiments with synthetic templates, and POD5 format for environmental/fecal samples. FAST5 files were converted to POD5 format using the pod5 package (ONT, version 0.3.2).67 Raw POD5 data files were basecalled using Dorado (ONT, version 0.6.2+14a7067) using the super accurate model (dna_r10.4.1_e8.2_400bps_sup@v4.2.0) and a minimum q-score threshold of 7.68 Sample barcodes were demultiplexed using the Dorado demux command with the “--no-trim” flag and a custom barcode configuration file that contained the 24-nt barcodes used in this work.
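The q-score threshold acts on a per-read mean quality. As a hedged illustration of what such a filter computes, the sketch below averages per-base error probabilities (rather than the Phred values themselves) and converts the mean back to a q-score. This averaging convention and the Phred+33 encoding are assumptions about how the threshold is applied, not details stated in the text, and the function names are hypothetical.

```python
import math


def mean_qscore(quality_string: str) -> float:
    """Mean read quality from a Phred+33 quality string, averaged in
    error-probability space and converted back to a q-score."""
    error_probs = [10 ** (-(ord(char) - 33) / 10) for char in quality_string]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def passes_filter(quality_string: str, min_qscore: float = 7.0) -> bool:
    """True if the read's mean q-score meets the minimum threshold."""
    return mean_qscore(quality_string) >= min_qscore


q_mixed = mean_qscore("+5")  # one Q10 base and one Q20 base
```

Averaging in probability space penalizes low-quality bases more heavily than a simple arithmetic mean of Phred values would: the mixed example above evaluates to roughly Q12.6 rather than Q15.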
Demultiplexed reads were aligned to a database containing barcoded reference sequences using BLAST Command Line Tool (blastn, NCBI, version 2.9.0+) with the following flags: --outfmt 10, --max-target-seqs 1.69 After alignment, top hits for each read with at least 95% coverage were stored as an initial match. The resulting reads were then passed through a more stringent alignment using bowtie2 (version 2.3.5.1) with the following flags: --very-sensitive, --local.70 Bowtie2 alignment reference sequences contained target sequences without the barcode region for the G18S, eae, CR18S, LT, and ipaH assays, and without the barcode or priming region for the remaining assays. For sub-analysis of hipO alleles, both hipO variants from C. jejuni subsp. jejuni and C. jejuni subsp. doylei were included in the reference sequences.
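The initial coverage filter can be sketched as follows, assuming the default 12-column comma-separated layout that blastn produces with output format 10, and assuming that "coverage" means the fraction of the read (query) spanned by its top hit. Read lengths are not part of that default output, so the sketch takes them as a separate, hypothetical dictionary (for example, built from the FASTQ). All names and example records are illustrative.

```python
import csv
import io

# Default column order of blastn tabular/CSV output.
BLAST_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                 "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def initial_matches(blast_csv: str, read_lengths: dict,
                    min_coverage: float = 0.95) -> dict:
    """Map read ID -> reference ID for reads whose first-listed (top) hit
    spans at least min_coverage of the read."""
    seen = set()
    matches = {}
    reader = csv.DictReader(io.StringIO(blast_csv), fieldnames=BLAST_COLUMNS)
    for row in reader:
        read = row["qseqid"]
        if read in seen:
            continue  # only the top hit per read is considered
        seen.add(read)
        aligned = abs(int(row["qend"]) - int(row["qstart"])) + 1
        if aligned / read_lengths[read] >= min_coverage:
            matches[read] = row["sseqid"]
    return matches


# Made-up records for illustration only.
example_csv = "\n".join([
    "read1,bc01_ipaH,98.0,200,4,0,1,200,1,200,1e-90,350",
    "read2,bc01_LT,97.0,100,3,0,1,100,1,100,1e-40,180",
    "read1,bc01_LT,90.0,50,5,0,1,50,1,50,1e-5,60",
])
result = initial_matches(example_csv, {"read1": 205, "read2": 200})
```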
Reads that passed bowtie2 alignment were further aligned to the fully barcoded target sequences. Consensus sequences for each target within each sample were generated from these aligned reads using the medaka (ONT, version 1.12.1) commands ‘consensus’ and ‘stitch’.71 No lower limit on the number of sequences was imposed at the consensus generation stage; however, only consensus sequences generated from at least 10 sequences were used for downstream analysis. The ‘stitch’ command used the following flag: --no-fillgaps. Consensus alignment % ID was calculated using the BLAST Command Line Tool (blastn) with the following flags: --outfmt 10, --max-target-seqs 1.
TaqMan™ Array Card assays
1X TaqMan™ Fast Advanced PCR master mix (Thermo Fisher Scientific) was used for all assays. A maximum amount of either 1400 ng or 20 µL of DNA was loaded into a TaqMan™ Custom Plated Assay Microarray Card (Table S7). Six samples were run on each card with a positive and negative control. For the positive control, 1×10^3 copies/µL of synthetic templates from IDT containing all 20 targets were used (sequences provided in Table S6). For the negative control, the volume of template DNA was replaced by nuclease-free water. Before running, the loaded card was spun down twice at 300 × g for 1 min. The TaqMan™ Array Card Sealer was then used to seal the card. The QuantStudio 7 Flex System (Thermo Fisher Scientific) was used, with qPCR cycling conditions set at 92 °C for 10 min, followed by 40 cycles of 95 °C for 1 s and 60 °C for 20 s. Data analysis was performed using Design & Analysis Software (version 2.8.0). The fecal samples were run as part of the ECoMiD study using TAC cards with AgPath-ID One-Step RT-PCR master mix and did not have the mcr-1 assay.
Comparison of read distributions between the SAMRS-AEGIS 20-plex and standard DNA 20-plex assays
Of the 30 environmental samples collected in this work, 10 wastewater, 10 soil, and 10 fecal samples were processed by the SAMRS-AEGIS 20-plex assay while 9 wastewater, 8 soil, and 3 fecal samples were processed by the standard DNA 20-plex assay. For both assay results, nanopore reads were demultiplexed then binned into one of four categories. Reads were classified as “Target (full)” if they successfully mapped to the intended target following the pipeline outlined in the “Nanopore data collection, basecalling, and processing” section. This pipeline used an initial 95% query mapping filter to remove partial alignments. Reads that mapped to barcodes, primer region, and target amplicon region, but with <95% coverage, were binned as “Target (partial)”. Reads that did not align to the target amplicon region, but did align to barcode and primer regions were binned as “primer”. Reads in this category could include primer dimers and other non-specific amplification products. The remaining reads, which did not fall under the previous categories, were binned as “None”. Reads mapping to G18S, LT, and ipaH were excluded from this analysis since they were detected in the NTC of the standard DNA 20-plex assay. Visuals for read bins were generated using R (version 4.3.2).
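The four-way binning can be summarized as a small decision function. The sketch below assumes each read has already been reduced to three yes/no observations and a query coverage value; these inputs and the function name are hypothetical simplifications of the alignment pipeline, not its actual data structures.

```python
from collections import Counter


def bin_read(has_barcode: bool, has_primer: bool, hits_target: bool,
             query_coverage: float, min_coverage: float = 0.95) -> str:
    """Assign a demultiplexed read to one of the four bins described above."""
    if has_barcode and has_primer and hits_target:
        if query_coverage >= min_coverage:
            return "Target (full)"
        return "Target (partial)"
    if has_barcode and has_primer:
        return "primer"  # e.g. primer dimers or other non-specific products
    return "None"


# Made-up reads: (barcode, primer, target, query coverage).
reads = [
    (True, True, True, 0.99),
    (True, True, True, 0.60),
    (True, True, False, 0.0),
    (False, False, False, 0.0),
    (True, True, True, 0.95),
]
counts = Counter(bin_read(*read) for read in reads)
```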
Comparison of target detection between the SAMRS-AEGIS 20-plex assay and the TaqMan™ array cards in environmental samples
10 wastewater, 10 soil, and 10 fecal samples were analyzed for the presence of gene targets by both the SAMRS-AEGIS 20-plex assay and TaqMan™ array cards (TAC). For the SAMRS-AEGIS 20-plex assay, an assay was considered positive if at least one read successfully mapped to its target according to the pipeline described in the “Nanopore data collection, basecalling, and processing” section; otherwise, it was considered negative. For TAC, an assay was considered positive if at least one of the two replicates in a card reported a Ct value <40; otherwise, it was considered negative. For TAC assays in fecal samples, the mcr-1 assay was not available and was excluded from analysis.
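The two calling rules can be stated compactly in code. In the sketch below, a replicate with no amplification is represented as None; that representation and the function names are assumptions made for illustration.

```python
from typing import Optional, Sequence


def samrs_aegis_positive(mapped_reads: int) -> bool:
    """20-plex call: positive if at least one read mapped to the target."""
    return mapped_reads >= 1


def tac_positive(ct_values: Sequence[Optional[float]],
                 ct_cutoff: float = 40.0) -> bool:
    """TAC call: positive if any replicate reports a Ct below the cutoff.
    None stands for a replicate with no reported Ct."""
    return any(ct is not None and ct < ct_cutoff for ct in ct_values)


example_calls = (samrs_aegis_positive(3), tac_positive([None, 36.2]))
```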
Agreement and disagreement between SAMRS-AEGIS and TAC for each assay and across all samples were visualized on a plotted matrix. Plots were generated using Python (version 3.8.0). Positive percent agreement (PPA) and negative percent agreement (NPA) were calculated using the following formula:
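The formula itself is not reproduced in this text. The sketch below therefore uses the conventional definitions, which is an assumption: with TAC treated as the comparator, PPA is the percentage of TAC-positive results that were also positive by the 20-plex assay, and NPA is the percentage of TAC-negative results that were also negative by the 20-plex assay. The function name and example data are hypothetical.

```python
def percent_agreement(calls):
    """calls: iterable of (assay_positive, tac_positive) boolean pairs.
    Returns (PPA, NPA) in percent, with TAC as the comparator.
    A value is NaN when the comparator has no results of that class."""
    both_pos = both_neg = tac_pos = tac_neg = 0
    for assay_positive, tac_is_positive in calls:
        if tac_is_positive:
            tac_pos += 1
            both_pos += int(assay_positive)
        else:
            tac_neg += 1
            both_neg += int(not assay_positive)
    ppa = 100 * both_pos / tac_pos if tac_pos else float("nan")
    npa = 100 * both_neg / tac_neg if tac_neg else float("nan")
    return ppa, npa


# Made-up calls: 3 concordant positives, 1 TAC-only positive,
# 4 concordant negatives, 1 20-plex-only positive.
example = ([(True, True)] * 3 + [(False, True)]
           + [(False, False)] * 4 + [(True, False)])
ppa, npa = percent_agreement(example)
```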
Identification and visualization of pathogen and antimicrobial resistance gene alleles
Reads were processed as described previously with the inclusion of hipO variant sub-analysis specifications. Consensus sequences generated in each sample with a coverage >10 reads were aligned to reference sequences of target gene, then dereplicated. Alignments were visualized using Integrative Genomics Viewer (version 2.16.2).72 Regions of interest were manually extracted and expanded for visualization. To identify if putative allele sequences had previously been observed, BLASTN webserver was used to map consensus sequences against NCBI core non-redundant nucleic acid database (core_nt).
Data Availability
A summary of assay results (number of mapped reads and TAC Ct values) will be provided in a Supplementary data file upon publication in a peer-reviewed journal. The demultiplexed nanopore sequencing basecalls (FASTQ) for each sample analyzed in this work have been deposited in the Sequence Read Archive (SRA) under BioProject PRJNA1150247 (Table S9).
Author Contribution
Project conceptualization was performed by ERF, JAM, and ZY. Methodology for this work was developed by HK, SMP, LM, and ZY. SAMRS-AEGIS oligonucleotides were synthesized by CC and CM. Fecal samples were contributed by KL and NAZ. Laboratory experiments were performed by HK and NAZ. Data analysis was conducted by HK, SMP, KB, ZY, JAM, and ERF. Visualization of data and results was performed by HK, JAM, and ERF. This project was supervised by ERF. Writing of original draft was carried out by HK, JAM, ERF, and ZY. Reviewing and editing of the manuscript was performed by all.
Conflict of Interest
S.A.B. and Z.Y. own the intellectual property of AEGIS and SAMRS. Many AEGIS and SAMRS components are commercially available from Firebird Biomolecular Sciences, LLC (www.firebirdbio.com, Email: support@firebirdbio.com). The remaining authors declare no competing interests.
Funding
LM, SMP, KMB, CC, SAB, and ZY were supported by the LRE Diagnostics grant 1R01AI135146-01A1. NAZ and KL were supported by R01AI137679. Laboratory infrastructure and hardware used for this study was supported by the University of Washington Interdisciplinary Center for Exposures, Diseases, Genomics, and Environment funded by the NIEHS (P30ES007033). JM and HK were supported by University of Washington Royalty Research Fund (RRF).
Acknowledgments
We thank the wastewater treatment plants for collecting samples for this work.