Abstract
Laboratory tests for the accurate and rapid identification of SARS-CoV-2 variants have the potential to guide the treatment of COVID-19 patients and inform infection control and public health surveillance efforts. Here we present the development and validation of a COVID-19 variant DETECTR® assay incorporating loop-mediated isothermal amplification (LAMP) followed by CRISPR-Cas12 based identification of single nucleotide polymorphism (SNP) mutations in the SARS-CoV-2 spike (S) gene. This assay targets the L452R, E484K, and N501Y mutations associated with nearly all circulating viral lineages. In a comparison of three different Cas12 enzymes, only the newly identified enzyme CasDx1 was able to accurately identify all three targeted SNP mutations. We developed a data analysis pipeline for CRISPR-based SNP identification using the assay from 91 clinical samples (Ct < 30), yielding an overall SNP concordance and agreement with SARS-CoV-2 lineage classification of 100% compared to viral whole-genome sequencing. These findings highlight the potential utility of CRISPR-based mutation detection for clinical and public health diagnostics.
Introduction
The emergence of new SARS-CoV-2 variants threatens to substantially prolong the COVID-19 pandemic. SARS-CoV-2 variants, especially Variants of Concern (VOCs) (1, 2), have caused resurgent COVID-19 outbreaks in the United States (2–5) and worldwide (1, 6, 7), even in populations with a high proportion of vaccinated individuals (8–11). Mutations in the spike protein, which binds to the human ACE2 receptor, can render the virus more infectious and/or more resistant to antibody neutralization, resulting in increased transmissibility (12), and/or escape from immunity, whether vaccine-mediated or naturally acquired immunity (13, 14). Variant identification can also be clinically significant, as some mutations substantially reduce the effectiveness of available monoclonal antibody therapies for the disease (15).
Tracking the evolution and spread of SARS-CoV-2 variants in the community is critical to inform public policy regarding testing and vaccination, as well as guide contact tracing and containment effects during local outbreaks (16, 17). Virus whole-genome sequencing (WGS) and single nucleotide polymorphism (SNP) genotyping are commonly used to identify variants Harper, 2021 #100;Oude Munnink, 2021 #99}, but can be limited by long turnaround times and/or the requirement for bulky and expensive laboratory instrumentation. Diagnostic assays based on clustered interspaced short palindromic repeats (CRISPR) (18) have been developed for rapid detection of SARS-CoV-2 in clinical samples (13, 19–22),, and a few have obtained Emergency Use Authorization (EUA) by the US Food and Drug Administration (FDA) (23, 24). Some advantages of these assays for use in laboratory and point of care settings include low cost, minimal instrumentation, and a sample-to-answer turnaround time of under 2 hours (19, 22, 25, 26).
Here we present the development of a CRISPR-based COVID-19 variant DETECTR® assay (henceforth abbreviated as DETECTR® assay) for the detection of SARS-CoV-2 mutations and evaluate its performance on 91 clinical samples using WGS as a comparator method (Fig. 1A). The assay combines RT-LAMP pre-amplification followed by fluorescent detection using a CRISPR-Cas12 enzyme. We perform a comparative evaluation of multiple candidate Cas12 enzymes and demonstrate that robust assay performance depends on the specificity of the newly identified CRISPR-Cas12 enzyme called CasDx1 in identifying three key SNP mutations of functional relevance in the spike protein, N501Y, L452R, and E484K (27).
(A) Workflow comparison between the DETECTR® assay and SARS-CoV-2 whole-genome sequencing (WGS). (B) Schematic of CRISPR-Cas gRNA design for SARS-CoV-2 S gene mutations. (C) Schematic of multiplexed RT-LAMP primer design showing the SARS-CoV-2 S gene mutations and gRNA positions. (D) Heat map comparison of three different Cas12 enzymes tested using 10 nM PCR-amplified synthetic gene fragments. (E) Dot plot showing the number (n=6) of positive replicates across a 4-log dynamic range of the RT-LAMP products (F) Heat map comparison of end-point fluorescence of three different Cas12 enzymes tested against heat-inactivated viral cultures. Replicates (n=6) generated using RT-LAMP were pooled and CRISPR-Cas12 reactions were then run in triplicate (n=3).
Results
Identifying the optimal CRISPR-Cas12 enzyme for SNP detection
To determine the optimal Cas12 enzyme for SNP detection, we evaluated three different CRISPR-Cas effectors with trans-cutting activity: LbCas12a, AsCas12a, and a novel Cas12 enzyme called CasDx1. We initially screened guide RNAs (gRNAs) with CasDx1 and LbCas12a for activity on synthetic gene fragments encoding regions of the SARS-CoV-2 S-gene with either wild-type (WT) or mutant (MUT) sequences at amino acid positions 452, 484, and 501 (Fig. 1B-C). From this initial activity screen, we identified the top-performing gRNAs for each S-gene variant encoding either L452R, E484K or N501Y (Fig. 1D). Further evaluation of these guides using CasDx1, LbCas12a and AsCas12a with their cognate gRNAs on synthetic gene fragments revealed differences in SNP differentiation capabilities, with CasDx1 showing the clearest SNP differentiation between wild-type (WT) and mutant (MUT) sequences for all three S-gene variants (Fig. 1D and Fig. S1A). In comparison, LbCas12a was capable of differentiating SNPs at positions 452 and 484, but not 501, whereas AsCas12a was only able to differentiate the SNP at position 452 (Fig. 1D and Fig. S1A).
We next tested SNP differentiation capabilities on heat-inactivated viral cultures using the full DETECTR® assay, consisting of RNA extraction, multiplexed RT-LAMP amplification (Fig. 1C), and CRISPR-Cas12 detection with guide RNAs targeting part of the spike receptor-binding domain (RBD) (Fig. 1B). The LAMP primer design incorporated two sets of six primers each, with both sets generating overlapping spike RBD amplicons that spanned the L452R, E484K, and N501Y mutations. We chose to adopt a redundant LAMP design for two reasons: first, this approach was shown to improve detection sensitivity in initial experiments; second, we sought to increase assay robustness given the continual emergence of escape mutations in the spike RBD throughout the course of the pandemic (13). The tested viral cultures included an ancestral SARS-CoV-2 lineage (WA-1) containing the wild-type spike protein (D614) targeted by the approved mRNA (Pfizer and Moderna) (28, 29) and DNA adenovirus vector (Johnson and Johnson) (30) vaccines, variants being monitored (VBMs) that were previously classified as VOCs or variants of interest (VOIs), including Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1), Epsilon (B.1.427 and B.1.429), Kappa (B.1.617.1), and Zeta (P.2) lineages, and the current VOC Delta (B.1.617.2) lineage (31). Heat-inactivated viral culture samples representing the seven SARS-CoV-2 lineages were quantified by digital droplet PCR across a 4-log dynamic range and used to evaluate the analytical sensitivity of the pre-amplification step. RT-LAMP amplification was evaluated using six replicates from each viral culture. We observed consistent amplification for all seven SARS-CoV-2 lineages with 10,000 copies of target input per reaction (200,000 copies/mL) (Fig. 1E), which is comparable to the target input of >200,000 copies/mL viruses (<30 Ct value) required for sequencing workflows used in SARS-CoV-2 variant surveillance (32, 33).
To evaluate the specificity of the different Cas12 enzymes, amplified material from each viral culture was pooled and the SNPs resulting in the L452R, E484K and N501Y mutations were detected using CasDx1, LbCa12a and AsCas12a. Similar to the results found using gene fragments, CasDx1 correctly identified the wild-type (WT) and mutational (MUT) targets at positions 452, 484 and 501 in each LAMP-amplified, heat-inactivated viral culture (Fig. 1F and Fig. S1B). In comparison, LbCas12a was capable of differentiating WT from MUT at position 501 on LAMP-amplified viral cultures but showed much higher background for the WT target at position 452 and higher background for both WT and MUT targets at position 484 for (Fig. 1F and Fig. S1B). Additionally, AsCas12a was able to differentiate WT from MUT targets at position 452 albeit with substantial background but was unable to differentiate WT from MUT targets at positions 484 and 501 (Fig. 1F and Fig. S1B). From these data, we concluded that CasDx1 would provide more consistent and accurate calls for the L452R, E484K and N501Y mutations. We thus proceeded to further develop the assay using only the high-fidelity CasDx1 enzyme.
Development of a data analysis pipeline for calling COVID-19 variant SNPs with the DETECTR® assay
To develop a data analysis pipeline for calling SARS-CoV-2 SNP mutations and assign lineage classifications with the DETECTR® assay (Fig. 2A-B), we first used data collected from SNP synthetic gene fragment controls (n=279) that included all mutational combinations of 452, 484 and 501 (see Methods). Based on the control sample data, we generated allele discrimination plots (34, 35) to define boundaries that separate the WT and MUT signals (Fig. S4A). Clear differentiation between WT and MUT signals was observed when plotting the ratio against the average of the WT and MUT transformed values on a mean average (MA) plot (34, 35) (Fig. S4B), with 100% concordance for SNP identity at positions 452, 484, and 501 for the control samples.
(A) Interpretation table summarizing the SARS-CoV-2 mutations in this study associated with the corresponding lineage classification. (B) Schematic of data analysis pipeline describing the RT-LAMP QC and subsequent CasDx1 signal scaling. The scaled signals were compared across SNPs and the calls were made for each RT-LAMP replicate. The combined replicate calls defined the mutation call, which informed the final lineage classification. (C) Three representative clinical samples of different SARS-CoV-2 lineages depict the workflow of the DETECTR® assay. Raw fluorescence curves of each sample run in RT-LAMP amplification and subsequent triplicate DETECTR® reactions targeting both WT and MUT SNPs for L452(R), E484(K), and N501(Y). Box plot visualization of the end point fluorescence in DETECTR® across each SNP for the three representative clinical samples. Calls were made for each SNP by evaluating the median values of the DETECTR® calls and overall calls through the LAMP replicates, and given a designation of WT, MUT, or NoCall. Final calls are made on the lineage determined by each SNP. Blue represents WT and green represents MUT, with RT-LAMP replicates (n = 3), CasDx1 replicates (n = 3 per LAMP replicate) and shading around kinetic curves indicates ±1.0SD.
Performance evaluation of the DETECTR® assay using clinical samples
Next, we assembled a blinded dataset consisting of 93 COVID-19 positive clinical samples (previously analyzed by viral WGS) and the SNP controls run in parallel. These samples were extracted, amplified in triplicate RT-LAMP reactions (Fig. S2), and processed further as triplicate CasDx1 reactions for each LAMP replicate (Fig. S3). A total of nine replicates were thus generated for each sample to detect WT or MUT SNPs at positions 452, 484, and 501. The DETECTR® data analysis pipeline was then applied to each sample to provide a final lineage categorization (Fig. 2A-C). For a biological RT-LAMP replicate to be designated as either WT or MUT, the same call needed to be made from all three technical CasDx1 replicates (Fig. S5A). A final SNP mutation call was made based on ≥1 of the same calls from the three biological replicates, with replicates that were designated as a No Call ignored (Fig. S5A-C). After excluding two samples that were considered invalid because the fluorescence intensity from RT-LAMP amplification did not reach a pre-established threshold determined using receiver-operator characteristic (ROC) curve analysis (Fig. S2 and Fig. S6), we evaluated a total of 807 CasDx1 signals from the 91 remaining clinical samples, generating up to 9 replicates for each clinical sample (Fig. S5B). Differentiation of WT and MUT signals according to the allele discrimination plots was more pronounced at positions 484 and 501 than position 452 (Fig. S4), whereas the MA plots showed clear separation of WT and MUT calls for all three positions (Fig. 3A and Fig. S4). The variant calls made on each sample were consistent with the difference in median values of the log-transformed signals as determined using the data analysis pipeline (Fig. S7).
(A) MA-plots showing CasDx1 SNP detection replicates (n = 807) for each SARS-CoV-2 mutation across 91 clinical samples. WT is denoted by blue dots, MUT is denoted by green dots, NoCall is denoted by orange dots and NTC is denoted by grey dots. (B) Alignment of final mutation calls comparing the DETECTR® and SARS-CoV-2 WGS assay results across 91 clinical samples after discordant samples (indicated by red asterisk) were resolved. (C) Final lineage classification on each clinical sample by the DETECTR® assay compared to the SARS-CoV-2 lineage determined by viral WGS. (D) Final Positive Predictive Agreement (PPA), Negative Predictive Agreement (NPA) and concordance values for each WT and MUT SNP from the evaluation of the DETECTR® assay against the SARS-CoV-2 WGS comparator assay after discordant samples were resolved.
We then unblinded the viral WGS results to evaluate the accuracy of the DETECTR® assay for SNP calls and lineage classification. There were 14 discordant SNP calls out of 272 (94.9% SNP concordance) distributed among 11 clinical samples out of 91 (Fig. S8A-C). Among the 11 discordant samples, one sample (COVID-31) was designated a ‘no call’ at position 452 by viral WGS and thus lacked a comparator, two samples were designated a ‘no call’ due to flat WT and MUT curves (COVID-41 and COVID-73), four samples had similar WT and MUT curve amplitudes, suggesting a mixed population (COVID-03, COVID-56, COVID-61 and COVID-81) (Fig. S8A), and four samples had SNP assignments discordant with those from viral WGS (COVID-12, COVID-13, COVID-20 and COVID-63) (Fig. S8A).
Given that the comparison data had been collected over an extended time period, we surmised that sample stability issues arising from aliquoting and multiple freeze-thaw cycles may have accounted for the observed discrepancies. To further investigate this possibility, the 11 discordant clinical samples were re-extracted from original respiratory swab matrix and re-analyzed by running both viral WGS and the DETECTR® assay in parallel. Re-testing of the samples resulted in nearly complete agreement between the two methods, with the exception of two SNPs that were identified as E484Q in two samples by WGS but were incorrectly called E484 (WT) by the DETECTR® assay (Fig. 3B-C and Fig. S8D). Thus, based on discrepancy testing, the positive predictive agreement (PPA) between the DETECTR® assay and viral WGS at all three WT and MUT SNP positions was 100% (272 of 272, p<2.2e-16 by Fisher’s Exact Test) (Fig. 3D). The corresponding negative predictive agreement (NPA) was 91.4% as the E484Q mutation for two SNPs was incorrectly classified as WT. Nevertheless, the final viral lineage classification for the 91 samples after discrepancy testing showed 100% agreement with viral WGS (Fig. 3D and Table S1).
Discussion
In this study, we developed a CRISPR-based DETECTR® assay for the detection of SARS-CoV-2 variants. We evaluated three CRISPR-Cas12 enzymes, two commercially available (LbCas12a from NEB and AsCas12a from IDT) and one proprietary (CasDx1 from Mammoth Biosciences). Based on a head-to-head comparison of these enzymes, we observed clear differences in performance, with CasDx1 demonstrating the highest fidelity as the only enzyme able to reliably detect all three of the targeted SNPs. A data analysis pipeline was developed to differentiate between WT and MUT signals with the DETECTR® assay, yielding an overall SNP concordance of 100% (272/272 total SNP calls) and 100% agreement with lineage classification compared to viral WGS. Taken together, these findings show robust agreement between the DETECTR® assay and viral WGS for identification of SNP mutations and variant categorization. Thus, the DETECTR® assay provides a faster and simpler alternative to sequencing-based methods for COVID-19 variant surveillance.
Our results show that the choice of Cas enzyme is important to maximize the accuracy of CRISPR-based diagnostic assays and may need to be tailored to the site that is being targeted. As currently configured with only three SNP targets, the DETECTR® assay cannot resolve individual major variants, except for Alpha. However, given the rapid emergence and dynamic shifts in the distribution of variants over time (13), it is likely that tracking of key mutations, several of which are suspected to arise by convergent evolution (36), rather than tracking of variants, will be critical for surveillance as the pandemic continues. Furthermore, we also develop a data analysis pipeline for CRISPR-based SNP calling that can readily incorporate additional targets and offers a blueprint for automated interpretation of fluorescent signals that will become more complex as the degree of multiplexing increases.
Although CRISPR-based diagnostic assays have been previously demonstrated for the detection of SARS-CoV-2 variants, these studies have limitations in coverage of circulating lineages and in the extent of clinical sample evaluation. For example, the miSHERLOCK variant assay uses LbCas12a (NEB) to detect N501Y, E484K and Y144Del covering eight lineages (WA-1, Alpha, Beta, Gamma, Eta, Iota, Mu and Zeta) and was tested only on contrived samples (RNA spiked into human saliva) (20). Additionally, the SHINEv2 assay uses LwaCas13a to detect 69/70Del, K417N/T, L452R and 156/157Del + R158G covering eight lineages (WA-1, Alpha, Beta, Gamma, Delta, Epsilon, Kappa and Mu) and was tested with only the 69/70Del gRNAs on 20 Alpha-positive NP clinical samples (37). In comparison, the DETECTR® assay presented here uses CasDx1 to detect N501Y, E484K and L452R covering 11 lineages (WA-1, Alpha, Beta, Gamma, Delta, Epsilon, Eta, Iota, Kappa, Mu and Zeta) and 91 clinical samples representing seven out of the 11 lineages were tested with successful detection of all 7.
Some limitations of our study are as follows. First, as previously mentioned, the DETECTR® assay currently detects only three SNPs, which may not provide enough resolution to identify a specific lineage. Second, we observed variable performance of the assay in SNP discrimination, with more potential overlap in the calls between WT and MUT for the 452 position than for the other two sites, increasing the risk of misidentification. Third, our 484 gRNA was unable to differentiate E484Q from E484 (WT) in clinical samples, which could impact the accuracy of lineage classification (we note, however, that the E484Q mutation corresponds to a different nucleotide position in the affected 484 codon than the E484K mutation). These first 3 limitations could potentially be addressed by the incorporation of additional gRNAs to the assay to provide specific and redundant coverage and to improve identification of specific lineages. Fourth, due to the multiplexed S-gene LAMP primer design, the limit of detection of the DETECTR® assay is higher than our previously published SARS-CoV-2 DETECTR® assay (19), and thus only clinical samples with a Ct < 30 were tested in our study. Incorporation of the N-gene target as a separate reaction in the assay may be necessary for covering the dynamic range of COVID-19 positive samples if simultaneous detection and SNP/variant identification is desired. Finally, the current study focuses on the development and validation of a variant DETECTR® assay using conventional laboratory equipment. Future work will involve implementation onto automated, portable systems for use in point of care settings.
In the near term, we suggest the use of the DETECTR® assay as an initial screen for the presence of a rare or novel variant (e.g., carrying both L452R and E484K or carrying all three SNPs) that could be reflexed to viral WGS. As the sequencing capacity for most clinical and public health laboratories is limited, the DETECTR® assay would thus enable rapid identification of variants circulating in the community to support outbreak investigation and public health containment efforts. Furthermore, identification of specific mutations associated with neutralizing antibody evasion, such as E484K (14), could inform patient care with regards to the use of monoclonal antibodies that remain effective in treating the infection (15). As the virus continues to mutate and evolve, the DETECTR® assay can be readily reconfigured by validating new gRNAs and pre-amplification LAMP primers and gRNAs that target emerging mutations with clinical and epidemiological significance. For example, we postulate that the newly emerging Omicron variant, containing at least 30 mutations in the spike protein and 11 mutations in the spike RGD region targeted by the assay, could be detected by increasing degeneracy in the LAMP primers and adding at least one gRNA to be able to distinguish this variant from the others. Over the longer term, a validated CRISPR assay that combines SARS-CoV-2 detection with variant identification would be useful as a tool for simultaneous COVID-19 diagnosis in individual patients and surveillance for infection control and public health purposes.
Materials and Methods
Human Sample Collection and Ethics Statement
Remnant nasopharyngeal and/or oropharyngeal (NP/OP) samples and plasma samples from laboratory confirmed SARS-CoV-2 positive patients were retrieved from the UCSF Clinical Laboratories and stored in a biorepository until processed. Remnant sample biobanking was performed with a waiver of consent and according to no-subject contact study protocols approved by the UCSF Institutional Review Board (protocol numbers 10-01116 and 11-05519).
Synthetic Gene Fragments
Wild-type (WT) and mutant (MUT) synthetic gene fragments (Twist) were PCR amplified using NEB 2x Phusion Master Mix following the manufacturer’s protocol. The amplified product was cleaned using AMPure XP beads following manufacturers protocol at a 0.7x concentration. The product was eluted in nuclease-free water and normalized to 10 nM. All nucleic acids used in this study are summarized in Table S2.
Clinical sample acquisition and extraction
De-identified residual SARS-CoV-2 RT-PCR positive nasopharyngeal and/or oropharyngeal (NP/OP) swab samples in universal transport media (UTM) or viral transport media (VTM) were obtained from the UCSF Clinical Microbiology Laboratory. All samples were stored in a biorepository according to protocols approved by the UCSF Institutional Review Board (protocol number 10-01116, 11-05519) until processed.
All NP/OP swab samples obtained from the UCSF Clinical Microbiology Laboratory were pretreated with DNA/RNA Shield (Zymo Research, # R1100-250) at a 1:1 ratio. The Mag-Bind Viral DNA/RNA 96 kit (Omega Bio-Tek, # M6246-03) on the KingFisher Flex (Thermo Fisher Scientific, # 5400630) was used for viral RNA extraction using an input volume of 200 μl of diluted NP/OP swab sample and an elution volume of 100 μl. The Taqpath™ COVID-19 RT-PCR kit (Thermo Fisher Scientific) was used to determine the N gene cycle threshold values.
Heat-inactivated culture acquisition and extraction
Heat-inactivated cultures of SARS-CoV-2 Variants Being Monitored (VBM), Variants of Concern (VOC) or Variants of Interest (VOI) were provided by the California Department of Public Health (CDPH).
RNA from heat-inactivated SARS-CoV-2 VBM/VOC/VOI isolates were extracted using the EZ1 Virus Mini Kit v2.0 (Qiagen, # 955134) on the EZ1 Advanced XL (Qiagen, # 9001875) according to the manufacturer’s instructions. For each culture, six replicate LAMP reactions were pooled into a single sample. DETECTR® was performed on a 1:10 dilution of the 10,000 cp/rxn LAMP amplification products.
COVID-19 variant DETECTR® assay
Two LAMP primer sets, each containing 6 primers, were designed to target the L452R, E484K and N501Y mutations in the SARS-CoV-2 Spike (S) protein (Supplemental Table). Sets of LAMP primers were designed from a 350 bp target sequence spanning the 3 mutations using Primer Explorer V5 (https://primerexplorer.jp/e/). Candidate primers were manually evaluated for inclusion using the OligoCalc online oligonucleotide properties calculator (38) while ensuring that there was no overlap with either primers from the other set or guide RNA target regions that included the L452R, E484K, and N501Y mutations.
Multiplexed RT-LAMP was performed using a final reaction volume of 50 μl, which consisted of 8 μl RNA template, 5 μl of L452R primer set (Eurofins Genomics), 5 μl of E484K/N501Y primer set, 17 μl of nuclease-free water, 1 μl of SYTO-9 dye (ThermoFisher Scientific), and 14 μl of LAMP mastermix. Each of the primer sets consisted of 1.6 μM each of inner primers FIP and BIP, 0.2 µM each of outer primers F3 and B3, and 0.8 µM each of loop primers LF and LB). The LAMP mastermix contained 6 mM of MgSO4, isothermal amplification buffer at 1X final concentration, 1.5 mM of dNTP mix (NEB), 8 units of Bst 2.0 WarmStart DNA Polymerase (NEB), and 0.5 ul of WarmStart RTx Reverse Transcriptase (NEB). Plates were incubated at 65°C for 40 minutes in a real-time Quantstudio™ 5 PCR instrument. Fluorescent signals were collected every 60 seconds
40nM CasDx1 (Mammoth Biosciences), LbCas12a (EnGen® Lba Cas12a, NEB) or AsCas12a (Alt-R® A.s. Cas12a, IDT) protein targeting the WT or MUT SNP at L452(R), E484(K) or N501(Y) was incubated with 40nM gRNA in 1X buffer (MBuffer3 for CasDx1, NEBuffer r2.1 for LbCas12a and AsCas12a) for 30 min at 37°C. Dx1 gRNAs were used with both CasDx1 and LbCas12a, whereas AsCas12a gRNAs were used with AsCas12a (Table S2). 100nM ssDNA reporter (/5Alex594N/TTATTATT/3IAbRQSp/, IDT) was added to the RNA-protein complex. 18μL of this DETECTR master mix was combined with 2 μL target amplicon. The DETECTR assays were monitored for 30 min at 37°C in a plate reader (Tecan).
Digital PCR
Samples were evaluated at 3 dilutions (1:100; 1:1,000; and 1:10,000) using the ApexBio Covid-19 Multiplex Digital PCR Detection Kit (Stilla Technologies) according to the manufacturer’s protocol. The controls (positive and negative provided by UCSF, the Kit Controls, and an internal control) were run with the samples in duplicate. The dilutions were used to determine the most accurate concentration which was determined from the N gene concentration.
Sequencing methods
Complementary DNA (cDNA) synthesis from RNA via reverse transcription and tiling multiplexed amplicon PCR were performed using SARS-CoV-2 primers version 3 according to a published protocol (39). Libraries were constructed by ligating adapters to the amplicon products using NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, # E7645L), barcoding using NEBNext Multiplex Oligos for Illumina (New England Biolabs, # E6440L), and purification with AMPure XP (Beckman-Coulter, # 63880). Final pooled libraries were sequenced on either Illumina NextSeq 550 or Novaseq 6000 as 1×300 single-end reads (300 cycles).
SARS-CoV-2 viral genome assembly and variant analyses were performed using an in-house bioinformatics pipeline. Briefly, sequencing reads generated by Illumina sequencers (NextSeq 550 or NovaSeq 6000) were demultiplexed and converted to FASTQ files using bcl2fastq (v2.20.0.422). Raw FASTQ files were first screened for SARS-CoV-2 sequences using BLASTn (BLAST+ package 2.9.0) alignment against the Wuhan-Hu-1 SARS-CoV-2 viral reference genome (NC_045512). Reads containing adapters, the ARTIC primer sequences, and low-quality reads were filtered using BBDuk (version 38.87) and then mapped to the NC_045512 reference genome using BBMap (version 38.87). Variants were called with CallVariants and a depth cutoff of 5 was used to generate the final assembly. Pangolin software (version 3.0.6) (40, 41) was used to identify the lineage. Using a custom in-house script, consensus FASTA files generated by the genome assembly pipeline were scanned to confirm L452R, E484K, and N501Y mutations.
Discordant sample retesting
Eleven samples were re-extracted as described above for the NP/OP swab samples and evaluated by viral WGS as described above. The samples were then thawed (XXX freeze/thaws) and amplified using the LAMP protocol described above and evaluated using the DETECTR® assay as described above.
DETECTR® data analysis pipeline
Quality Control Metric for the LAMP Reaction
Prior to processing DETECTR® data from the clinical samples, we collected data indicating the success or failure of the samples to amplify in the LAMP reaction. The absolute truth was based on visual inspection of LAMP curves This absolute truth was used to develop thresholds for the LAMP reactions. The positive and negative controls from the LAMP reactions were used to derive the thresholds to qualify the samples. Two sets of thresholds were used: time threshold and fluorescence rate threshold. The positive LAMP controls were assumed to represent an ideal sample and displayed a classic sigmoidal rise of fluorescence over time and the NTC represented the background fluorescence. It was hypothesized that a sample will ideally have positive control like fluorescence kinetics. However due to the presence of high background in some samples, a mean value between controls for each plate was chosen as threshold. After this, the fluorescence values at a time threshold of 18 minutes were collected. The time point is of importance here to rule out those samples that would amplify closer to the endpoint, signifying the LAMP intermediates to be the majority contributors of the rise in the signal and not the actual sample itself. A score was assigned for each sample which was calculated as a ratio of rate of fluorescence rate threshold to the rate of fluorescence value at 18 minutes for each sample. The hypothesis is that if this ratio of rate of fluorescence between controls and samples is less than 1, then samples have failed to reach the minimum fluorescence required to be called out as amplified and if the ratio is greater than or equal to 1, then samples have amplified sufficiently. To identify the exact score value for a qualitative QC metric, an ROC analysis was done on scores and the absolute truth (Fig. S6).
Data Analysis for CRISPR-based SNP calling
Each well has a guide specific to the mutant or the wild-type SNP. The comparison is important to assign a genotypic call to the sample. The DETECTR® reactions across the plate are not comparable to each other. For this purpose, the endpoint fluorescence intensities are normalized in each well to its own minimum intensity. This term is called fluorescence yield. The fluorescence yield can be compared across wells in a plate under the assumption that each well will have a similar minimum fluorescence starting point. Irrespective of the highest levels of the fluorescence intensities observed across samples, the yield for a given target must ideally remain the same assuming that similar concentrations of samples/target are being compared. This aids in normalizing the signal and comparing replicates across the wells in the same plate.
The wildtype and mutant target guides on NTC must ideally not show any change in intensity over time. The fluorescence yield for NTC must remain constant across replicates, plates and close to 1.
On the contrary, if a sample has a fluorescence yield of 1, then it qualifies for a No Call.
General rules for variant calling
NTC was assigned NTC
If the Contrast of the sample for a SNP was between minimum and maximum contrast for the plate, then the sample is assigned a NoCall.
If the Size of the sample is lower than the Size of the NTC on the plate, then the sample is assigned a NoCall.
SNP Calls
We used the following procedure to evaluate the concordance between sequencing and DETECTR® technologies for genotypic classification of the clinical cohort dataset. First, we considered all samples and SNPs for which both sequencing and DETECTR® data was present in the distributed files by matching the SNP IDs and sample names. This included cleaning and curing the dataset which had failed LAMP reactions and identifying WT and MUT based on the spacer fluorescent. This yielded a preliminary data set containing 279 calls across three SNPs against 93 samples. After eliminating samples that had failed to amplify in the LAMP reaction but were assigned a genotype, the resulting final analysis data consisted of 272 calls (WT, MUT and NoCall) spread across three SNPs and 91 samples. For each of the three SNPs in the analysis data set, we identified and recorded both sequencing and DETECTR® genotypes (including NoCalls and LAMP Fails) for each of the 93 patients. The 91 patients include the individuals for whom actual sequencing data was available.
Statistical analysis
SNP Calls
For each SNP in the analysis, we computed a variety of statistics evaluating the concordance between genotype calls on the two different technologies. The concordant and discordant genotypes were visualized through contingency tables. For each SNP, there are three possible genotypes (WT, MUT and No Call). The concordance rates were calculated without the samples that failed the LAMP reaction (Fig. 3B and Table S1). The 2×2 cross tables classify all three SNPs across all the samples between sequencing and DETECTR® technologies (Fig. 3B and Table S1). The data transformation and statistical analysis was done in R (42).
Author contributions
C.L.F., J.S.C., and C.Y.C. conceived and designed the study. C.Y.C, and V.S. coordinated the SARS-CoV-2 whole-genome sequencing efforts and RT-LAMP primer design and testing. C.L.F., J.P.B., J.C., and J.S.C. designed guide RNAs for CRISPR-Cas12 testing. B.M., J.P.B., R.N.D., E.S., and C.G.H. tested guide RNAs and ran DETECTR® experiments. V.S., N.B., B.W., A.S.-G., K.R., J.S., S.M., and C.Y.C. collected samples. C.L.F., V.S., B.M., V.N., J.P.B., J.C., J.S.C., and C.Y.C. analyzed data. C.L.F., V.S., V.N., B.M., J.P.B., J.C., J.S.C., and C.Y.C. wrote the manuscript. All authors read the manuscript and agree to its contents.
Competing interests
C.Y.C. is the director of the UCSF-Abbott Viral Diagnostics and Discovery Center and receives research support from Abbott Laboratories, Inc. C.L.F., B.M., V.N., J.P.B., R.N.D., E.S., C.G.H., J.C., and J.S.C. are employees of Mammoth Biosciences. C.Y.C. is a member of the scientific advisory board for Mammoth Biosciences. The other authors declare no competing interests.
Data and Materials Availability
All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. The CasDx1 protein can be provided by Mammoth Biosciences to the extent feasible, pending scientific review and a completed material transfer agreement. Requests for the CasDx1 protein should be submitted to Janice Chen at janice{at}mammothbiosci.com.
Supplementary Figures
(A) Raw fluorescence curves from three Cas12 enzymes (CasDx1, LbCas12a, AsCas12a) complexed with WT and MUT SNP gRNAs run on PCR-amplified gene fragments representing WT and MUT SNP targets. (B) Raw fluorescence curves from three Cas12 enzymes (CasDx1, LbCas12a, AsCas12a) on eight heat-inactivated viral culture samples from various SARS-CoV-2 lineages, a no target control (RT-LAMP) and CasDx1 detection controls (WT, MUT and NTC). CasDx1 replicates (n = 3), ±1.0SD
The raw fluorescence RT-LAMP amplification curves for each of the clinical samples analyzed (n = 3 replicates). Each line is representative of the median ±1.0SD of the three RT-LAMP replicates for each sample. RT-LAMP replicates that passed QC are represented in navy blue and failed LAMP replicates are shown in orange. Only valid RT-LAMP replicates were used in subsequent data analysis.
Each clinical sample was amplified with RT-LAMP in triplicate, and the resulting amplicons were detected by CasDx1 in triplicate. The raw fluorescence curves show WT detection in blue and MUT detection in green. Each line is representative of the median ±1.0SD of the CasDx1 replicates (n = 3) for each WT and MUT guide for each of the RT-LAMP replicates (n = 3), represented by different patterned lines.
(A) Allele discrimination plot visualizing the scaled signals from the COVID Variant DETECTR® assay on gene fragments. The allele discrimination plots represent scatter plots of scaled WT and MUT fluorescence values plotted against each other. (B) Contrast-Size plots of the COVID Variant DETECTR® assay data on gene fragments to decrease ambiguity of the scaled signals, a ratio of the WT and MUT transformed values are plotted against the average of the WT and MUT transformed values on the MA plot. (C) Allele discrimination plot visualizing the scaled signals from the COVID Variant DETECTR® assay on clinical sample. (D) MA plots of the COVID Variant DETECTR® assay on the gene fragments (n = 30 WT; n = 30 MUT for each SNP) and no template controls (n = 33 WT; n = 33 MUT for each SNP) used to test the data analysis.
(A) The DETECTR® assay workflow from LAMP amplification to SNP identification. (B) Schematic showing the relationship between clinical samples, LAMP replicates and CasDx1 replicates that culminate in a final SNP call. (C) Heat map showing CasDx1 signal (n = 3) per every LAMP replicate (n = 3) for each SNP on every clinical sample reflecting samples prior to discordance testing.
Thresholds for LAMP quality analysis were derived to determine which samples had amplified sufficiently. The exact score value for this qualitative QC metric was determined using a ROC analysis.
Box plots of all the clinical samples illustrate the spread of the scaled signals for each of the samples across the replicates in the experiment. SNP calls were made on each sample agreement with the median values depicted on the box plot of the sample, which also provided an analytical confirmation of the DETECTR® results.
(A) Raw fluorescence CasDx1 curves for the clinical samples with discordant DETECTR® and WGS results. WT detection is represented by blue lines and MUT detection is represented by green lines. Each line is representative of the median ±1.0SD of the CasDx1 replicates (n = 3) for each guide for each of the LAMP replicates (n = 3), and each RT-LAMP replicate is represented by different patterned lines. (B) Visualization of the COVID Variant DETECTR® and SARS-CoV-2 WGS assays showing the alignment of final calls. Across all of the clinical samples in this cohort, 80 out of the 91 clinical sample COVID Variant DETECTR® assay calls were consistent with the SARS-CoV-2 WGS calls. (C) Summary of re-testing of discordant samples from the original clinical sample shows nearly all SNP discrepancies are resolved.
A summary table of the final SNP calls from the DETECTR® assay and the SARS-CoV-2 whole genome sequencing assay after discordant testing. The table includes the lineage classification from DETECTR® calls as well as the PANGO lineage and WHO labels assigned to the WGS calls. Ct values from running an FDA EUA authorized SARS-CoV-2 RT-PCR assay, the Taqpath™ COVID-19 RT-PCR kit, are shown. Discordant samples were reflexed back for reprocessing (*); COVID-63 was classified as a Delta variant by WGS despite its Q484 SNP call. (†).
A list of guide RNAs, reporter molecules, LAMP primers and synthetic gene fragment targets with their respective suppliers.
Acknowledgments
We thank the UCSF Center for Advanced Technology core facility (Delsy Martinez and Tyler Miyasaki) for their efforts in high-throughput sequencing of viral cDNA libraries using the Illumina NovaSeq 6000 instrument, and Mary Kate Morris from the California Department of Public Health for providing the heat-inactivated viral cultures. We also thank Lucas Harrington and Teresa Peterson for their critical review of this manuscript.
This work has been funded by the Innovative Genomics Institute (IGI) at UC Berkeley and UC San Francisco (C.Y.C.), the Sandler Program for Breakthrough Biomedical Research (C.Y.C.), US Centers for Disease Control and Prevention contract 75D30121C10991 (C.Y.C.), and Mammoth Biosciences.