Abstract
Background Mycobacterium tuberculosis whole-genome sequencing (WGS) using Illumina technology has been widely adopted for genotypic drug susceptibility testing (DST) and outbreak investigation. Oxford Nanopore Technologies is reported to have higher error rates but has not been thoroughly evaluated for these applications.
Methods We analyse 151 isolates from Madagascar, South Africa and England with phenotypic DST and matched Illumina and Nanopore data. Using PacBio assemblies, we select Nanopore filters for BCFtools (software) detection of single nucleotide polymorphisms (SNPs). We compare transmission clusters identified by Nanopore and the United Kingdom Health Security Agency Illumina pipeline (COMPASS). We compare Illumina and Nanopore WGS-based DST predictions using Mykrobe (software).
Findings Nanopore/BCFtools identifies SNPs with median precision/recall of 99·5/90·2% compared with 99·6/91·9% for Illumina/COMPASS. Using a threshold of 12 SNPs for putative transmission clusters, Illumina identifies 98 isolates as unrelated and 53 as belonging to 19 distinct clusters (size range 2-7). Nanopore reproduces this distribution with the addition of 5 singleton isolates to distinct clusters and the merging of two cluster pairs. Illumina-based clusters are also replicated using a 5 SNP threshold. Clustering accuracy is maintained using mixed Illumina/Nanopore datasets. Genotyping of resistance variants is highly concordant, with 0 discordant SNPs and 4 discordant indels across 151 isolates genotyped at >3000 SNPs and 60,000 indels.
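The precision and recall figures above can be illustrated with a minimal sketch. It assumes each variant call is represented as a (position, alternate base) pair and that a truth set of SNPs is already available; the function name and the toy variants are hypothetical and are not taken from the study pipeline.

```python
from typing import Set, Tuple

Variant = Tuple[int, str]  # (reference position, alternate base); an assumed representation


def precision_recall(called: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Return (precision, recall) of a set of SNP calls against a truth set."""
    true_pos = len(called & truth)   # calls that are present in the truth set
    false_pos = len(called - truth)  # calls that are absent from the truth set
    false_neg = len(truth - called)  # truth SNPs that were not called
    precision = true_pos / (true_pos + false_pos) if called else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    return precision, recall


# Toy data, invented for illustration only.
truth_snps = {(100, "A"), (250, "T"), (400, "G"), (900, "C"), (1200, "T")}
called_snps = {(100, "A"), (250, "T"), (400, "G"), (777, "G")}

precision, recall = precision_recall(called_snps, truth_snps)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case three of the four calls are correct and three of the five true SNPs are recovered, giving a precision of 0.75 and a recall of 0.60. The per-isolate values reported above would then be summarised by their median.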
Interpretation Illumina and Nanopore sequence data provide comparable cluster-identification and DST results.
Funding Academy for Medical Sciences (SGL018\110), Oxford Wellcome Institutional Strategic Support Fund (ISSF TT17 4), and Swiss South Africa Joint Research Award (Swiss National Science Foundation and South African National Research Foundation).
Evidence before this study Two key types of information can be obtained from laboratory testing of M. tuberculosis isolates to help directly guide public health interventions: drug susceptibility testing (DST) to guide therapy, and bacterial typing to enrich understanding of the epidemiology and guide interventions to mitigate transmission.
DST is typically performed by the “gold standard” culture-based phenotyping method or by nucleic acid amplification assays targeting specific resistance-conferring mutations. Studies over the last 7 years have shown that prediction of the susceptibility profile from Illumina genome sequence data is possible and can be automated. In a key publication, the CRyPTIC consortium and UK 100,000 Genomes project evaluated the method on over 10,000 genomes, including prospectively sampled isolates, and showed that for first-line tuberculosis (TB) drugs (isoniazid, rifampicin, ethambutol, pyrazinamide) a pan-susceptibility profile is accurate enough to be used clinically. The genetic basis of resistance remains imperfectly understood for second-line TB drugs, in particular for new and repurposed drugs (bedaquiline, clofazimine, delamanid, linezolid). Prior work in the field of genotypic DST was heavily based on Illumina technology, which provides short (70-300 base pair) sequence reads of very high quality. Many different software tools (e.g. TBProfiler, Mykrobe, MTBseq, kvarq) have been designed for sequence analysis and genotypic DST. However, the increasingly used Nanopore sequencing platforms yield very different data, with much longer sequence reads (frequently over 1 kb) and higher error rates, including systematic biases. To date, very limited evaluation of Nanopore-based drug susceptibility prediction has been performed using the only two compatible tools (Mykrobe (n=5 independent samples), TBProfiler (n=3 independent samples)).
Molecular typing of M. tuberculosis allows lineage identification and detection of putative transmission clusters. In the last decade, multiple M. tuberculosis molecular epidemiology studies have shown how genomic information can complement traditional epidemiology in identifying person-to-person transmission clusters with a high level of resolution. Typically, the number of single nucleotide polymorphism (SNP) disagreements between genomes, or SNP distance, is calculated and single-linkage clustering is performed for genomes falling within retrospectively established transmission thresholds of either 5 or 12 SNPs. Just as with DST, these thresholds were established with Illumina sequencing data. The increased error rate in Nanopore sequencing is believed to lead to inflated SNP distances if standard genome analysis tools are used. Prior to this study, the impact of this inflation on isolate clustering was unknown.
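The clustering approach described above can be sketched as follows. This is a minimal illustration, not the COMPASS or study pipeline: it assumes equal-length aligned consensus sequences, assumes that positions masked as N are ignored, and assumes that two genomes are linked when their SNP distance is less than or equal to the threshold. The function names and toy sequences are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def snp_distance(a: str, b: str) -> int:
    """Count disagreeing positions between two aligned consensus sequences.

    Positions where either sequence is masked ("N") are skipped (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y and x != "N" and y != "N")


def single_linkage_clusters(seqs: Dict[str, str], threshold: int) -> List[Set[str]]:
    """Group isolates connected by any chain of pairwise distances <= threshold."""
    parent = {name: name for name in seqs}

    def find(name: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: Dict[str, Set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    # Singletons are reported as unrelated, so only groups of two or more are returned.
    return [members for members in groups.values() if len(members) > 1]


# Toy alignment, invented for illustration; real genomes are about 4.4 Mb.
toy = {
    "iso1": "ACGTACGTACGTACGT",
    "iso2": "ACGTACGTACGTACGA",
    "iso3": "ACGTACGTACGTTCGA",
    "iso4": "TTTTTTTTACGTACGT",
}
print(single_linkage_clusters(toy, threshold=1))
```

With a toy threshold of 1, iso1 and iso3 differ by 2 SNPs but are still placed in one cluster because each is within 1 SNP of iso2. This chaining is what makes single-linkage clustering sensitive to inflated distances or spurious links, and it is why the thresholds of 5 and 12 SNPs need to be re-examined for each sequencing technology.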
Added value of this study Full-scale adoption of genomic sequencing in tuberculosis reference laboratories has so far taken place in a limited number of settings - England, the Netherlands, and New York State - all using Illumina-based sequencing data. Building on current evidence, specific WHO technical guidance and diversification and democratisation of technology, sequencing is expected to be increasingly used in tuberculosis control globally. For the first time, our study offers 4 key deliverables intended to inform adoption of Nanopore technology as an alternative, or a complement, to Illumina. First: a systematic head-to-head comparison of Nanopore and Illumina data for M. tuberculosis drug susceptibility profiling and isolate clustering, including quantitative metrics for cluster precision and recall. Second: an assessment of the impact of mixed Illumina and Nanopore data on clustering which represents an increasingly common challenge. Third: an open-source software pipeline allowing research and reference laboratories to replicate our analytical approach. Fourth: a publicly available curated test set of 151 isolates, including matched Illumina and Nanopore sequence data, and (for a subset of seven isolates) high-quality PacBio assemblies, for method development and validation.
Implications of all the available evidence Catalogues of drug resistance conferring mutations will keep improving, especially for new and repurposed drugs. Our data confirm that Illumina and Nanopore sequencing technologies can be used to identify those mutations equally accurately in M. tuberculosis. Bacterial molecular typing is consistently shown to support the understanding of disease transmission and tuberculosis control in new settings. The bioinformatics tools and filters we have developed, assessed, and made publicly available allow the use of Nanopore or mixed-technology data to appropriately cluster genetically related isolates. We provide a measure of the expected level of over-clustering associated with Nanopore technology. This study confirms that Illumina and Nanopore sequence data provide comparable DST results and isolate cluster-identification.
Competing Interest Statement
ZI, SGL and NR had travel and accommodation costs reimbursed when speaking at an Oxford Nanopore Technologies (ONT) conference in 2017. SGL and NR previously received consumables from ONT when establishing Nanopore sequencing capacity in Madagascar. ONT matched the contributions from the Longitude Prize Discovery Award to ZI and TW in 2017 to provide consumables for sequencing in Vietnam and India. ONT did not provide funding (direct or in kind) for this project, and had no input into or knowledge of the design, data analysis or paper writing. Funders had no input into the design, data analysis or paper writing of this project.
Funding Statement
SGL is supported by the Fond de Recherche Sante Quebec. TMW is a Wellcome Trust Clinical Career Development Fellow (214560/Z/18/Z). AK was supported by a Carnegie Corporation Developing Emerging Academic Leaders Early Career Fellowship. AK, MG and RW acknowledge support from the Tuberculosis Omics Research Consortium, headed by Prof Annelies Van Rie, funded by the Research Foundation Flanders (FWO), under grant No. G0F8316N (FWO Odysseus). This study was funded in part by the Wellcome Trust [214560/Z/18/Z; ISSF 204826/Z/16/Z].
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
↵* Joint senior authors
Data Availability
All data produced in the present work are contained in the manuscript.