Abstract
Background Enteric fever is a serious public health concern. The causative agents, Salmonella enterica serovars Typhi and Paratyphi A, are frequently antimicrobial resistant (AMR), leading to limited treatment options and poorer clinical outcomes. We investigated the genomic epidemiology, resistance mechanisms and transmission dynamics of these pathogens at three urban sites in Africa and Asia.
Methods Bacteria isolated from febrile children and adults at study sites in Dhaka, Kathmandu, and Blantyre were sequenced and AMR determinants identified. Phylogenomic analyses incorporating globally-representative genome data, and ancestral state reconstruction, were used to differentiate locally-circulating from imported pathogen variants.
Findings S. Paratyphi A was present in Dhaka and Kathmandu but not Blantyre. S. Typhi genotype 4.3.1 (H58) was common in all sites, but with different dominant variants (4.3.1.1.EA1 in Blantyre; 4.3.1.1 in Dhaka; 4.3.1.2 in Kathmandu). Resistance to first-line antimicrobials was common in Blantyre (98%) and Dhaka (32%) but not Kathmandu (1.4%). Quinolone-resistance mutations were common in Dhaka (99.8%) and Kathmandu (89%) but not Blantyre (2.1%). AcrB azithromycin-resistance mutations were rare (Dhaka only; n=5, 1.1%). Phylogenetic analyses showed that (a) most cases derived from pre-existing, locally- established pathogen variants; (b) nearly all (98%) drug-resistant infections resulted from local circulation of AMR variants, not imported variants or recent de novo emergence; (c) pathogen variants circulated across age groups. Most cases (67%) clustered with others that were indistinguishable by point mutations; individual clusters included multiple age groups and persisted for up to 2.3 years, and AMR determinants were invariant within clusters.
Interpretation Enteric fever was associated with locally-established pathogen variants that circulate across age groups. AMR infections resulted from local transmission of resistant strains. These results form a baseline against which to monitor the impacts of control measures.
Funding Wellcome Trust, Bill & Melinda Gates Foundation, European Union’s Horizon 2020, NIHR.
Evidence before this study Current knowledge of the enteric fever pathogen populations in Dhaka, Kathmandu, and Blantyre comes from retrospective analysis of isolates captured from routine diagnostics or treatment trials. Due to these study designs, most focus on either adult or paediatric cohorts, which complicates assessment of pathogen variant transmission across age groups. Many studies report prevalence of antimicrobial resistance (AMR) and associated mechanisms amongst enteric fever cases. Genomic studies at these sites and elsewhere have identified the spread of AMR clones, and a recent genomic study quantified the inter- and intra-continental spread of resistant S. Typhi between countries. However, PubMed search of “(typhoid OR (enteric fever)) AND (genom*)” identified no studies quantifying the relative proportion of resistant infections that is attributable to local transmission of resistant variants vs imported strains or de novo emergence of AMR.
Added value of this study We estimate the vast majority (98%) of drug-resistant enteric fever cases identified in our study resulted from local circulation of resistant variants.
Further, we show genetically indistinguishable pathogen variants (either resistant or susceptible) persisting for up to 2.3 years and causing infections across all age groups (under 5 years; 5-15 years; ≥15 years).
Implications of all the available evidence While inter-country transfer of resistant enteric fever pathogens does occur and is concerning, the burden of drug-resistant enteric fever at the study sites is currently caused mainly by transmission of locally-established variants, and transmits across age groups. These data confirm assumptions made in models of vaccine impact regarding heterogeneity of pathogen variants and AMR across age groups, and support that childhood immunisation programmes can be expected to reduce the overall burden of resistant infections in endemic settings.
Introduction
The global burden of enteric fever is estimated at 14.3 million cases annually1 and is concentrated in South Asia and Sub-Saharan Africa. The causative agents are Salmonella enterica serovars Typhi and Paratyphi A (S. Typhi and S. Paratyphi A), though the latter is rarely identified in Africa. Mortality and complications occur at significantly higher rates in the absence of effective antimicrobial therapy. Antimicrobial resistant (AMR) pathogen variants are common, but the specific drugs and causative mechanisms vary widely. Most regions have seen a decline in resistance to the older first-line drugs (chloramphenicol, co- trimoxazole and ampicillin; the combination of which is defined as multi-drug resistant, MDR) and an increase in fluoroquinolone non-susceptibility (ciprofloxacin MIC ≥0.06 mg/L)2, 3. The emergence of resistance to commonly-used oral agents cefixime and azithromycin has been reported sporadically, mainly in South Asia4–7, and extensively-drug resistant (XDR, defined as MDR plus resistant to ciprofloxacin and third-generation cephalosporins) S. Typhi is now common in Pakistan4. Both regional transfer and local clonal expansion of AMR variants have been documented7–10, but the extent to which AMR disease burden is attributable to transmission of endemic AMR variants, versus importation of exogenous AMR variants or de novo emergence of AMR in the local population, has not been specifically quantified in settings with endemic disease.
Vaccines against S. Typhi infection have been used in travellers for decades, but mass immunisation programmes have not been applied in most endemic areas. Recently-licensed Gavi-supported typhoid conjugate vaccines (TCVs) offer new opportunities to reduce the burden of S. Typhi infection11. Trials have shown these vaccines to be safe and immunogenic in children (from 9 months of age), with >80% efficacy against incident S. Typhi infection12–15. In Pakistan and Zimbabwe, TCV immunisation campaigns have been deployed to help control XDR and ciprofloxacin-resistant S. Typhi outbreaks, respectively16, 17. These successful campaigns were followed by introduction of national typhoid immunisation programmes18, 19, which are now being considered by several other countries.
It is important to monitor the impact of new vaccine programmes on the underlying pathogen populations. TCVs do not cross-protect against S. Paratyphi A infection, and S. Paratyphi A may expand to fill the niche. TCVs are effective against some AMR S. Typhi18, 19, but it is unknown whether they will be equally effective against all S. Typhi variants, or promote emergence of vaccine-escape mutants or AMR variants. Baseline data on pathogen populations is therefore essential; whole-genome sequencing (WGS) is now the standard as it provides high-resolution data simultaneously on lineage diversity, resistance mechanisms and transmission dynamics. We recently assessed the enteric fever burden in three urban sites (Blantyre, Malawi; Kathmandu, Nepal; Dhaka, Bangladesh) as part of the Strategic Typhoid Alliance across Africa and Asia (STRATAA)20. Current knowledge of the enteric fever pathogen populations at STRATAA sites comes from retrospective analysis of isolates captured from routine diagnostics3 or treatment trials10 conducted using different protocols, and usually separated into adult or paediatric focused studies3, 10. The available data indicate that S. Typhi H58 (subclade 4.3.1 and derived genotypes) has been dominant across South Asia as well as Eastern and Southern Africa, including the STRATAA sites, for many years2, 3, 8–10. In Malawi, S. Typhi epidemics have been documented since the late 1990s and are now associated almost entirely with MDR genotype 4.3.1, which has been clonally expanding since its arrival in 2009; the disease is now considered endemic8, 9. In contrast, in Bangladesh and Nepal, S. Typhi and S. Paratyphi A have been hyperendemic for decades and probably centuries; this is reflected in a diversity of co-circulating pathogen genotypes, although the populations have been dominated by 4.3.1 for the last two decades2, 3, 21.
Here we use WGS to investigate the pathogen populations underlying enteric fever at STRATAA sites, collected prospectively in defined catchment areas using the same protocol, providing valuable baseline data ahead of vaccine trials12–15 and planned national immunisation programmes19. We characterise the pathogen populations and AMR determinants, investigate transmission patterns across age groups, and quantify the proportion of AMR cases that are attributable to local transmission of endemic AMR variants.
Results
Pathogen diversity and population structure
We sequenced n=731 unique typhoidal Salmonella blood-culture isolates from febrile individuals recruited in the STRATAA catchment areas during passive surveillance for enteric fever20, 22 (see Methods, Tables 1 and S1). The distribution of serotypes and genotypes is shown in Fig. 1. Genomic analysis confirmed20 that serovar Paratyphi A was present in both the South Asian sites (n=95, 21.0% of sequenced cases in Dhaka; n=14, 10.1% of sequenced cases in Kathmandu) but not in Blantyre. Across all three sites, S. Typhi 4.3.1 (H58) genotypes were most frequently observed, representing between 42.3-97.9% of sequenced enteric fever cases (Table 1). However, different H58 subtypes were present at each of the three sites, which also differed markedly in terms of the overall diversity of pathogen variants causing enteric fever (Fig. 1, Table 1). Enteric fever cases in Blantyre were caused almost exclusively by S. Typhi 4.3.1.1.EA1 (97.9%), and thus displayed very low genotype diversity (Simpson’s index = 0.05). Enteric fever cases in Kathmandu were more diverse (Simpson’s index = 0.67), with one-third caused by a high abundance non-H58 S. Typhi genotype (3.3.2) and one-half by S. Typhi 4.3.1.2 (Fig. 1, Table 1). Dhaka was the most diverse setting (Simpson’s index = 0.80), with two common lineages of H58 S. Typhi (genotypes 4.3.1.1, 39.2% and 4.3.1.3.Bdq, 2.7%), four non-H58 S. Typhi genotypes with ≥25 sequenced cases each (2.0.1, 2.1.7, 2.3.3, 3.2.2), and two common S. Paratyphi A genotypes each associated with ≥20 sequenced cases (2.3.3 and 2.4.4) (Fig. 1, Table 1).
We considered three age groups that reflect broadly different contact networks: pre-school age (under 5 years and interacting mainly with other household members), school-age children (5-15 years, interacting with household members and other school-age children), and working age (≥15 years old, interacting with household members and the wider community). In all settings, all three age groups were infected with a diverse range of S. Typhi and S. Paratyphi A genotypes (Fig. 2a). Similarly, at the sub-genotype level, the age groups were intermingled in the maximum-phylogenetic trees (Figs. S4-6), with no evidence of clustering by patient age group (K=7.16x10-6-2.24x10-5, where K<1 indicates no signal). Within each site, the local prevalence of S. Paratyphi A, and of H58 amongst S. Typhi, was similar across age groups and sexes (Fig. 2a). In Dhaka, where the diversity of serovars and genotypes was greatest, there was a significant association between increasing age groups and prevalence of S. Paratyphi A (OR 1.93, [95% CI, 1.39-2.71], p=1x10-4 using logistic regression; Fig. 2b), and of non-H58 S. Typhi genotypes (OR 1.5 [95% CI, 1.08-1.95], p=0.01; Fig. S1a) (see Table S2). Across all sites, n=93 enteric fever cases were considered severe (see Methods); these were all caused by S. Typhi and were less common in Kathmandu (n=13, 10.5% vs n=62, 17.4% in Dhaka and n=18, 12.8% in Blantyre; see Fig. 2c, Table S3). Severe disease was not significantly associated with patient age-group or sex, S. Typhi genotype (H58 vs other) or MDR using logistic regression models (Fig. 2b, Table S3).
At each STRATAA site, all local pathogen variants co-circulated throughout the period of surveillance (Fig. 1). The dominant genotypes detected in each setting matched those identified in earlier studies2, 3, 9, and contextualisation with global genomes supports that most cases derive from locally-established pathogen variants that are now endemic in their respective settings (Fig. S2). The exception to this was a cluster of genotype 3.3.2 in Kathmandu, which appears to have been imported from elsewhere in South Asia, with ancestral sequences isolated from Bangladesh (Fig. S3a). This 3.3.2 cluster was first isolated in the Kathmandu STRATAA catchment in February 2017 and increased in numbers, peaking in November and December 2017 (85-90% of monthly S. Typhi cases, n=9-6 per month), before declining (Fig. S3b).
Antimicrobial resistance
Genetic mechanisms of resistance are summarised in Table 2. Acquired resistance genes for first-line drugs, associated with MDR, were detected in S. Typhi 4.3.1.1 from Blantyre and Dhaka. Plasmid-borne quinolone-resistance gene qnrS was detected in S. Typhi 4.3.1.3.Bdq in Dhaka, and mutations in the quinolone resistance determining region (QRDR, associated with non-susceptibility to fluoroquinolones) were common across all S. Typhi and S. Paratyphi A genotypes in Dhaka and Kathmandu but present in just n=3 4.3.1.1.EA1 in Blantyre. Azithromycin resistance-associated mutation acrB-R717Q was detected in two S. Typhi and three S. Paratyphi A from Dhaka.
Acquired resistance genes were almost exclusively integrated into the S. Typhi chromosome, most frequently mediated by the translocation of a MDR Tn2670-like composite transposon (carrying genes catA1, dfrA7, blaTEM-1, strAB, sul1 and sul2; 38.4% of all genomes in the main surveillance periods), and occasionally by transposon Tn2670 (carrying genes catA1, dfrA7, and sul1; n=27) or Tn6029 (carrying genes blaTEM-1, strAB, sul2; n=1). The exception to this was the qnrS gene, which was carried by an IncFIBK plasmid (also carrying genes blaTEM-1, sul2, tet(A), n=12). In Dhaka, MDR S. Typhi was significantly less common in older age groups (OR 0.70, [95% CI, 0.52-0.94], p=0.02) (Fig. S1b).
Antimicrobial susceptibility phenotypes were available for a subset of isolates and non- susceptibility was mostly well-predicted by known molecular mechanisms (in S. Typhi, 99% positive predictive value for first-line drugs and >96% for ciprofloxacin; in S. Paratyphi A, 99% positive predictive value for ciprofloxacin; Tables S4-5). Phenotypic assessment of azithromycin susceptibility is challenging23 and susceptibility thresholds are poorly defined. Three of five acrB mutants tested resistant for azithromycin. Several other isolates showed azithromycin-resistant phenotypes but had wildtype acrB genes and 23S alleles, no acquired macrolide resistance genes, and a genome-wide screen did not identify any novel variants associated with the reported phenotypes (see Supplementary Methods).
We used ancestral state reconstruction of AMR determinants on a global phylogeny to differentiate emergence of AMR from transmission of resistant strains (summarised in Table 3). In Blantyre, all MDR cases were attributed to local transmission of MDR S. Typhi 4.3.1.1.EA1 (Fig. S4)9. In contrast, all three instances of QRDR mutations in Blantyre were attributed to local evolution, arising independently in the endemic MDR S. Typhi 4.3.1.1.EA1 strain background and with no evidence of transmission to secondary cases (Fig. S4).
In Dhaka, all resistance to first-line drugs was attributed to local transmission of S. Typhi 4.3.1.1 (MDR) and 4.3.1.3.Bdq (ampicillin-resistant) populations that had become endemic prior to the surveillance period (Fig. S5)2, 33. Nearly all of the 451 sequences carrying QRDR mutations in Dhaka (n=356 S. Typhi, n=95 S. Paratyphi A) were also attributed to local transmission of endemic strains (Table 3, Fig. S5). The exceptions were one S. Paratyphi A 2.4.4 (Fig. S3c; de novo evolution), and one S. Typhi 4.3.1.2 that was likely imported from India3, 7, 10 (Fig. S3d). Notably, the other ciprofloxacin-resistant S. Typhi cases in Dhaka were all attributed to local transmission of the endemic strain 4.3.1.3.Bdq (Fig. S5). AcrB-R717Q mutations were either inherited from locally established populations (n=1 S. Typhi 4.3.1.1, Fig. S3e; n=2 S. Paratyphi A 2.4.4, Fig. S3c), or arose independently in local populations (n=1 S. Typhi 3.3.2, Fig. S3f; n=1 S. Paratyphi A 2.4.4, Fig. S3c).
In Kathmandu, resistance to former first-line drugs was rare (n=2, 1.4%), occurred only in S. Typhi 4.3.1.1 (Fig. S6), and resulted from local transmission of endemic strains pre-dating STRATAA surveillance (Table 3). Most cases (85.5% of S. Typhi, all S. Paratyphi A) carried a single QRDR mutation. These result from 11 different clusters, each with a characteristic gyrA mutation (see Fig. S6), most of which (97% of S. Typhi, 93% of S. Paratyphi A) were already present in the local population and carrying QRDR mutations before the surveillance period (Table 3). The exceptions were one S. Paratyphi A 2.3 (Fig. S3g) and one S. Typhi 4.3.1.1 (Fig. S3h) that were closely related to sequences from India (8-9 single nucleotide variants (SNVs)); and twoS. Typhi 3.2.2 with no close relatives in the genome collections (Fig. S3a).
Transmission dynamics
Substitution mutations accumulate too slowly in S. Typhi and S. Paratyphi A to infer specific transmission events from sequence data24; hence, to explore transmission patterns we instead examined groups of cases in the STRATAA dataset that formed zero-SNV clusters (i.e. no SNVs detected in the core genome), which we interpret as being linked either by a common source or by chains of transmission during which no substitutions have arisen. Two-thirds (67%) of cases fell into zero-SNV clusters (60% in Blantyre, 69% in Dhaka, 59% in Kathmandu). Most clusters were of size n=2 cases (40 clusters) or n=3 cases (22 clusters), but there were also 25 large clusters of ≥5 cases (accounting for 21% of cases in Blantyre, 51% in Dhaka, 30% in Kathmandu). The median time between consecutive cases in the same cluster was 16 days (interquartile range 3-49 days). Median time between first and last cases per cluster was 136 days (interquartile range 37–273 days, i.e. ∼1.2–9.1 months) and the maximum was 854 days (28 months, i.e. 2.3 years); zero-SNV clusters spanning >1 year were detected in all sites. This confirms the slow substitution rate and highlights the difficulty in resolving individual transmission events for typhoidal pathogens. Cases from the same zero-SNV clusters showed significant temporal clustering (reduced pairwise temporal distances compared to unclustered isolates; see Fig. 3) in all settings and for both serovars.
There was some evidence of geographical clustering of zero-SNV cases compared to unclustered cases in the South Asian settings (S. Typhi and S. Paratyphi A in Dhaka, median distance 373 m vs 540 m and 421 m vs 524 m, respectively; S. Typhi in Kathmandu, 646 m vs 796 m). Fig. 3 shows properties of the 25 larger zero-SNV clusters (≥5 cases). Just 10 of these showed evidence of spatial clustering, as might be expected for transmission following a point-source outbreak: nine in Dhaka (S. Typhi 2.0.1, 2.1.7, and 4.3.1.1, spanning up to ∼2 years each; S. Paratyphi A 2.3.3 and 2.4.4, ∼14 months) and one in Kathmandu (S. Typhi 4.3.1, 36 days) (see Fig. 3).
AMR profiles were homogeneous within clusters (Fig. 3), consistent with transmission of AMR variants. In Kathmandu, cases caused by the largely imported strain S. Typhi 3.3.2 were more likely to be in zero-SNV clusters (71%, in six clusters) compared with the endemic S. Typhi 4.3.1.2 (61%, 11 clusters), other S. Typhi (38%, 1 cluster) or S. Paratyphi A (36%, 2 clusters) (p=0.03 using Chi-square test). In Dhaka, some genotypes had significantly higher clustering rates (4.3.1.1, 82%; 4.3.1.3, 75%; 2.0.1, 68%) than others (2.3.3, 59%; S. Paratyphi A, 62%) (p<1x10-6 using Chi-square test), which may reflect greater intensity of transmission20. In all settings, all three age groups were equally likely to be involved in zero- SNV clusters (64-66% per age group, p>0.4 in all settings, using Chi-square test), and most clusters (77%) were detected across age groups (Fig. 3).
Discussion
This study provides a comprehensive, WGS-resolved, view of the pathogen populations underlying enteric fever in three urban sites in areas of high disease burden. Use of the same passive surveillance protocols across all sites, in defined study catchments and including all ages, allows the results to be compared between sites. The genotype prevalences we estimated are in line with other studies of enteric fever pathogens reported from the same cities2, 3, 7–10, 21, 25, 26 , but we also specifically addressed pathogen diversity across age groups. We show that although the disease burden is highest in children20, the same pathogen variants (serovars, genotypes, and AMR) circulate across age groups (pre-school, school-age, and adult), with near-identical sequences (0-SNV clusters) identified across age groups (Figs. 2- 3, S4-S6). This supports the assumption of homogeneous mixing between age groups in models of TCV impact. It also supports the idea that vaccinating children may have a secondary impact by reducing transmission to adults, although the directionality of transmission is not resolvable from WGS and no statistically significant effect has yet been observed in TCV trials.
Importantly, this study explicitly addresses the transmission burden of drug-resistance in enteric fever (Table 3). Several prior studies have documented transmission of AMR variants between countries, including introduction of MDR 4.3.1 variants from South Asia into Kenya, Malawi and neighbouring countries8, 9, 27, and spread of ciprofloxacin-resistant 4.3.1.2 from India into Nepal3, 10. A recent study estimated QRDR mutations have arisen ≥94 times in S. Typhi and transferred between countries ≥119 times7. It is also frequently reported that AMR infections result from clonal spread. However, whilst several studies have used WGS and ancestral state reconstruction to estimate the proportion of AMR Mycobacterium tuberculosis infections that are due to transmission of AMR variants vs de novo emergence28, this has not been previously quantified in enteric fever pathogens. Our results show that the vast majority of enteric fever in all three study sites was due to circulation of locally- established pathogen variants, and that AMR infections were overwhelmingly (98%) caused by transmission of pre-existing AMR strains (Table 3). This supports the modelling assumption that each AMR infection poses a risk of secondary AMR infections in others, through shedding and onward transmission of the AMR variants. Coupled with our data showing the same variants co-circulate freely across age groups, this lends weight to model predictions that reducing the incidence of AMR S. Typhi infection and shedding in children through TCVs may have a secondary effect of reducing exposure to AMR infections across the whole population.
Our data showing persistence of zero-SNV clusters over months and even years (Fig. 3) highlights the challenges of inferring transmission events from S. Typhi WGS data. Previous studies have estimated substitution rates of approximately one SNV every two years3, 8, a log- scale slower than for host-generalist S. enterica serovars such as Agona and Kentucky, and similar to slow-growing M. tuberculosis29. Our data support previous assertions that the evolutionary rate in S. Typhi is too slow to support inference of explicit transmission events and networks from genome sequence data24; although it is still possible to obtain useful data on transmission dynamics from WGS data (such as estimates of R0 and trends in effective population size7). Simple spatial analysis using GPS coordinates without topological mapping identified significant spatial clustering in Dhaka and Kathmandu but not Blantyre (Fig. 3).
The Blantyre study area has a complex river system, comprising 10 river catchments in close proximity, and recent spatial modelling of S. Typhi sequences in this setting showed that dividing space into river catchments explained the spatio-genetic patterns much better than simple grid coordinates26. Detailed spatial modelling is beyond the scope of this study but is planned for all STRATAA sites.
Overall, our findings strengthen efforts to model the impact of TCVs, and provide essential baseline data against which to assess the impact of future immunisation programmes and other interventions to control enteric fever.
Methods
Ethics statement
Research Ethics Committee approval for a joint study protocol across all three surveillance sites was obtained within each country as well as from the Oxford Tropical Research Ethics committee as detailed elsewhere20, 22.
Study protocol
The STRATAA study protocol has been previously published22, and subsequent findings described in detail elsewhere23. Antimicrobial susceptibility testing was done using disc diffusion23.
Whole genome sequencing and analysis
Blood culture isolates were subjected to WGS via Illumina HiSeq to generate 100 bp paired- end reads as described previously8. SNVs were identified by mapping sequence reads to the S. Typhi CT18 (accession no. AL513382) and S. Paratyphi A AKU_12601 (accession no. FM200053) reference genomes using RedDog (V1beta.11). Genotypes were assigned using GenoTyphi30 (v1.9.1) and Paratype. Recombination-filtered maximum-likelihood phylogenies were inferred using Gubbins (v2.4.1) and RAxML (v8.2.8). Acquired AMR genes and plasmid replicons were detected using SRST2 (v0.2.0). To determine if detected molecular determinants of AMR were transmitted or inherited, we performed maximum- parsimony ancestral state reconstruction of AMR determinants and country, using R package phangorn (v2.5.5). See Supplementary Methods for full details of sequencing and analysis.
Statistical analyses
Statistical tests were conducted in R (v4.1.2) using two-sided tests of significance. Logistic regression models were fit using the glm() function in package stats (v4.1.2). Age-group was treated as an ordinal categorical variable (<5 years, ≥5 and <15 years, ≥15 years). Severe disease was defined as symptom duration exceeding 10 days and/or requiring hospitalisation (based on clinical review of case report forms). Further details are provided in Supplementary Methods.
Nucleotide sequence data accession numbers
Raw sequence data have been deposited in the European Nucleotide Archive under project PRJEB14050 (accessions in Table S1).
Role of the funding source
The funders had no role in study design; collection, analysis, and interpretation of data; writing of the report; or in the decision to submit the paper for publication.
Data Availability
Raw sequence data have been deposited in the European Nucleotide Archive under project PRJEB14050 (accessions in Table S1). All other data produced in the present work are contained in the manuscript.
Contributors
AJP, BB, FQ, GD, JDC, KEH, MG, RSH, SB, SD, VEP designed the study and obtained funding. They oversaw data collection together with AK, CM, NF, MS. AC, AK, FK, MS, JM contributed to data collection and curation. ST was project manager. ZAD, KEH, PA analysed data; AJP, JM, MG, RSH, SB, VEP assisted with design and interpretation of the analysis. ZAD and KEH had full access to all the data in the study and verified the data. ZAD, KEH and PA wrote the first draft of the manuscript, and all authors reviewed the manuscript critically for content and approved the decision to submit for publication.
Declaration of interests
VEP received travel reimbursement from Merck and Pfizer for attending Scientific Input Engagements unrelated to the topic of the manuscript and is a member of the WHO Immunization and Vaccine-related Implementation Research Advisory Committee (IVIR- AC).
STRATAA Study Group Authorship
Anup Adhikari5, Happy Chimphako Banda6, Christoph Blohmke1, Thomas C. Darton1, Christiane Dolecek2, Yama Farooq1, Maheshwar Ghimire5, Jennifer Hill1, Nhu Tran Hoang3, Tikhala Makhaza Jere6, Moses Kamzati6, Yu-Han Kao4, Clemens Masesa6, Maurice Mbewe6, Harrison Msuku6, Patrick Munthali6, Tran Vu Thieu Nga3, Rose Nkhata6, Neil J. Saad4, Trinh Van Tan3, Deus Thindwa6, Merryn Voysey1
Affiliations
1 Oxford Vaccine Group, Department of Paediatrics, University of Oxford, and the NIHR Oxford Biomedical Research Centre, Oxford, United Kingdom
2 Nuffield Department of Medicine, Centre for Tropical Medicine and Global Health, University of Oxford, Oxford, UK
3 The Hospital for Tropical Diseases, Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
4 Department of Epidemiology of Microbial Diseases and the Public Health Modeling Unit, Yale School of Public Health, Yale University, New Haven, CT, USA
5 Oxford University Clinical Research Unit, Patan Academy of Health Sciences, Kathmandu, Nepal
6 Malawi-Liverpool Wellcome Programme, Blantyre, Malawi
Supplementary Methods
Bacterial isolates and whole genome sequencing
Isolates cultured from blood of febrile individuals recruited into the STRATAA passive surveillance studies at each of the three sites were stored locally until the end of the recruitment period. Subsequently, isolates were cultured overnight and genomic DNA extracted using the Wizard Genomic DNA Extraction Kit following the manufacturers recommendations (Promega, WI, USA). DNA was shipped to the Wellcome Sanger Institute and subjected to indexed whole genome sequencing on an Illumina HiSeq 2500 HiSeq platform to generate paired- end reads of 100 bp in length, as described previously1.
Isolates from n=452 cases in Dhaka (99.6% of all culture-positive) from the period January 2017-December 2018 were successfully sequenced and confirmed as S. Typhi or S. Paratyphi A (summary in Table 1, genome list in Table S1). From Kathmandu, n=138 cases (84.1% of all culture-positive) from the same time period were successfully sequenced and serovars confirmed. From Blantyre, n=83 cases (72.2% of all culture-positive) from October 2016-October 2018 were successfully sequenced and typhoidal serovars confirmed. Recruitment continued for an additional 10 months in Blantyre, resulting in total n=141 sequenced typhoidal isolates (89.2% of positive cultures). The full set of n=141 sequences from Blantyre are included in the main dataset ‘STRATAA’, which comprises genome sequence data for n=731 unique typhoidal Salmonella blood-culture isolates (622 S. Typhi and 109 S. Paratyphi A, see Table 1). A small number of additional blood-culture isolates were captured at the study sites before or after these formal surveillance periods (n=30 from Dhaka, n=27 from Kathmandu) or outside the boundaries of the surveillance catchment area (n=35 from Blantyre); these were also sequenced and included in phylogenetic trees to provide context (total n=707 S. Typhi, n=116 S. Paratyphi A; see Table S1), but were excluded from statistical analyses.
Single nucleotide variant (SNV) analysis and in silico genotyping
Raw S. Typhi Illumina reads were mapped to the S. Typhi CT18 reference sequence (accession no. AL513382)2, and those for S. Paratyphi A to the S. ParatyphiA AKU_12601 reference genome (accession no. FM200053)3.
Mapping was carried out using the RedDog mapping pipeline (V1beta.11; available at https://github.com/katholt/reddog), which uses Bowtie (v2.2.9)4 to map reads to the reference sequence, and SAMtools (v1.3.1)5 to identify single nucleotide variant (SNV) calls as previously described6. All raw read data analysed had a minimum read depth of >40-fold, and >97% reference coverage. For S. Typhi sequences, read alignments (BAM files) were then used as input for GenoTyphi (v1.9.1; available at: https://github.com/katholt/genotyphi)7 to assign S. Typhi isolates to known genotypes according to an extended S. Typhi genotyping framework7, 8. Similarly, Paratype (v1.0; available at https://github.com/CHRF-Genomics/Paratype) was used to assign S. Paratyphi A sequences to genotypes 9.
Chromosomal SNVs with confident homozygous base calls (phred score >20), for all SNV sites that had such calls in >95% of S. Typhi genomes (representing the 95% ‘soft’ core genome) were concatenated to form an alignment of alleles at 15,387 variant sites for all S. Typhi, including 707 from this study (Table S1) and 3,128 global context sequences (Table S6) from previous studies 6, 10–19. Previously defined8, 20, 21 repetitive and recombinant regions were excluded (354 kb; ∼7.4% of bases in the CT18 reference chromosome), and any remaining recombination filtered out using Gubbins (v2.4.1)22 resulting in a final alignment length of 14,780 chromosomal SNVs. S. Typhi phylogenies were outgroup rooted using S. Paratyphi A AKU_12601 alleles, and a representative selection of S. Typhi outgroup taxa from each defined genotype (Table S7). An interactive version of the S. Typhi phylogeny with STRATAA plus global isolates (shown in Figure S2) is available at https://microreact.org/project/wim5TssQ3AqSfgWXTSP2Bd. Genotype-specific phylogenies were constructed in the same manner (used for ancestral state reconstruction, described below); as were local site-specific phylogenies of STRAATA data (Figures S4-S6), and S. Paratyphi A phylogenies (outgroup-rooted using S. Typhi CT18 alleles, https://microreact.org/project/SgHbIR9cP).
Phylogenomic analysis
Maximum-likelihood (ML) phylogenetic trees were inferred from the aforementioned chromosomal SNV alignments using RAxML (v8.2.8)23. A generalised time-reversible model and a Gamma distribution was used to model site-specific rate variation (GTR+Γ substitution model; GTRGAMMA in RAxML) with 100 bootstrap pseudoreplicates used to assess branch support for the ML phylogeny. The resulting phylogenies were visualised and annotated using Microreact 24 and the R package ggtree (v2.2.4)25.
Identification of antimicrobial resistance (AMR) determinants and associated mobile genetic elements
For the detection of chromosomal point mutations associated with AMR, GenoTyphi (v1.9.1; available at: https://github.com/katholt/genotyphi)7,8 and GenoParatyphi (v0.1-alpha; available at: https://github.com/zadyson/genoparatyphi) were used. These tools screen for mutations in the quinolone resistance determining region (QRDR) of gyrA (codons 83 and 87), parC (codons 80 and 84) and gyrB (codon 464), and mutations at acrB codon 717 (azithromycin resistance)26, 27. Acquired genes and plasmid replicons were detected using the mapping-based allele typer SRST2 (v0.2.0)28 to screen against the ARG-ANNOT29 and PlasmidFinder30 databases, respectively.
Comparison of AMR genotypes and phenotypes
To assess how well genetic determinants predict AMR phenotypes, we interpreted the following markers as predictive of non-susceptibility to specific antimicrobials, based on previous publications27, 31: blaTEM-1, ampicillin; carbapenemase, meropenem; catA1, chloramphenicol; dfr plus sul, co-trimoxazole; extended- spectrum beta-lactamase, ceftriaxone and cefixime; acrB-717 mutation or mphA, azithromycin; QRDR mutation or qnrS, ciprofloxacin and nalidixic acid. Results are reported for drugs that were tested in ≥20% of isolates.
As most azithromycin resistance was not explained by known mechanisms (acrB-717 mutation or acquired genes), we screened Ribosomal RNA (rRNA) operons for potential explanatory mutations. A single copy of the rRNA operon (coordinates 4257263-4262892 in the CT18 reference genome) was extracted, and used as a reference sequence for mapping of S. Typhi reads using RedDog (as described above). Read alignments (BAM format) were subjected to low-frequency variant calling LoFreq (v2.1.3.1)32, and SNV read-depth data extracted from the resultant variant calling format (VCF) files with the read.vcfR() function from the R package vcfR (v1.12.0)33. Read depths for SNVs detected in the rRNA operon were normalised using the average chromosomal read depths for each sequence reported by RedDog across the entire CT18 reference sequence to identify single copy mutations (read depth of ∼1x the average reported for the CT18 chromosome). These single copy mutations were examined for an association with observed azithromycin resistance phenotypes, but were found to correlate with the genotype backgrounds in which they occurred and not with resistance. The same approach was used to assess S. Paratyphi A genomes, where reads were mapped to a single rRNA operon (coordinates 3870557-3876131 in the S. Paratyphi A AKU_12601 reference genome), but again no evidence of resistance-associated mutations in the rRNA operon were identified.
DBGWAS (v0.5.4)34 was utilised to carry out a bacterial genome-wide association study (GWAS) to screen for genetic loci and/or variants associated with the azithromycin resistance. For this, raw reads were assembled de novo with Unicycler (v0.4.7)35 and used as input to DBGWAS, along with azithromycin resistance status for each sequence. DBGWAS was run using default parameters and p-value <0.01 interpreted as a significant association; none were identified.
Transmission of molecular determinants of AMR
To determine if resistant enteric fever cases resulted from infection with locally-circulating resistant strains, imported strains, or emerged de novo, we carried out ancestral state reconstruction (ASR) analyses. For each genotype where molecular determinants of AMR were detected, we inferred a ML phylogeny including all available STRATAA genomes plus global contextual sequences of the same genotype (phylogenetic inference methods as detailed above). We then conducted ASR, for each detected AMR determinant and country of origin (Nepal, Bangladesh, Malawi, or other), onto these ML tree topologies using a maximum-parsimony method implemented in the ancestral.pars() function in the R package phangorn (v2.5.5)36. The inferred states for each variable (country and AMR determinant) were extracted for each internal node of each tree and used to infer, for each resistant isolate (tree tip), whether the resistance was most likely: (i) inherited from a local resistant strain (parent node and tip share same AMR determinant and location, interpreted as local transmission of a resistant strain); (ii) inherited from an imported resistant strain (parent node and tip share the same AMR determinant but parent node is located in a different country, interpreted as transmission of an imported resistant strain); (iii) recently emerged de novo (parent node lacks the AMR determinant, interpreted as recent emergence of resistance rather than transmission of an established resistant strain).
Cluster analysis
SNP distances were calculated using disty (v0.1.0, https://github.com/c2-d2/disty). Zero-SNV clusters were then identified using the AgglomerativeClustering function of the Python package scikit-learn (v1.0.2) i.e.
AgglomerativeClustering(linkage=’complete’, distance_threshold=0.1, n_clusters=None).fit_predict(df).
Global Position System (GPS) coordinates were collected by field workers from the homes of cases using either study tablets or GPS machines. If a GPS signal could not be obtained at a particular location due to density, then the closest location where a signal could be obtained was recorded instead. Coordinates were available for 88.6% of cases (94.5% in Dhaka, 77.5% in Kathmandu, 80.9% in Blantyre). Spatial pairwise distance was calculated using the geopy.distance.distance function from the geopy Python package (v2.3.0). Distributions of pairwise geographic and temporal distances were compared between cases in zero-SNV clusters and unclustered cases using the Kolmogorov-Smirnov two sample test implemented in the scipy (v1.7.1) package of Python i.e. scipy.stats.ks_2samp.
Statistical analyses
Statistical tests were conducted in R (v4.1.2) using two-sided tests of significance. SNV distances were calculated from the SNV alignment (described above) using snp-dists (v0.7.0), available at (https://github.com/tseemann/snp-dists). Simpson’s index diversity estimates were calculated using the diversity() function in the R package vegan (v2.5.6)37. Phylogenetic signals for age-group were quantified using the function multiPhylosignal() in the R package picante (v1.8.2)38 to calculate K statistic39 values. Multivariate logistic regression models were fit using the glm() function in package stats (4.1.2). Age-group was treated as an ordinal categorical variable (<5 years, ≥5 and <15 years, ≥15 years); other variables (serovar, dominant genotype, MDR, severity) were coded as binary data.
Supplementary Methods
Supplementary Figures
Supplementary Tables
Table S1: S. Typhi and S. Paratyphi A genomes sequenced in this study
[CSV file]
Table S6: Publicly available S. Typhi and S. Paratyphi A genome sequences used to provide global context for genotypes detected in STRATAA sequences
[CSV file]
Table S7: Representative S. Typhi sequences (one per defined genotype) used to outgroup-root S. Typhi phylogenies
[CSV file]
Acknowledgements
This research was funded by the Wellcome Trust [STRATAA grant number 106158/Z/14/Z and core funding to the Wellcome Sanger Institute, grant number 206194] and the Bill & Melinda Gates Foundation [grant number OPP1141321]. ZAD received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 845681. For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. We acknowledge the contribution of all participants who have taken part in the studies and the large field and laboratory teams at the sites, including: Amit Aryja, Binod Lal Bajracharya, David Banda, Yama Mujadidi, Pallavi Gurung, Nazia Rahman, Archana Maharjan, George Mangulenji, Prasanta Kumar Biswas, and the Nepal Family Development Foundation team. We also acknowledge the sequencing and pathogen informatics teams at the Wellcome Sanger Institute for sequencing and data processing; and Sebastian Duchene of the University of Melbourne for useful discussions regarding transmission analyses. icddr,b is grateful to the Government of Bangladesh, Canada, Sweden and the United Kingdom for support.