Abstract
Despite the considerable morbidity and mortality of Yellow fever virus (YFV) infections in Brazil, as well as its widespread presence in non-human primate host, our understanding of disease outbreaks is hampered by limited viral genomic data. Determining the timing and spatial corridors of YFV spread, as well as the geographic hotspots that link the endemic north of the country with epidemic extra-Amazonian regions, are central to predicting and preventing future outbreak events and epidemics. Here, we tracked the recent spread of the virus by integrating genome sequences of new YFV infections sampled from infected non-human primates and humans with both epidemiological and vector data. Through a combination of phylogenetic and epidemiological models we reconstructed the recent transmission history of YFV within different epidemic seasons in Brazil. A suitability index based on the highly domesticated Aedes aegypti was able to capture the seasonality of reported human infections. Spatial modelling revealed spatial hotspots with both past reporting and low vaccination coverage, which coincided with many of the largest urban centres in the Southeast. Phylodynamic analysis unravelled the circulation of three distinct YFV lineages, and provided proof of the directionality of a known spatial corridor of viral spread that connects the endemic North with the extra-Amazonian basin. This study illustrates that genomics linked with field sampling of animals and humans within a One Health framework can provide new insights into the landscape of YFV transmission, augmenting traditional approaches to infectious disease surveillance and control.
Introduction
Yellow fever virus (YFV) is a single-stranded positive-sense RNA virus belonging to the genus Flavivirus, family Flaviviridae (Lindenbach et al., 2013). This mosquito-borne pathogen is currently endemic in tropical areas of Africa and South America. YFV can be maintained in two transmission cycles: the sylvatic (or jungle) and the urban cycles. The sylvatic cycle involves non-human primates (NHPs) and forest canopy-dwelling mosquitoes, mainly Sabethes sp. and Haemagogus sp. (Monath and Vasconcelos, 2015). While African NHPs seemingly rarely die from YFV infection, New World NHPs typically exhibit disease signs with elevated mortality rates and thus act as sentinel animals for viral circulation in the environment, in turn representing a valuable marker for epidemiological surveillance (Silva et al., 2020). Within the urban cycle, YFV is transmitted to humans by highly anthropophilic Aedes sp. mosquito vectors (Monath and Vasconcelos, 2015) that are typically widely distributed in urban settings in South America and Africa.
In Brazil, yellow fever transmission historically occurs within a sylvatic cycle in the Amazonian region (Vasconcelos et al., 2004). In the extra-Amazonian region, yellow fever outbreaks occur with potential infection to humans with irregular periodicity under favourable conditions for transmission, such as a build up of susceptible hosts, below threshold human vaccine coverage, high vector density, favourable temperature and rainfall, or the emergence of viral strains with potentially increased fitness advantage (Sacchetto et al., 2022; Vasconcelos et al., 1997). In recent decades, YFV re-emergence events have had a great impact on public health in Brazil (Faria et al., 2018). In 2002-2003, 2007-2009 and the during the ongoing outbreak (2016-2019 and 2020-present), there has been a spatial expansion of YFV circulation, with the virus spreading from the east towards the south of Brazil, reaching the state of Rio Grande do Sul located at the extreme meridional region of the country (Brazilian Ministry of Health, 2020a; Almeida et al., 2012; Romano et al., 2014). During these outbreaks, thousands of NHPs deaths were documented, mostly in Alouatta sp. (i.e. howler monkeys), followed by Callithrix sp. and Cebus sp. (Romano et al., 2014, Chame et al., 2020), and in the most recent circulation, for the first time the virus was detected in Leontopithecus sp., an endemic highly endangered species (Silva et al., 2020). Additionally, more than 2,100 human cases were reported in the Southeast region of Brazil, with a case-fatality rate of ∼30%, many of them in areas with low human vaccination coverage (Brazilian Ministry of Health, 2020ab).
In September 2020 a novel YFV re-emergence was observed within the midwestern states of Goiás, Distrito Federal and the Southeast state of Minas Gerais (Brazilian Ministry of Health, 2021/2022; Minas Gerais State Department of Health, 2022). This re-emergence was concomitant with the outbreak that started in 2016 and is ongoing in the Southern region of the country (Andrade et al., 2021), increasing the complexity of the Brazilian epidemiological scenario and the risk to public health (Silva et al., 2020; Brazilian Ministry of Health, 2021/2022). While the urban cycle of yellow fever has been absent in Brazil since 1942 (Possas et al., 2018), these recent outbreaks have occurred in proximity to areas heavily infested by potential Ae. aegypti and Ae. albopictus vectors that are close to large, densely populated metropolitan regions. These areas are also characterised by low YFV vaccination coverage, thereby representing a risk for the re-establishment of the urban cycle (Romano et al., 2014). Previous studies have analysed the spatial and evolutionary dynamics of the current YFV outbreak in different Southeastern states (Faria et al., 2018) and have documented the co-circulation of distinct YFV lineages (Delatorre et al., 2019; Giovanetti et al., 2019). Nevertheless, a shortage of genomic data from many locations and time points has hampered the ability to understand in detail the reemergence and establishment of YFV transmission across extra-Amazonian regions (Giovanetti et al., 2019; Faria et al., 2018).
Identifying spatial corridors of YFV spread, their ecological backgrounds, the underlying human immunity landscape, as well as the role of the animal vertebrate hosts and vector populations are crucial to predicting, preventing and controlling future potential outbreaks events that may lead to epidemics. To gain better insights into the routes of YFV dispersion, we tracked the spread and re-emergence of the virus by analysing metadata on human infections and newly 147 genome sequences sampled mainly from NHPs, as well as from some human patients within the Northern, Midwestern, Southeastern, and Southern regions of Brazil.
Material and methods
Sample collection
Human and non-human primate samples were collected by local and regional public health authorities, under the guidelines of a national strategy of yellow fever surveillance and sent for molecular diagnostics to the Reference Laboratory of Emerging Viruses of the Carlos Chagas Institute/Fiocruz-PR, which is a Brazilian Ministry of Health Regional Reference Laboratory for arboviruses.
Ethical statement
The project was supported by the Pan American World Health Organization (PAHO) and the Brazilian Ministry of Health (MoH) as part of the arboviral genomic surveillance efforts within the terms of Resolution 510/2016 of CONEP (Comissão Nacional de Ética em Pesquisa, Ministério da Saúde; National Ethical Committee for Research, Ministry of Health).
Molecular screening and nanopore sequencing
RNA was extracted from tissue samples using the QIAamp Viral RNA Mini KitTM (Qiagen, Hilden, Germany) or the MagMAX pathogen RNA/DNA kit (Life Technologies, Carlsbad, USA) according to the manufacturer’s instructions. YFV RNA was detected using the RT-qPCR protocol described by Domingo et al (Domingo et al., 2012).
cDNA synthesis and whole-genome nanopore sequencing
Sequencing was attempted on 147 selected RT-PCR-positive samples (from a total of 200) regardless of CT value (as previously described) and the availability of epidemiological data (Faria et al., 2018; Giovanetti et al., 2019). All positive samples were submitted to a cDNA synthesis protocol (Faria et al., 2018) using a ProtoScript II first strand cDNA synthesis kit. A multiplex tiling PCR was then performed using the previously published YFV primer scheme and 30 cycles of PCR using Q5 high-fidelity DNA polymerase (NEB) as previously described (Quick et al., 2017). Amplicons were purified using AMPure XP beads (Beckman Coulter), and cleaned-up PCR product concentrations were measured using a Qubit double-stranded DNA (dsDNA) high-sensitivity (HS) assay kit on a Qubit 3.0 fluorometer (Thermo Fisher). DNA library preparation was performed using the Ligation sequencing kit (Oxford Nanopore Technologies) and the Native barcoding kit (NBD103; Oxford Nanopore Technologies, Oxford, UK). A sequencing library was generated from the barcoded products using the genomic DNA sequencing kit SQK-MAP007/SQK-LSK208 (Oxford Nanopore Technologies). The sequencing library was loaded onto a R9.4 flow cell (Oxford Nanopore Technologies). Sample data, location, hosts, NHP species, municipality of collection and collection date are summarised in Table 1.
Generation of consensus sequences
Raw files were basecalled using Guppy v4.5.4 and barcode demultiplexing was performed using qcat. Consensus sequences were generated by de novo assembling using Genome Detective (https://www.genomedetective.com/) (Vilsker et al., 2019). Briefly, Genome Detective uses DIAMOND to identify and classify candidate viral reads in broad taxonomic units, using the viral subset of the Swissprot UniRef90 protein database. Candidate reads are next assigned to candidate reference sequences using NCBI blastn and aligned using AGA (Annotated Genome Aligner) and MAFFT. Final contigs and consensus sequences are made available as FASTA files. More detail about Genome Detective can be found in (Vilsker et al., 2019).
Collation of YFV complete genome data sets
Genotyping was first conducted using the yellow fever typing tool available at https://www.genomedetective.com/app/typingtool/yellowfever/ and confirmed using a maximum likelihood (ML) phylogenetic analysis incorporating a collection of representative sequences (n= 495) belonging to the four YFV lineages (Fig S1). This analysis robustly placed the novel YFV sequences generated here into a clade that likely emerged in the state of Roraima in North Brazil in 2002 (Fig S1).
The genome sequences generated here were combined with a data set comprising previously published genomes from the 2002, 2016 to 2019 YFV epidemics in Brazil (Faria et al., 2018; Delatorre et al., 2018; Giovanetti et al., 2019). Three complete or near complete YFV genome data sets were generated. Data set 1 (n=443) comprised the data reported in this study (n=147) plus (n=296) complete or near complete YFV genomic sequences (10,000 bp) retrieved from NCBI in September 2022 and covering all the YFV South American (SA) I genotype genomes currently available. Subsequently, to understand the transmission and the spatiotemporal evolution of the SA1 lineages 1 and 2 of YFV, genetic analyses were conducted on smaller data sets that included all available genomic sequences belonging to each of those lineages: n=163 and n=270, respectively. Sequence alignment was performed using MAFFT (Katoh et al., 2013) and manually curated to remove artefacts using Aliview (Larson et al., 2014). ML phylogenetic trees were estimated using IQ-TREE2 (Nguyen et al., 2015) under the GTR nucleotide substitution model, which was inferred as the best-fit model by the ModelFinder application implemented in IQ-TREE2 (Kalyaanamoorthy et al., 2017). Statistical support for tree nodes was estimated using a ML bootstrap approach with 1,000 replicates. To investigate the temporal signal in our YFV data sets, we regressed root-to-tip genetic distances from this ML tree against sample collection dates using TempEst v.1.5.1 (Rambaut et al., 2016).
Dated phylogenetics
To estimate time-calibrated phylogenies, we conducted a phylogenetic analysis using a Bayesian approach (Suchard et al., 2018). Accordingly, we used the GTR+gamma4 nucleotide substitution model and Bayesian Skyline tree prior as employed previously (Giovanetti et al., 2019) with an uncorrelated relaxed clock with a lognormal distribution (Giovanetti et al., 2019). Analyses were run in duplicate in BEAST v.1.10.4 for 100 million MCMC steps, sampling parameters, and trees every 10,000th step. Convergence of MCMC chains was checked using Tracer (Rambaut et al., 2018). Maximum clade credibility (MCC) trees were summarised using TreeAnnotator after discarding 10% as burn-in.
Phylogeographic analyses
To model the phylogenetic diffusion of YFV SA1 lineages 1 and 2 we used a flexible relaxed random walk diffusion model (Lemey et al., 2010, Pybus et al., 2012) that accommodates branch-specific variation in rates of dispersal with a Cauchy distribution and a jitter window site of 0.01 (Dellicour et al., 2016, Dellicour et al., 2021). For each sequence, coordinates of latitude and longitude were attributed. MCMC analyses were performed in BEAST v1.10.4, running in duplicate for 100 million interactions and sampling every 10,000 steps in the chain. Convergence for each run was assessed in Tracer (effective sample size for all relevant model parameters >200). MCC trees for each run were summarised using TreeAnnotator after discarding the initial 10% as burn-in. Finally, we used the R package ‘seraphim’ version 1.0 (Dellicour et al., 2016) to extract and map spatiotemporal information embedded in the posterior trees.
Eco-epidemiological data and integration with genomic data
Data of weekly notified and laboratory confirmed cases of infection by YFV in Brazil during 2015 to 2022 were supplied by the Brazilian Ministry of Health (BrMoH). A mosquito-viral suitability measure (index P) was estimated using the MVSE R-package v1.01 (Obolski et al., 2019). The index P receives as input local temperature, humidity and proposed probability distributions for key viral, vector and host parameters related to the host-pathogen system under study. Accordingly, it measures the reproductive (transmission) potential of a single adult female mosquito in a completely susceptible host population. For parameterization we used satellite climate data from Copernicus.eu (dataset “ERA5-Land monthly averaged data from 1950 to present”, https://cds.climate.copernicus.eu/), and parameter probability distributions informed by the literature. A full description of Index P parameterization can be found in the Supplementary Text. We used R v4.1.2 to calculate the correlation between incidence and index P using Spearman’s rank correlation coefficient (function cor.test) and to determine best time lag between the two time series (function ccf).
Bayesian modelling of YFV human case counts
We modelled human case counts and candidate covariates by municipality using the Besag York Mollié model (BYM2). Briefly, this model assumes a Poisson distribution for case counts and includes both spatial covariates and random effects to account for unexplained spatial variation. The model was implemented in R v4.1.2 and Stan. A full description of covariates, the model and its output (posterior distributions) are provided in the Supplementary Text.
Results
Human incidence of YFV in Brazil
Due to inherent natural variations in mosquito population size, YFV should display temporal dynamics with an oscillatory behaviour characterised by recurrent peak incidence every epidemic season (local mid-summer) (Couto-Lima et al., 2017). However, limited surveillance and local testing capacity mean that obtaining a detailed spatio-temporal perspective with high resolution, and within Brazilian microregions, is challenging. Nonetheless, the epidemic curves of human case reports in the study period showed clear seasonal outbreaks within several national macroregions, compatible with the periods previously detected and used for national surveillance (Romano et al., 2011) (Fig 1a). In particular, between 2015 and 2022 there were three outbreaks in the Southeast (2016-2019), followed by two smaller outbreaks in the South (2020-2021). Estimated transmission potential of YFV (index P) based on local climatic variables averaged across the macroregions presented clear seasonal oscillations matching the time windows of the observed waves of infection (Fig 1a). The majority of human reported infections also took place in inter-yearly periods that had higher index P values (Fig 1a inner plot). When looking at the years 2017-2019 for which incidence presented a clearer seasonal signal, we found a positive correlation between cases and index P of 0.47 (p-value=0.003). As shown in previous studies, index P often precedes incidence in time due to inherent natural lags between natural climate variation and its effects on transmission (Nakase et al., 2022), or due to delays in reporting. Accordingly, we found that an optimal shift of index P by +1 month into the future maximised the correlation with incidence at 0.73 (p-value=5.7e-07).
In accordance with previous studies (e.g. Tuboi et al., 2007), we found that incidence across the entire time and spatial ranges of observation were characterised by an older age profile compared to that of the population within the macroregions (Fig 1b). Overall, the highest burden was found in the 45-49 age group (∼13% of cases) which was at odds with a much younger age profile of the local population. Also in agreement with previous studies (Tuboi et al., 2007; Faria et al., 2018), incidence was higher among males (∼82% of all cases), in a background of ∼49% of the population being male within those macroregions.
We explored the spatio-temporal association between human case counts with ten covariates at the municipality level (Figures S2-3), including the suitability index P, human population density, vaccination coverage, Cerrado and Atlantic forest biomes, pasture and urbanisation land uses, forest cover and NHP YFV counts (Tables S2-3). We focused on the states of São Paulo, Minas Gerais, Espírito Santo and Rio de Janeiro, which together reported 2,210 of the 2,289 human cases (97%) in our data set (Fig 1c). To quantify the roles of each covariate we applied a Bayesian spatial regression model at the municipality level (Supplementary Text). Our analysis revealed considerable overdispersion due to unobserved random spatial effects (Fig S4), such that the ten covariates could not explain the spatial distribution of reported human cases alone. The estimated posterior distributions for the regression coefficients of each covariate (Fig 1d, Table S4) showed results compatible with expectations from the visual inspection of spatial distributions, with four variables having positive effects and six having negative effects (Fig 1d, Fig S2). Of note, there was a positive association of human cases with the Atlantic forest biome and occurrence of NHP/YFV cases, and negative associations with vaccination coverage, population density and index P (Fig 1d). The index P had the strongest negative effect, which contrasted the positive time-based correlation of human cases with the index (Fig 1a). Together, these observations revealed that while the timing of spillover events are largely dictated by climate seasonality (Fig 1a), the locations of spillover events are not necessarily the ones with highest suitability due to more relevant local factors such as forest biome type, forest cover and altitude that are all negatively associated with the spatial distribution of index P (Fig S5). This mix of covariate contributions reinforced the notion that reported human cases typically result from transmission events associated with regions reporting NHP cases within or bordering forested environments, with lower vaccine coverage and human population density (Kaul et al., 2018; Cunha et al., 2019), which are not necessarily regions with the most favourable conditions for transmission by Aedes aegyti.
We further mapped the predicted probability of human YFV case reporting (from the spatial regression model) and the covariate of vaccine coverage to obtain their bivariate spatial variation (Figs 2a-b). Spatial hotspots emerged with both high probability of reporting and proportion unvaccinated towards the southeast of the study area (Fig 2c). While recent past reporting was negatively associated with human population density (Fig 1d), many of the largest cities in the area of study were found to be peripheral to these hotspots, highlighting areas with large human populations susceptible to future outbreaks events (Fig 2c). This was particularly the case for the states of São Paulo (cities of São Paulo, Campinas, Guarulhos), Minas Gerais (Belo Horizonte, Contagem) and Espiríto Santo (Serra, Vila Velha, Cariacica). For the state of Rio de Janeiro, the largest cities were characterised by a mix of a high proportion of unvaccinated individuals and an intermediate probability of reporting, but were closely surrounded by hotspots of low vaccination coverage and high levels of reporting to the north. The only large city seemingly not closely connected to such a hotspot was Uberlândia (Minas Gerais), which had intermediate vaccination levels and low risk of occurrence. The identification of these hotspots, characterised by a mix of historical case reporting and a proportion of unvaccinated individuals living in close proximity to the largest cities, highlight areas of critical importance in which vaccination coverage and surveillance need to be enhanced in the near future.
Phylodynamics of YFV in Brazil
To explore the phylodynamics of YFV in Brazil, we combined our 147 newly generated sequences obtained from the Southeastern, Midwestern and Southern macroregions of the country to those of other genomes available on GenBank (n=296). Our analysis revealed the circulation of three different clades, named hereafter as clades Ia, IIb, as well as a novel clade termed IIIc (Fig 3). As previously described (Delatorre et al., 2019; Giovanetti et al., 2019), the South American genotype I was characterised by the co-circulation of two viral lineages (i.e., clades Ia and IIb). Circulating clades Ia-IIIc were characterised by specific mutational signatures, including nonsynonymous changes in the RNA-dependent-RNA polymerase (RdRp): S8053D and T9806I in clade Ia, V8048I in clade IIb, and G8048A in clade IIIc. These four nonsynonymous and other six synonymous mutations are described in Table S5.
To investigate the evolution of clades Ia and IIb in more detail, we used smaller data sets derived from each clade individually. Before phylogeographic analysis, each clade was also assessed for molecular clock signal using the root-to-tip regression method available in TempEst v1.5.3 (Rambaut et al., 2016), following the removal of potential outliers that may strongly violate the assumption of rate constancy. The final data set included n= 163 genome sequences for Clade Ia, and n= 270 genome sequences for Clade IIb. Importantly, there was a strong correlation between sampling time and the root-to-tip divergence in the two data sets (Figs 4a, 5a), indicative of a constancy of evolutionary rate and hence allowing the use of molecular clock models to infer evolutionary parameters. Our phylogeographic reconstruction of clade Ia (Fig 4b) suggested a mean time of origin in late-July 2015 (95% highest posterior density (HPD): 29 March 2015 to 18 August 2015). Viruses from this clade spread from the Northern region of the state of Minas Gerais towards the Southeast and later to the Northeast, as indicated by isolates from the Bahia state.
A different spatial pattern was observed for clade IIb (Fig 5) compared to clade Ia (Fig 4). Specifically, clade IIb originated with a mean date in late November 2015 (95% highest posterior density (HPD): 28 November 2015 to 05 January 2016) with an early dispersal from the Midwest (state of Goiás) to the Southeast (Minas Gerais, Espírito Santo and São Paulo) and later to the South (state of Paraná, Santa Catarina and Rio Grande do Sul) where it persisted at least until the end of 2021 (Fig 5b).
Through analysis of the genomic data set that included samples collected in 2017 from the North macroregion we were able to identify a corridor of viral spread associated with clade IIIc (Figure 3, posterior probability - pp 1.0) that connected the endemic North with the extra-Amazonian basin (states of Goiás and Distrito Federal in the Midwest, and the state of Minas Gerais in the Southeast). We estimated the mean time of the most recent common ancestor (tMRCA) for this clade to be July 2016 (95% highest posterior density interval [HPD], April 2015 - March 2017).
Conclusion / discussion
The circulation of YFV has been reported in the extra-Amazonian regions of Brazil since the early 2000s (Brazilian Ministry of Health, 2020a; Almeida et al., 2012; Romano et al., 2014; Vasconcelos et al., 2003; Cardoso et al., 2010). During this period, many advances have been achieved in the country regarding surveillance in both the reservoir and human population, such as the start of many digital platforms for animal and case reporting (e.g. https://sissgeo.lncc.br/). Preliminary studies demonstrated that a later reemergence, which began in 2016, resulted in widespread virus dissemination and an extended period of transmission still active in 2022 (Andrade et al., 2021, Andrade et al., 2022). The uninterrupted circulation of YFV in the extra-
Amazonian environment for a period that exceeds seven years is unprecedented and coincides with ongoing ecological changes. For example, increases in temperature and precipitation in some areas of the Atlantic forest biome in the South and Southeast may be contributing to creating conditions for the maintenance of perennial foci of YFV transmission in or around highly populated areas (Marengo et al., 2020; Regoto et al., 2021). At the same time, while it remains difficult to quantify direct causal effects of landscape changes due to human intervention, the midwestern and southeastern states analysed in this study are known to be experiencing rapid alterations. Indeed, changes in the forestry legislation that took effect in 2016 have allowed and fomented degradation of forested environments; for example decreasing the size of protected forests along rivers that are expected to increase exposure of adult, male and poor rural workers involved in logging and farming (Dobson et al., 2020; Ribeiro et al., 2022). By 2017, a record primary forest deforestation was measured along the Doce river valley in the state of Minas Gerais (Global forest, 2022). Similarly, the number of forest fires in the region where YFV re-emerged in 2020 more than doubled in 2018 and 2019 (National Institute of Spatial Research, 2022). Such landscape changes are driven by the need to de-forest environments for agricultural purposes on a large scale, and 2020 may represent a year of greater exposure of workers to degraded forests and secondary forests (Brondízio et al., 2009; Schneider et al., 2015). This recent trend is unlikely to change drastically in coming years, and the exploration of the role of degraded forested environments in increased risk of YFV spillover from the non-human primate hosts should be the focus of future research.
In recent years, thousands of cases and deaths, mainly among NHPs, have caused unprecedented impacts on these animal populations and public health, resulting in raised awareness and control activities, such as the increase of vaccine coverage to classically non-endemic territories (Andrade et al., 2021, Andrade et al., 2022). These recent trends also relate to conservation efforts for NHPs. For example, in the context of the endangered Alouatta guariba clamitans, previous work suggests that YFV infection is a major threat for population stability, since outbreaks in fragmented, metapopulation-structured populations are likely to result in local extinctions (Moreno et al., 2015, Chame et al. 2020). In parallel, an ongoing decline in the size of the Atlantic forest protected areas, due to recent government policies and the abandonment of surveillance and law enforcement, might have both increased the risks of new NHP outbreaks and, consequently, chances the infection to humans in the environs of areas with high population density.
Previous studies have examined the spatial and evolutionary dynamics of YFV between 2016 and 2019 (Faria et al., 2018; Giovanetti et al., 2019). Unfortunately, however, the shortage of genomic data has hampered our ability to understand particular aspects of this epidemic, such as the current reemergence in the Southeast (since late 2019). In this study, we presented a combination of epidemiological and genomic analysis of novel YFV data with the aim of further understanding and describing the past and present of the ongoing epidemic.
A time series of reported human cases between 2015 and 2022 showed a typical yearly seasonal pattern associated with a midsummer peak in January, as well as three outbreaks in the Southeast (2017-2019) followed by two outbreaks in the South (2020-2021). In general, reporting was associated with areas peripheral or within forested environments characterised by an Atlantic forest biome, to males and older age groups, to variation in climate, population density and vaccination coverage. The suitability index used was calibrated to the urban cycle vector Aedes aegypti due to lack of data for mosquito-vectors of the enzootic transmission cycle. The index was associated positively with the timing of reported human infections, providing opportunities to estimate time-windows of importance for spillover events and surveillance across the country in future studies.
We were able to identify spatial hotspots characterised by both the reporting of human cases and low vaccination coverage, which were critically peripheral to some of the largest human populations in the Southeast. These hotspots demonstrate the recent reporting of cases close to large cities with large susceptible human populations and where Aedes aegypti is also typically abundant, thereby presenting possible ecological boundaries across which urban transmission cycles could become established. A lack of access to detailed reporting of human cases in other regions of Brazil hampered the application of our spatial regression model to a wider spatial scale. However, it remains plausible that similar hotspots exist within other states, including the state of Bahia that served as a corridor for clade Ia into the southern states, is known to be rich in NHP biodiversity, and also harbours large human populations. These hotspots in the Southeast should become the focus of future targeted mosquito and NHP surveillance, as well as public health campaigns to boost public awareness and vaccination coverage in particular within the peripheries of large urban centres.
Our newly generated YFV sequences were classified to the South American I genotype and formed three distinct clades Ia–IIIc. These clades presented different spatial patterns. Clade Ia historically spread from the Northern region of the state of Minas Gerais towards the Southeast and later to the Northeast. In contrast clade IIb showed an early dispersal from the Midwest (state of Goiás) to the Southeast (Minas Gerais, Espírito Santo and São Paulo) with a later spread to the South (state of Paraná, Santa Catarina and Rio Grande do Sul) where it has persisted at least until the end of 2021. While the general spatial spread of clade Ia and IIb have already been described in the Brazilian context (Delatorre et al., 2018; Giovanetti et al., 2019), the analysis of the novel isolates from three macro regions in 2020 and 2021 is novel (North, Midwest and Southeast). This also allowed us to produce the first genetic evidence in support of a corridor of spread (previously described using NHP case count data (Possas et al., 2018)) associated with clade IIIc which connected the endemic North with the extra-Amazonian basin (states of Goiás, Distrito Federal in the Midwest, and the state of Minas Gerais in the Southeast). We additionally described several point mutations within clades I-III viral genomes, of which six were associated with amino acid substitutions: S8053D and T9806I in the RdRp gene of clade Ia, V8048I in the RdRp gene of clade IIb, and S8048N in the RdRp gene of clade IIIc. As the RdRp is responsible for the replication of the viral genome, further studies are required to elucidate the possible impact of these mutations on structure and function, and hence on both viral pathogenesis and fitness.
We failed to identify if the locations within this corridor (that cuts through the centre of the country) had ecological or landscape features of relevance. For example, the associated central regions of Brazil are rich in the Cerrado biome (Tisler et al., 2022) which was negatively associated with human YFV case reporting. Compared to other regions, these regions are also not particularly suitable for the Haemagogus and Sabethes spp. vector-species that are associated with the sylvatic cycle (Li et al., 2022), nor do they present particular richness of relevant NHP species (IUCN. 2022). A potential contributing factor could be the large rivers that stem in a north-south axis over the corridor (e.g. Rio Tocantins, Rio das Mortes, Rio Araguaia), from the state of Pará crossing Tocantins (North) into Goiás (Midwest), the latter at the north border of Minas Gerais (Southeast) as well as the forests on the edges of the north-south oriented mountains that make up the brazilian central plateau. Rivers, mountains and their associated forest environments offer ideal corridors for NHP settlement and migration, potentially contributing to the spatial spread of YFV (Dietz et al., 2019). In this study we present the first evidence of the north-south directionality of spread over this spatial corridor, which appears particularly related to degraded and deforested riparian forests, raising awareness to its public health importance and suggesting that future research and surveillance initiatives should focus on the associated regions.
Our results reinforce the importance of the North of Brazil as a potential hotspot and revealed a new role for the Midwest regions of the country, in line with studies that have highlighted both regions as relevant hubs, not only for YFV, but also for dengue virus (DENV) (Faria et al., 2018; Giovanetti et al., 2019; Adelino et al., 2021). There were critical gaps in existing genomic data (445 genomes since 2002 with patchy temporal and spatial sampling, Fig S6) which curtailed definite conclusions in this study on key points of the recent history of YFV in Brazil. There remains, for instance, only a small number of viral genomes from the North, which is a key region for understanding YFV persistence dynamics and genetic diversity. At the same time, although the virus continuously expands its geographical range, there has been historically weak sampling in some states such as Goiás, Distrito Federal and Minas Gerais that hampered the unravelling of the corridor of spread revealed in the current study.
By identifying spatial corridors of spread, their eco-epidemiological backgrounds and drivers and mutational signatures associated with successful viral lineages, the combination of epidemiological and genomic surveillance within a One Health approach can have significant public health impact. It can identify the likely places for the reemergence of urban cycles of transmission, inform on emerging viral lineages that should be the focus of empirical laboratory experimentation, and critically identify areas and time windows where catch-up vaccination campaigns and surveillance initiatives should be directed. Given the existing large gaps in knowledge, there is clearly a need for continued funding for genomic surveillance of YFV both in Brazil and globally.
Data Availability
New sequences generated as part of this study have been deposited in GenBank under accession numbers OP508570-OP508716 (Table 1). Aggregate estimates of YFV temporal suitability (index P) and its input (humidity, temperature, rainfall) for the South and Southeast between 2014 and 2021 are provided as a comma separated values (CSV) file (Supplementary Data File 1). Estimates of the probability of reporting human cases per municipality for the South and Southeast (output of spatial regression model) are provided as a comma separated values (CSV) file (Supplementary Data File 2) geocoded with GEOCMU7 (a seven digit unique identifier).
Data availability
New sequences generated as part of this study have been deposited in GenBank under accession numbers OP508570-OP508716 (Table 1). Aggregate estimates of YFV temporal suitability (index P) and its input (humidity, temperature, rainfall) for the South and Southeast between 2014 and 2021 are provided as a comma separated values (CSV) file (Supplementary Data File 1). Estimates of the probability of reporting human cases per municipality for the South and Southeast (output of spatial regression model) are provided as a comma separated values (CSV) file (Supplementary Data File 2) geocoded with GEOCMU7 (a seven digit unique identifier).
Supplementary files
Supplementary Text File 1 includes a full description of the modelling approach to YFV suitability (Index P) and spatial regression of human infections (BYM2), details on the mutational differences between YFV clades I-III, as well as Supplementary Tables S1-5 and Supplementary Figures S1-4.
Acknowledgments
This work was supported in part through National Institutes of Health USA grant U01 AI151698 for the United World Arbovirus Research Network (UWARN) and the Brazilian Ministry of Health (SCON2021-00180). MG is funded by PON “Ricerca e Innovazione” 2014-2020. JL is funded by BioISI (Biosystems and Integrative Sciences Institute), Faculdade de Ciências da Universidade de Lisboa (UIDP/4046/2020). FP is funded by the One Health Poultry Hub, a Global Challenges Research Fund (GCRF) and United Kingdom Research and Innovation (UKRI) initiative. ECH is funded by an Australian Research Council Australian Laureate Fellowship (FL170100022). CNDS was founded by CNPq (307176-2018-5). Authors additionally thanks Fiocruz/CVSLR for the logistic support.
Conceptualization: MG, CZ, JL, LCJA and CNDS
Methodology: MG, FP, CZ, VF, TN, ACK, MT, GS, GGD, AEMLM, RS, TERA, JX, CO, SPS, NRG, HF, MAMG, FL and JL.
Investigation: MG, FP, CZ, VF, TN, ACK, MT, GS, GGD, AEMLM, RS, TARA, JX, CO, SPS, NRG, HF, MAMG, FL, PHR, VLS, LAP, AFM, ILM, DERS, GRTC, MBC, FCMI, MAP, KRLJC, ARRF, CFCA, EMM, MPDA, RCR, AASC, AP, MC, LA, INR, SPR, AIB, TO, CF, NFOM, AF, CDSR, CCS, MABA, ES, JC, DAA, EK, LFM, RRG, SFC, JABF, MGDL, ILB, GMPS, MRFS, MMSC, JCSS, ABJ, EGP, LST, DAA, RM, JSR, PCLS, ASAFS, SS, GC, RS, WN, LAM, AGAF, GGP, BTDN, DBAM, ACRC, RVC, WVV, AMBF, MA, ECH, DGR, AR, JL, LCJA, CNDS.
Sampling: MG, FP, CZ, VF, TN, ACK, MT, GS, GGD, AEMLM, RS, TARA, JX, CO, SPS, NRG, HF, MAMG, FL, PHR, VLS, LAP, AFM, ILM, DERS, GRTC, MBC, FCMI, MAP, KRLJC, ARRF, CFCA, EMM, MPDA, RCR, AASC, AP, MC, LA, INR, SPR, AIB, TO, CF, NFOM, AF, CDSR, CCS, MABA, ES, JC, DAA, EK, LFM, RRG, SFC, JABF, MGDL, ILB, GMPS, MRFS, MMSC, JCSS, ABJ, EGP, LST, DAA, RM, JSR, PCLS, ASAFS, SS, GC, RS, WN, LAM, AGAF, GGP, BTDN, DBAM, ACRC, RVC, WVV, AMBF, MA, ECH, DGR, AR, JL, LCJA, CNDS.
Sequencing: MG, VF, TERA, JX, SPS, CO and NRG.
Visualization: MG, FP, VG, TN and JL.
Funding acquisition: LCJA and CNDS.
Project administration: LCJA and CNDS.
Supervision: LCJA and CNDS.
Writing – original draft: MG, CZ, FP, JL, and CNDS.
Writing – review & editing: MG, FP, CZ, VF, TN, ACK, MT, GS, GGD, AEMLM, RS, TARA, JX, CO, SPS, NRG, HF, MAMG, FL, PHR, VLS, LAP, AFM, ILM, DERS, GRTC, MBC, FCMI, MAP, KRLJC, ARRF, CFCA, EMM, MPDA, RCR, AASC, AP, MC, LA, INR, SPR, AIB, TO, CF, NFOM, AF, CDSR, CCS, MABA, ES, JC, DAA, EK, LFM, RRG, SFC, JABF, MGDL, ILB, GMPS, MRFS, MMSC, JCSS, ABJ, EGP, LST, DAA, RM, JSR, PCLS, ASAFS, SS, GC, RS, WN, LAM, AGAF, GGP, BTDN, DBAM, ACRC, RVC, WVV, AMBF, MA, ECH, DGR, AR, JL, LCJA, CNDS.