Using multiple sampling strategies to estimate SARS-CoV-2 epidemiological parameters from genomic sequencing data

Rhys P. D. Inward; Kris V. Parag; Nuno R. Faria

doi:10.1101/2022.02.04.22270165

ABSTRACT

SARS-CoV-2 virus genomes are currently being sequenced at an unprecedented pace. The choice of viral sequences used in genetic and epidemiological analysis is important as it can induce biases that detract from the value of these rich datasets. This raises questions about how a set of sequences should be chosen for analysis, and which epidemiological parameters derived from genomic data are sensitive or robust to changes in sampling. We provide initial insights on these largely understudied problems using SARS-CoV-2 genomic sequences from Hong Kong and the Amazonas State, Brazil. We consider sampling schemes that select sequences uniformly, in proportion or reciprocally with case incidence and which simply use all available sequences (unsampled). We apply Birth-Death Skyline and Skygrowth methods to estimate the time-varying reproduction number (R_t) and growth rate (r_t) under these strategies as well as related R₀ and date of origin parameters. We compare these to estimates from case data derived from EpiFilter, which we use as a reference for assessing bias. We find that both R_t and r_t are sensitive to changes in sampling whilst R₀ and the date of origin are relatively robust. Moreover, we find that the unsampled datasets, which reflect an opportunistic sampling scheme, engender the most biased R_t and r_t estimates for both our Hong Kong and Amazonas case studies. We highlight that sampling strategy choices may be an influential yet neglected component of sequencing analysis pipelines. More targeted attempts at genomic surveillance and epidemic analyses, particularly in resource-poor settings with limited sequencing capabilities, are necessary to maximise the informativeness of virus genomic datasets.

INTRODUCTION

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is an enveloped single-stranded zoonotic RNA virus belonging to the Betacoronavirus genus and Coronaviridae family¹. It was first identified in late 2019 in a live food market in Wuhan City, Hubei Province, China². Within a month, SARS-CoV-2 had disseminated globally through sustained human-to-human transmission. It was declared a public health emergency of international concern on the 30th of January 2020 by the World Health Organisation³. Those infected with SARS-CoV-2 have phenotypically diverse symptoms ranging from mild fever to multiple organ dysfunction syndromes and death⁴.

Despite the implementation of non-pharmaceutical interventions (NPIs) by many countries to control their epidemics, to date over 418 million SARS-CoV-2 cases and 5.8 million deaths have been reported worldwide⁵. These NPIs can vary within and between countries and include restrictions on international and local travel, school closures, social distancing measures and the isolation of infected individuals and their contacts⁶. The key aim of NPIs is to reduce epidemic transmission, often measured by epidemiological parameters such as the time-varying effective reproduction number (R_t at time t) and growth rate (r_t), which both provide updating measures of the rate of spread of a pathogen (see Supplementary Table 1 for detailed definitions)^{7, 8}.

However, there is currently great difficulty in estimating and comparing epidemiological parameters derived from case and death data globally due to disparities in molecular diagnostic surveillance and notification systems between countries. Further, even if data are directly comparable, the choice of epidemiological parameter can implicitly shape insights into how NPIs influence transmission potential^{9, 10}. As such, there is a need to supplement traditional estimates with information derived from alternative data sources, such as genomic data¹¹, to gain improved and more robust insights into viral transmission dynamics^{12, 13}.

Phylodynamic analyses of virus genome sequences have increasingly been used for studying emerging infectious diseases, as seen during the current SARS-CoV-2 pandemic^{14–17}, recent Ebola virus epidemics in Western Africa¹⁸ and the Zika epidemic in Brazil and the Americas^{19, 20}. Transmissibility parameters such as the basic reproduction number (R₀), R_t and r_t can be directly inferred from genomic sequencing data or from epidemiological data, while other epidemiological parameters such as the date of origin of a given viral variant or lineage can only be estimated from genomic data. This is of particular importance for variants of concern (VOC), genetic variants with evidence of increased transmissibility, more severe disease, and/or immune evasion. VOC are typically detected through virus genome sequencing and only limited inferences can be made through epidemiological data alone²¹.

Currently, SARS-CoV-2 virus genomes from COVID-19 cases are being sequenced at an unprecedented pace providing a wealth of virus genomic datasets²². There are currently over 8.4 million genomic sequences available on GISAID, an open-source repository for influenza and SARS-CoV-2 genomic sequences²³. These rich datasets can be used to provide an independent perspective on pathogen dynamics and can help validate or challenge parameters derived from epidemiological data. Specifically, the genomic data can potentially overcome some of the limitations and biases that can result from using epidemiological data alone. For example, genomic data are less susceptible to changes at the government level such as alterations to the definition of a confirmed case and changes to notification systems^{24, 25}. Inferences from virus genomic data improve our understanding of underlying epidemic spread and can facilitate better-informed infection control decisions²⁶. However, these advantages are not straightforward to realise. The added value of genomic data depends on two related variables: sampling strategy and computational complexity.

The most popular approaches used to investigate changes in virus population dynamics include the Bayesian Skyline Plot²⁷ and Skygrid²⁸ models and the Birth-Death Skyline (BDSKY)²⁹. These integrate Markov Chain Monte Carlo (MCMC) procedures and often converge slowly on large datasets³⁰. As such, currently available SARS-CoV-2 datasets containing thousands of sequences become computationally impractical to analyse and subsampling is necessary. Although previous studies have examined how sampling choices might influence phylodynamic inferences^{30–34}, this remains a neglected area of study³⁵, particularly concerning SARS-CoV-2 for which sequencing efforts have been unprecedented³⁶. To our knowledge, there are no published studies concerning SARS-CoV-2 which explore the effect that sampling strategies have on the phylodynamic reconstruction of key transmission parameters. Incorrectly implementing a sampling scheme or ignoring its importance can mislead inferences and introduce biases^{30, 37}.

This raises the important question that motivates our analysis: how should sequences be selected for phylodynamic analysis, and which parameters are sensitive or robust to changes in sampling scheme? Here we explore how diverse sampling strategies in genomic sequencing may affect the estimation of key epidemiological parameters. We estimate R₀, R_t, and r_t from genomic sequencing data under different sampling strategies from a location with higher genomic coverage, represented by Hong Kong, and a location with lower genomic coverage, represented by the Amazonas region, Brazil. We then compare our estimates against those derived from reference case data. By benchmarking genomic inferences against those from case data we can better understand the impact that sampling strategies may have on phylodynamic inference, bolster confidence in estimates of genomic-specific parameters such as the origin time and improve the interpretation of estimates from areas with heterogeneous genomic coverage.

METHODS

Empirical Estimation of the Reproduction Number, Time-varying Effective Reproduction Number, and Growth Rate

Epidemiological Datasets

Two sources of data from the Amazonas region, Brazil and one source of data from Hong Kong were used to calculate empirical epidemiological parameters. For the Amazonas region, case data from the SIVEP-Gripe (Sistema de Informação de Vigilância Epidemiológica da Gripe) SARI (severe acute respiratory infections) database from the 30th of November 2020 up to the 7th of February 2021 were used. Here we were interested in cases caused by the P.1/Gamma VOC first detected in Manaus. The number of P.1 cases was calculated by using the proportion of P.1/Gamma viral sequences uploaded to GISAID within each week (Supplementary Figure 1). For Hong Kong, all case data were extracted from the Centre for Health Protection, Department of Health, the Government of the Hong Kong Special Administrative Region up to the 7th of May 2020. Due to lags in the development of detectable viral loads, symptom onset and subsequent testing³⁸, the date on which each case was recorded was left-shifted by 5 days within our models³⁹ to account for these delays in both datasets.

Basic Reproduction Number

The R₀ parameter was estimated using a time series of confirmed SARS-CoV-2 cases from both Hong Kong and the Amazonas region. To avoid the impact of NPIs, only data up to the banning of mass gatherings in Hong Kong (27th March 2020) and until the imposition of strict restrictions in the Amazonas region (12th January 2021) were used. Weekly counts of confirmed cases were modelled using maximum likelihood methods. The weekly case counts were assumed to be Poisson distributed and were fitted to a closed Susceptible-Exposed-Infectious-Recovered (SEIR) model (Equation 1) by maximising the likelihood of observing the data given the model parameters (Table 1). Subsequently, the log-likelihood was used to calculate R₀ by fitting β, the effective contact rate. To generate approximate 95% confidence intervals (CIs) for R₀, non-parametric bootstrapping was used with 1000 iterations.

Table 1:

This shows the parameter estimates used within the deterministic SEIR model.
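The fitting procedure described above can be illustrated with a minimal sketch. It is not the code used in this study: the Euler integration, the grid search over β, the function names and all parameter values (incubation rate `sigma`, recovery rate `gamma`, population size, initial exposed count) are assumptions chosen for illustration, and the observations are synthetic. It relies on the standard result that R₀ = β/γ in a closed SEIR model.

```python
import math

def seir_weekly_cases(beta, sigma, gamma, n_pop, e0, weeks, dt=0.1):
    """Euler-integrate a closed SEIR model and return expected new cases per week."""
    s, e, i = n_pop - e0, float(e0), 0.0
    steps_per_week = int(round(7 / dt))
    weekly = []
    for _ in range(weeks):
        new_cases = 0.0
        for _ in range(steps_per_week):
            infections = beta * s * i / n_pop * dt   # S -> E
            onsets = sigma * e * dt                  # E -> I (counted as cases)
            recoveries = gamma * i * dt              # I -> R
            s -= infections
            e += infections - onsets
            i += onsets - recoveries
            new_cases += onsets
        weekly.append(new_cases)
    return weekly

def poisson_log_lik(observed, expected):
    """Log-likelihood of weekly counts under independent Poisson observations."""
    return sum(k * math.log(max(mu, 1e-12)) - mu - math.lgamma(k + 1)
               for k, mu in zip(observed, expected))

def fit_beta(observed, sigma, gamma, n_pop, e0, grid):
    """Return the beta on the grid that maximises the Poisson log-likelihood."""
    return max(grid, key=lambda b: poisson_log_lik(
        observed, seir_weekly_cases(b, sigma, gamma, n_pop, e0, len(observed))))

# Hypothetical parameter values, for illustration only.
SIGMA, GAMMA, N_POP, E0 = 1 / 3, 1 / 5, 1_000_000, 10
# Synthetic weekly counts generated with a known beta of 0.5.
observed = [round(x) for x in seir_weekly_cases(0.5, SIGMA, GAMMA, N_POP, E0, 6)]
grid = [0.30 + 0.01 * j for j in range(41)]
beta_hat = fit_beta(observed, SIGMA, GAMMA, N_POP, E0, grid)
r0_hat = beta_hat / GAMMA  # basic reproduction number of the closed SEIR model
```

A bootstrap confidence interval could be sketched in the same spirit by resampling the weekly counts and refitting β for each replicate.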

Time-varying Effective Reproduction Number

To estimate R_t from case line list data the EpiFilter method⁴⁴ was used. EpiFilter describes transmission using a renewal model, a general and popular framework that can be applied to infer the dynamics of numerous infectious diseases from case incidence⁴⁵. This model describes how the number of new cases (incidence) at time t depends on R_t at that specified time point and the past incidence, which is summarised by the cumulative number of cases up to each time point weighted by the generation time distribution, which we assume to be known. EpiFilter integrates both Bayesian forward and backward recursive smoothing. This improves R_t estimates by leveraging the benefits of two of the most popular R_t estimation approaches: EpiEstim⁴⁶ and the Wallinga-Teunis method⁴⁷. EpiFilter minimises the mean squared error in estimation and reduces dependence on prior assumptions, making it a suitable candidate for deriving reference estimates. We use these to benchmark estimates independently obtained from genomic data. We assume the generation time distribution is well approximated by the serial interval (SI) distribution⁴⁶, which describes the time between symptom onsets in an infector–infectee pair.
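The renewal relationship described above can be sketched in a few lines. This is an illustration of the renewal equation only, in which expected incidence at time t equals R_t multiplied by past incidence weighted by the generation time distribution. It is not EpiFilter, which adds Bayesian forward-backward smoothing. The weights `w` and the function names are hypothetical.

```python
def total_infectiousness(incidence, w):
    """Past incidence weighted by the generation time distribution.

    w[0] is the probability that a generation time equals one time step,
    w[1] two time steps, and so on.
    """
    lam = []
    for t in range(len(incidence)):
        max_lag = min(t, len(w))
        lam.append(sum(incidence[t - s] * w[s - 1] for s in range(1, max_lag + 1)))
    return lam

def expected_incidence(r_series, w, seed_cases):
    """Propagate the renewal equation: E[I_t] = R_t * sum_s I_{t-s} * w_s."""
    inc = [float(seed_cases)]
    for t in range(1, len(r_series)):
        max_lag = min(t, len(w))
        lam_t = sum(inc[t - s] * w[s - 1] for s in range(1, max_lag + 1))
        inc.append(r_series[t] * lam_t)
    return inc

# Hypothetical three-step generation time distribution and a constant R_t of 1.5.
w = [0.2, 0.5, 0.3]
r_series = [1.5] * 12
inc = expected_incidence(r_series, w, seed_cases=10)
lam = total_infectiousness(inc, w)
# The crude ratio I_t / Lambda_t recovers R_t in this noise-free setting.
crude_r = [inc[t] / lam[t] for t in range(1, len(inc))]
```

With real, noisy incidence the crude ratio is unstable, which is the motivation for smoothing approaches such as the one used in this study.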

Growth Rate

After R_t has been inferred, the Wallinga-Lipsitch equation for a gamma-distributed generation time (Equation 2) was used to estimate the exponential epidemic growth rate r_t⁴⁸. The SI for Hong Kong was derived from a systematic review and meta-analysis⁴⁹ and a study exploring SI in Brazil was used for the Amazonas datasets⁵⁰. The SI was assumed to be gamma-distributed. The gamma distribution is represented by gamma(ε, γ), with ε and γ being the shape and scale parameters respectively.
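As a worked illustration, and assuming Equation 2 takes the standard Wallinga-Lipsitch form for a gamma-distributed generation time, R = (1 + r × scale)^shape, the conversion between R_t and r_t can be sketched as follows. The function names and the serial interval mean and standard deviation used in the example are hypothetical and are not the values used in this study.

```python
def gamma_shape_scale(mean, sd):
    """Convert a mean and standard deviation into gamma shape and scale."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale

def reproduction_number_from_growth_rate(growth, shape, scale):
    """Wallinga-Lipsitch relation for a gamma generation time: R = (1 + r*scale)^shape."""
    return (1.0 + growth * scale) ** shape

def growth_rate_from_reproduction_number(rep_num, shape, scale):
    """Inverse of the relation above: r = (R^(1/shape) - 1) / scale."""
    return (rep_num ** (1.0 / shape) - 1.0) / scale

# Hypothetical serial interval: mean 5 days, standard deviation 3 days.
shape, scale = gamma_shape_scale(5.0, 3.0)
r_critical = growth_rate_from_reproduction_number(1.0, shape, scale)  # R_t = 1
r_growing = growth_rate_from_reproduction_number(2.0, shape, scale)   # R_t = 2
round_trip = reproduction_number_from_growth_rate(r_growing, shape, scale)
```

Under this relation R_t = 1 maps to r_t = 0, so the critical thresholds of the two parameters coincide.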

SARS-CoV-2 Brazilian Gamma VOC and Hong Kong datasets

All high-quality, complete SARS-CoV-2 genomes were downloaded from GISAID²³ for Hong Kong (up to 7th May 2020) and the Amazonas state, Brazil (from 30th November 2020 up to 7th February 2021). Using the Accession ID of each sequence, all sequences were screened and only sequences previously analysed and published in PubMed, medRxiv, bioRxiv, virological or preprint repositories were selected for subsequent analysis. For both datasets, sequence alignment was conducted using MAFFT v7⁵¹. The first 130 base pairs (bp) and last 50 bp of the aligned sequences were trimmed to remove potential sequencing artefacts in line with the Nextstrain protocol⁵². Both datasets were then processed using the Nextclade pipeline for quality control (https://clades.nextstrain.org/). Briefly, the Nextclade pipeline examines the completeness, divergence, and ambiguity of bases in each genetic sequence. Only sequences deemed ‘good’ by the Nextclade pipeline were retained.

Subsequently, all sequences were screened for identity; where identical sequences shared the same location and collection date, only one such isolate was used. Moreover, PANGO lineage classification was conducted using the Pangolin²² software tool (http://pangolin.cog-uk.io) on sequences from the Amazonas region and only those assigned to the P.1/Gamma lineage were retained (Supplementary Figure 1).

Maximum Likelihood tree reconstruction

Maximum likelihood phylogenetic trees were reconstructed using IQTREE2⁵³ for both datasets. For the Hong Kong dataset, a TIM2 model of nucleotide substitution with empirical base frequencies and a proportion of invariant sites was used, as selected by the ModelFinder application⁵⁴. For the Brazilian dataset, a TN model of nucleotide substitution⁵⁵ with empirical base frequencies was selected. To assess branch support, the approximate likelihood-ratio test based on the Shimodaira–Hasegawa-like procedure with 1,000 replicates⁵⁶ was used.

Root-to-tip regression

To explore the temporal structure of both the Brazilian and Hong Kong datasets, TempEst was used to regress the root-to-tip genetic distances against sampling dates (yyyy-mm-dd). The ‘best-fitting’ root for the phylogeny was found by maximising the R² value of the root-to-tip regression (Supplementary Figure 2). Several sequences showed incongruent genetic diversity and were discarded from subsequent analyses. This resulted in a final dataset of N = 117 Hong Kong sequences and N = 196 Brazilian sequences. The gradients of the regression slopes (clock rates) provided by TempEst were used to inform the clock prior in the phylodynamic analysis.
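The regression underlying this step can be sketched as an ordinary least-squares fit of root-to-tip distance against sampling date, with the slope read as the clock rate in substitutions per site per year. This sketch does not reproduce TempEst, which also searches for the best-fitting root. The function name and the example dates and distances are hypothetical.

```python
def root_to_tip_regression(decimal_dates, distances):
    """Least-squares fit of root-to-tip distance on sampling date.

    Returns (slope, intercept, r_squared). With dates in decimal years and
    distances in substitutions per site, the slope is a clock rate in s/s/y.
    """
    n = len(decimal_dates)
    mean_x = sum(decimal_dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in decimal_dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(decimal_dates, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical tips evolving at exactly 1e-3 s/s/y from a root dated 2019.9.
dates = [2020.0, 2020.1, 2020.2, 2020.3]
dists = [1e-3 * (d - 2019.9) for d in dates]
slope, intercept, r_squared = root_to_tip_regression(dates, dists)
```

In real datasets the points scatter around the line, and a low R² indicates a weak temporal signal.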

Subsampling for analysis

Four retrospective sampling schemes were used to select a subsample of Amazonas and Hong Kong sequences. Each sampling period was broken up into weeks with each week being used as an interval according to a temporal sampling scheme (without replacement). This temporal sampling scheme was based on the number of reported cases of SARS-CoV-2.

The temporal sampling schemes that we explored were:

● Uniform sampling: All weeks have equal probability.
● Proportional sampling: Weeks are chosen with a probability proportional to the value of the number of cases in each epi-week.
● Reciprocal-proportional sampling: Weeks are chosen with a probability proportional to the reciprocal of the number of cases in each epi-week.
● No sampling strategy applied: All sequences were included without a sampling strategy applied (equivalent to opportunistic sampling).
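The three weighted schemes above can be illustrated with a short sketch that turns weekly case counts into week-selection probabilities and then draws sequences without replacement. This is one possible reading of the schemes and not the procedure used in this study: the function names, the handling of weeks with zero cases (given zero weight under reciprocal-proportional sampling) and the example data are all assumptions. The unsampled scheme needs no code, since every sequence is kept.

```python
import random

def week_weights(weekly_cases, scheme):
    """Probability of choosing each week under a given sampling scheme."""
    if scheme == "uniform":
        raw = [1.0 for _ in weekly_cases]
    elif scheme == "proportional":
        raw = [float(c) for c in weekly_cases]
    elif scheme == "reciprocal":
        # Assumption: weeks with zero reported cases receive zero weight.
        raw = [1.0 / c if c > 0 else 0.0 for c in weekly_cases]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    total = sum(raw)
    return [x / total for x in raw]

def subsample(sequences_by_week, weekly_cases, scheme, n_target, seed=0):
    """Draw up to n_target sequences without replacement, week by week."""
    rng = random.Random(seed)
    weights = week_weights(weekly_cases, scheme)
    pools = [list(pool) for pool in sequences_by_week]
    chosen = []
    while len(chosen) < n_target:
        # Weeks whose sequences are exhausted can no longer be chosen.
        live = [w if pools[k] else 0.0 for k, w in enumerate(weights)]
        if sum(live) == 0:
            break
        k = rng.choices(range(len(pools)), weights=live)[0]
        chosen.append(pools[k].pop(rng.randrange(len(pools[k]))))
    return chosen

# Hypothetical data: three epi-weeks with 10, 40 and 50 reported cases.
cases = [10, 40, 50]
seqs = [["a1", "a2"], ["b1", "b2", "b3"], ["c1"]]
w_uniform = week_weights(cases, "uniform")
w_prop = week_weights(cases, "proportional")
w_recip = week_weights(cases, "reciprocal")
picked = subsample(seqs, cases, "proportional", n_target=4)
```

Because sampling is without replacement, weeks with few available sequences are exhausted quickly, so the realised sample size can fall below the target.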

These sampling schemes were inspired by those recommended by the WHO for practical use in different settings and scenarios⁵⁸. Proportional sampling is equivalent to representative sampling, uniform sampling is equivalent to fixed sampling, whilst the unsampled data includes all sampling strategies. Reciprocal-proportional sampling is not commonly used in practice and was used as a control within this study.

Bayesian Evolutionary Analysis

Dated molecular clock phylogenies were inferred for all sampling strategies applied to the Amazonas and Hong Kong datasets using BEAST v1.10.4⁵⁹ with BEAGLE library v3.1.0⁶⁰ for accelerated likelihood evaluation. For both the Amazonas and Hong Kong datasets, a HKY substitution model with gamma-distributed rate variation among sites and four rate categories was used to account for among-site rate variation⁶¹. A strict molecular clock model was chosen. Both the Amazonas and Hong Kong datasets were analysed under a flexible non-parametric skygrid tree prior⁶². Four independent MCMC chains were run for both datasets.

For the Amazonas dataset, each MCMC chain consisted of 250,000,000 steps with sampling every 50,000 steps. Meanwhile, for the Hong Kong dataset, each MCMC chain consisted of 200,000,000 steps with sampling every 40,000 steps. For both datasets, the four independent MCMC runs were combined using LogCombiner v1.10.4⁵⁹. Subsequently, 10% of all trees were discarded as burn-in, and the effective sample sizes of parameter estimates were evaluated using TRACER v1.7.2⁶³. An effective sample size of over 200 was obtained for all parameters. Maximum clade credibility (MCC) trees were summarised using Tree Annotator⁵⁹.

Phylodynamic Reconstruction

Estimation of the Basic and Time-varying Effective Reproduction Numbers

The Bayesian birth-death skyline (BDSKY) model²⁹ implemented within BEAST 2 v2.6.5⁶⁴ was applied to estimate the time-varying transmissibility parameter R_t (Table 2). A HKY substitution model with a gamma-distributed rate variation among sites and four rate categories⁶¹ was used alongside a strict molecular clock model. The selected number of intervals for both datasets was 5, representing R_t changing every 2.5 weeks for the Hong Kong datasets and every 2 weeks for the Brazilian datasets, with equidistant intervals per step. An exponential distribution was used with a mean of 36.5 y^-1 for the rate of becoming uninfectious, assuming a mean duration of infection of 10 days¹⁵. A uniform distribution prior was used for the sampling proportion, which models changes in case ascertainment. Four independent MCMC chains were run for 50 million MCMC steps with sampling every 5000 steps for each dataset. These MCMC runs were combined using LogCombiner v2.6.5⁶⁴ and the effective sample size of parameter estimates was evaluated using TRACER v1.7.2⁶³. We obtained an effective sample size above 200 for all parameters (indicating convergence) and plotted all results using the bdskytools R package (https://github.com/laduplessis/bdskytools).

Table 2: Values and priors for the parameters of the BDSKY model

Estimation of Growth Rates

For each dataset, a scaled proxy for r_t was obtained from the skygrowth method⁶⁵ within R. Skygrowth uses a non-parametric Bayesian approach to apply a first-order autoregressive stochastic process on the growth rate of the effective population size. The MCMC chains were run for one million iterations for each dataset on their MCC tree with an Exponential (10^-5) prior on the smoothing parameter. The skygrowth model was parameterised assuming that the effective population size of SARS-CoV-2 could change every two weeks. To facilitate a comparison of the scaled proxy for r_t estimated by skygrowth and the exponential r_t estimated by EpiFilter, the r_t estimated by the skygrowth method was rescaled to the exponential growth rate. This was achieved by adding a gamma rate variable to the scaled proxy for r_t, which assumed a mean duration of infection of 10 days¹⁵, to calculate R_t.

Subsequently, the Wallinga-Lipsitch equation (Equation 2) was used to convert R_t into the exponential growth rate⁴⁸.

Comparing Parameter Estimates from Genetic and Epidemiological Data

To compare estimates derived from epidemiological and genetic data the Jensen-Shannon divergence (DJS)⁶⁶, which measures the similarity between two probability mass functions (PMFs), was applied. The DJS offers a formal information theoretic evaluation of distributions and is more robust than comparing Bayesian credible intervals (BCIs) since it considers both the shape and spread of a given distribution. The DJS is essentially a symmetric and smoothed version of the Kullback-Leibler divergence (DKL) and is commonly used in the fields of machine learning and bioinformatics. The DKL between two PMFs, P and Q, is defined in Equation 3⁶⁷. To calculate the PMF for each epidemiological parameter, the cumulative distribution function (CDF) was extracted for each model, converted to a probability density function (PDF), and a discretisation procedure was then applied. Here τ represents the PDF and is discretised via Equation 4, where s = 0.05, 0.01…., τ(ν) is the cumulative probability density of τ and i is the incidence.

The Jensen-Shannon distance (JSD) metric quantifies the DJS by taking the square root of the total DJS and is the metric that we used to compare parameter estimates from differing sampling strategies. The JSD can be calculated using Equation 5, with P and Q representing the two probability distributions and DKL the KL divergence. A smaller JSD indicates that the two probability distributions (P and Q) are more similar, with a JSD of 0 indicating equivalence of the two distributions. The mean JSD was taken over all intervals for the BDSKY and Skygrowth models to obtain an overall measure of the level of estimated similarity.
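The distance described above can be sketched using the standard definitions of the Kullback-Leibler and Jensen-Shannon divergences. The sketch assumes that both PMFs are already discretised on the same grid and uses base-2 logarithms, which bounds the distance between 0 and 1. The choice of base, the function names and the example PMFs are assumptions and may differ from the implementation used in this study.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(P || Q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_distance(p, q):
    """Jensen-Shannon distance: square root of the Jensen-Shannon divergence."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    djs = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(djs, 0.0))  # guard against tiny negative rounding error

def mean_js_distance(pmfs_a, pmfs_b):
    """Average the distance over intervals, one pair of PMFs per interval."""
    distances = [js_distance(p, q) for p, q in zip(pmfs_a, pmfs_b)]
    return sum(distances) / len(distances)

# Hypothetical PMFs over three parameter bins.
p = [0.2, 0.5, 0.3]
q = [0.1, 0.4, 0.5]
d_same = js_distance(p, p)                        # identical distributions
d_disjoint = js_distance([1.0, 0.0], [0.0, 1.0])  # no overlap
d_pq = js_distance(p, q)
```

Unlike the KL divergence, this distance is symmetric in P and Q and remains finite when one distribution assigns zero probability to a bin.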

Data availability

Please see https://github.com/rhysinward/Phylodyanmic-Subsampling for code and data used within this study.

RESULTS

Sampling Schemes

Hong Kong

Hong Kong reacted rapidly upon learning of the emergence of SARS-CoV-2 in Wuhan, Hubei province, China, by declaring a state of emergency on the 25th of January 2020 and by mobilising intensive surveillance schemes in response to initial cases⁶⁸. This appeared to be successful in controlling the first wave of cases. However, due to imported cases from Europe and North America, a second wave of SARS-CoV-2 infections emerged prompting stricter NPIs such as the closure of borders and restrictions on gatherings ⁶⁸. Following these measures, the incidence of SARS-CoV-2 rapidly decreased (Figure 1). Hong Kong has a high sampling intensity with 11.6% of confirmed cases sequenced during our study period.

Figure 1.

Confirmed SARS-CoV-2 cases from Hong Kong until 7^th of May 2020. The dashed lines represent policy change-times⁶⁸.

Further, Hong Kong has high quality case data with a high testing rate through effective tracing of close contacts, testing of all asymptomatic arriving travellers and all patients with pneumonia⁶⁹.

The number of cases within Hong Kong for each week was used to inform the sampling schemes used within this study. This resulted in the unsampled scheme having N = 117 sequences, the proportional sampling scheme having N = 54 sequences, the uniform sampling scheme having N = 79 and the reciprocal-proportional sampling scheme having N = 84 sequences (Supplementary Figure 3).

Amazonas

The Amazonas state of Brazil had its first laboratory confirmed case of SARS-CoV-2 in March 2020 in a traveller returning from Europe⁷⁰. After a first large wave of SARS-CoV-2 infections within the state that peaked in early May 2020 (Figure 2), the epidemic waned, cases dropped, remaining stable until mid-December 2020. The number of cases then started growing exponentially, ushering in a second epidemic wave. This second wave peaked in January 2021 (Figure 2) and coincided with the emergence of a new SARS-CoV-2 VOC, designated P.1/Gamma¹⁴.

Figure 2.

Confirmed SARS-CoV-2 cases from Amazonas state, north Brazil until 7^th of February 2021. The dashed lines represent policy change-times ⁷².

To combat this second wave, the Government of the Amazonas state suspended all non-essential commercial activities on the 23rd of December 2020 (http://www.pge.am.gov.br/legislacao-covid-19/). However, in response to protests, these restrictions were reversed, and cases continued to climb. On the 12th of January, when local transmission of P.1/Gamma was confirmed in Manaus, capital of Amazonas state⁷¹, NPIs were re-introduced (http://www.pge.am.gov.br/legislacao-covid-19/) which seemed to be successful in reducing the case incidence in the state. However, cases remained comparatively high (Figure 4). Amazonas has a low sampling intensity with 2.4% of suspected P.1/Gamma cases sequenced during our study period.

The number of cases within the Amazonas region informed the sampling schemes used within this study. This resulted in the unsampled scheme having N = 196 sequences, the proportional sampling scheme having N = 168 sequences, the uniform sampling scheme having N = 150 and the reciprocal-proportional sampling scheme having N = 67 sequences (Supplementary Figure 4).

Root-to-tip Regression

The correlation (R²) between genetic divergence and sampling dates ranged between 0.36 and 0.52 for the Hong Kong datasets and between 0.13 and 0.20 for the Amazonas datasets (Supplementary Figure 2). This implies that the Hong Kong datasets have a stronger temporal signal. This is likely due to the Hong Kong datasets having a wider sampling interval (106 days) compared to the Amazonas datasets (69 days). A wider sampling interval can lead to a stronger temporal signal⁷³. The gradient (rate) of the regression ranged from 1.16 × 10^-3 to 2.09 × 10^-3 s/s/y for the Hong Kong datasets and 4.41 × 10^-4 to 5.30 × 10^-4 s/s/y for the Amazonas datasets.

Estimation of Evolutionary Parameters

The mean substitution rate (measured in units of number of substitutions per site per year, s/s/y) and the time to the most recent common ancestor (TMRCA) were estimated in BEAST for both datasets, and the estimates from all sampling schemes were compared.

Hong Kong

For Hong Kong, the mean substitution rate per site per year ranged from 9.16 × 10^-4 to 2.09 × 10^-3, with all sampling schemes having overlapping Bayesian credible intervals (BCIs) (Supplementary Table 2; Supplementary Figure 5A). This indicates that the sampling scheme did not have a significant impact on the estimation of the clock rate. Moreover, the clock rate is comparable to estimates from the root-to-tip regression and to early estimates of the mean substitution rate per site per year of SARS-CoV-2 (Duchene et al., 2020).

Molecular clock dating of the Hong Kong dataset indicates that the estimated time of the most recent common ancestor was around December 2019 (Figure 3B; Supplementary Table 2). This is a few weeks before the first confirmed case, which was reported on the 18th of January 2020. Once again, all sampling strategies have overlapping BCIs, with the range in means differing by around three weeks, a relatively short time scale, suggesting that the sampling scheme does not significantly impact the estimation of the TMRCA.

Figure 3.

R₀ estimated from BDSKY and TMRCA for Hong Kong and Brazil. Panels A and B represent Hong Kong and panels C and D represent the Amazonas.

Brazil

For the P.1 lineage in the Amazonas region, the mean substitution rate ranged from 4.00 × 10^-4 to 5.56 × 10^-4 s/s/y with all sampling schemes having overlapping BCIs (Figure 3D, Supplementary Table 2; Supplementary Figure 5B). This indicates that sampling strategy does not impact the estimation of the clock rate, supporting findings from the Hong Kong dataset. This also supports estimates from the root-to-tip analysis (Supplementary Figure 2).

Molecular clock dating estimated a mean TMRCA around late October to early November (Figure 3D; Supplementary Table 2). This is around five weeks before the date of the first P.1 case identified in Manaus used in our study. All sampling schemes have overlapping BCIs, consistent with the conclusion from the Hong Kong data that the TMRCA is relatively robust to sampling.

Estimation of Basic Reproduction Number

We found that Hong Kong had a significantly lower R₀ of 2.17 (95% confidence interval (CI) = 1.43 – 2.83) when compared to Amazonas, which had an R₀ of 3.67 (95% CI = 2.83 – 4.48). All sampling schemes for both datasets were characterised by similar R₀ values (Figure 3), indicating that the estimation of R₀ is robust to changes in sampling scheme.

Time-varying Reproduction number and Growth rate

We estimate R_t and r_t for local SARS-CoV-2 epidemics in Hong Kong and Amazonas, Brazil. Our main results for these two parameters and the JSD metrics are shown in Figures 4-8.

Hong Kong

We applied the BDSKY model to estimate the R_t for each dataset subsampled according to the different sampling strategies (Figure 4). We compared these against the R_t from case data, derived from EpiFilter. Based on the proportional sampling scheme, which had the lowest JSD (Figure 4E), we initially infer a super-critical R_t value, with a mean around 2, that appears to fall swiftly in response to the state of emergency and the rapid implementation of NPIs. A steady transmission rate subsequently persisted throughout the following weeks around the critical threshold (R_t = 1). This period is followed by a sharp increase in R_t, peaking at a mean value of 2.6. This is likely due to imported cases from North America and Europe⁶⁸. This led to a ban on international travel resulting in a sharp decline in R_t (Figure 2). However, this decline lasted around a week with the mean R_t briefly increasing until more stringent NPIs such as the banning of major gatherings were implemented. Following this, the R_t continued its sharp decline falling below the critical threshold, with transmission becoming sub-critical (Figure 4).

Figure 4:

R_t estimated from both the BDSKY and EpiFilter methods for Hong Kong. The bold writing represents the sampling scheme used in panels A-D. The light-shaded area represents the 95% Highest Posterior Density Interval with the darker-shaded area presenting where the BDSKY and EpiFilter models overlap. The solid line represents the mean R_t estimate with EpiFilter in red and BDSKY in blue. The dashed lines represent policy change-times. The Jensen Shannon Distance is ordered from best to worst in panel E.

These results were mirrored in the estimation of r_t (Figure 5), for which the uniform and proportional sampling schemes showed the least divergence (Figure 5E). There was an initial decline in r_t, which steadied at a value of ∼0, indicating that epidemic stabilisation had occurred. This stable period is followed by an increase in r_t, peaking at around 0.050 d^-1 (Figure 5). In response to NPIs, r_t starts to decrease, falling below 0, indicating a receding epidemic. The rate of this decline peaks at around -0.075 d^-1 (Figure 5).

Figure 5:

r_t estimated from both the Skygrowth and EpiFilter methods for Hong Kong. The bold writing represents the sampling scheme used in panels A-D. The light-shaded area represents the 95% Highest Posterior Density Interval with the darker-shaded area presenting where the BDSKY and Skygrowth models overlap. The solid line represents the mean r_t estimate with Skygrowth in red and BDSKY in blue. The dashed lines represent policy change-times. The JSD metric is ordered from best to worst in panel E.

Brazil

Based on the uniform sampling scheme, which had the lowest JSD (Figure 6E), we initially infer super-critical transmission (R_t > 1) with a mean value of 3 (Figure 6). From this point, the R_t declines, although it remains above the critical threshold (R_t = 1) for much of the study period. Sub-critical transmission (R_t < 1) was only reached after the re-imposition of NPIs.

Figure 6:

R_t estimated from both the BDSKY and EpiFilter methods for Amazonas, Brazil. The bold writing represents the sampling scheme used in panels A-D. The light-shaded area represents the 95% Highest Posterior Density Interval with the darker-shaded area presenting where the BDSKY and EpiFilter models overlap. The solid line represents the mean R_t estimate with EpiFilter in red and BDSKY in blue. The dashed lines represent policy change-times. The Jensen Shannon Distance is ordered from best to worst in panel E.

This implies that initial restrictions, such as the suspension of commercial activities, were likely insufficient for suppressing spread. Only after more stringent restrictions were imposed did R_t become sub-critical. However, there is no evidence of a sharp decrease in R_t once restrictions were re-imposed, which may suggest limited effectiveness.

Based on the uniform sampling scheme, which had the lowest JSD (Figure 7E), we infer a steady decline in r_t, which matches the pattern seen in R_t (Figure 7). The initial r_t was around 0.250 d^-1. Subsequently, r_t falls over the study period, dropping below 0 after the re-imposition of NPIs and reaching -0.030 d^-1 by the end of the study period. There is no evidence of any noticeable decline in r_t when interventions were introduced, indicating that they might not have significantly impacted the growth rate of P.1/gamma.
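One standard way to see why the growth rate and the reproduction number move together is the Wallinga-Lipsitch relation⁴⁸, which for a gamma-distributed generation time with mean μ and standard deviation σ gives R = (1 + r σ²/μ)^(μ²/σ²). The Python sketch below evaluates this relation for the growth rates quoted above. The generation-time mean and standard deviation used here (5 and 2 days) are illustrative assumptions, not the values used in this study, and the function name is hypothetical; the sketch is not how the BDSKY or Skygrowth estimates were produced.

```python
def reproduction_number_from_growth_rate(
    r_per_day: float, gt_mean_days: float, gt_sd_days: float
) -> float:
    """Map an exponential growth rate r to a reproduction number R.

    Uses the Wallinga-Lipsitch relation for a gamma-distributed
    generation time: R = (1 + r / rate) ** shape, where
    shape = mean^2 / sd^2 and rate = mean / sd^2.
    """
    shape = gt_mean_days**2 / gt_sd_days**2
    rate = gt_mean_days / gt_sd_days**2
    if r_per_day <= -rate:
        raise ValueError("relation is only defined for r > -rate")
    return (1.0 + r_per_day / rate) ** shape


# ASSUMED generation-time parameters, for illustration only.
GT_MEAN = 5.0  # days
GT_SD = 2.0    # days

for r in (0.250, 0.0, -0.030):  # growth rates (per day) quoted in the text
    R = reproduction_number_from_growth_rate(r, GT_MEAN, GT_SD)
    print(f"r = {r:+.3f} per day -> R = {R:.2f}")
```

Whatever generation-time distribution is assumed, r = 0 maps to R = 1, so the two quantities always agree on whether transmission is super-critical or sub-critical; the magnitude of R for a given r depends on the assumed generation time.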

Figure 7:

r_t estimated from both the Skygrowth and EpiFilter methods for Amazonas, Brazil. The bold writing represents the sampling scheme used in panels A-D. The light-shaded area represents the 95% Highest Posterior Density Interval with the darker-shaded area presenting where the BDSKY and Skygrowth models overlap. The solid line represents the mean r_t estimate with Skygrowth in red and BDSKY in blue. The dashed lines represent policy change-times. The JSD is ordered from best to worst in panel E.

Discussion

In this study, phylodynamic methods have been applied to available SARS-CoV-2 sequences from Hong Kong and the Amazonas region of Brazil to infer their key epidemiological parameters and to compare the impact that various sampling strategies have on the phylodynamic reconstruction of these parameters.

We estimated the basic reproductive number of SARS-CoV-2 in Hong Kong to be 2.17 (95% CI = 1.43-2.83). This supports previous estimates of the initial R₀ in Hong Kong^{68, 74}, which placed R₀ at 2.23 (95% CI = 1.47-3.42). For the Amazonas region in Brazil, we estimated R₀ to be 3.67 (95% CI = 2.83-4.48). Whilst the population of Amazonas State may not be fully susceptible to P.1/gamma¹⁴, this should not affect the comparison among sampling schemes. We found that R₀ is robust to changes in sampling scheme (Figure 3A and C).

For the Hong Kong dataset, the proportional sampling scheme was superior to all other sampling schemes in estimating R_t. It successfully captured the initial super-critical R_t, its decline in response to rapid NPIs, and the subsequent increase and decline during the second wave of infections (Figure 4B). This was in contrast to the reciprocal-proportional scheme, which provided the worst (largest) JSD (Figure 4D) and an R_t estimate that was largely insensitive to NPIs. The proportional sampling scheme, alongside the uniform sampling scheme, best estimated r_t (Figure 5B and C). For the Amazonas dataset, however, the uniform sampling scheme best estimated both R_t and r_t (Figure 6C and Figure 7C). It captured the initial super-critical R_t and high r_t alongside their subsequent decline. Our R_t estimates are consistent with previous estimates for P.1 in Amazonas state¹⁴. This contrasted with the unsampled data, in which r_t increased at the end of the period (Figure 7A). This highlights that, unlike R₀, both R_t and r_t are sensitive to changes in sampling and that even related epidemiological parameters like R_t and r_t may require different sampling strategies to optimise inferences.
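The ranking of sampling schemes above rests on the Jensen-Shannon distance between each phylodynamic estimate and the EpiFilter reference. As a minimal sketch of the metric itself, the Python code below computes the distance between two discrete probability distributions defined on a common grid. How the estimates are discretised and aggregated over time is described in the Methods and is not reproduced here; the function names and the base-2 logarithm (which bounds the distance between 0 and 1) are choices made for this illustration.

```python
import math


def kl_divergence_bits(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing by convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence.

    Inputs are non-negative weights on the same grid; they are
    normalised to sum to one before comparison.
    """
    if len(p) != len(q):
        raise ValueError("distributions must share the same grid")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each distribution needs positive total mass")
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    divergence = 0.5 * kl_divergence_bits(p, mixture) + 0.5 * kl_divergence_bits(q, mixture)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding


# Hypothetical histograms over the same bins, for illustration only.
reference = [0.1, 0.4, 0.4, 0.1]
candidate = [0.2, 0.3, 0.3, 0.2]
print(jensen_shannon_distance(reference, candidate))
```

A distance of 0 means the two distributions are identical and a distance of 1 means they do not overlap at all, so smaller values indicate closer agreement with the reference.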

Molecular clock dating of the Hong Kong and Amazonas datasets revealed that the date of origin is relatively robust to changes in sampling scheme. For Hong Kong, SARS-CoV-2 likely emerged in mid-December 2019, around 5 weeks before the first reported case on the 22nd of January 2020⁶⁸. For the Amazonas dataset, the common ancestor of the P.1 lineage was dated to around late October to early November 2020, around 5 weeks before the first reported case on the 6th of December¹⁴, with the BCIs overlapping across all sampling strategies. As with the date of origin, we found that the molecular clock rate was robust to changes in sampling strategy in both datasets, with all sampling strategies having overlapping BCIs (Supplementary Table 2 and Supplementary Figure 5). For the Hong Kong dataset, the clock rate is comparable to early estimates of the mean substitution rate per site per year of SARS-CoV-2¹³. However, the clock rate estimated for the Brazilian dataset is lower than the initial estimate of 8.00x10^-4 s/s/y, which is used in investigating SARS-CoV-2⁷⁵ and has been used in previous analyses of P.1⁷⁶. This initial evolutionary rate was estimated from genomic data taken over a short time span at the beginning of the pandemic, introducing a time-dependency bias. Using a more appropriate clock rate can improve tree height and rooting, resulting in more robust parameter estimates⁷⁷.
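A quick, non-Bayesian way to sanity-check a clock rate and date of origin is a root-to-tip regression of the kind shown in Supplementary Figure 2: the slope of genetic distance against sampling date approximates the substitution rate, and the x-intercept approximates the date of the root. The Python sketch below fits such a line by ordinary least squares. The data points are synthetic values generated for this illustration (not sequences from this study), the function name is hypothetical, and this simple regression is not the Bayesian molecular clock analysis used to obtain the estimates reported above.

```python
def root_to_tip_regression(sampling_dates, root_to_tip_distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Dates are decimal years; distances are substitutions per site.
    Returns (rate in s/s/y, estimated root date in decimal years).
    """
    n = len(sampling_dates)
    if n < 2 or n != len(root_to_tip_distances):
        raise ValueError("need at least two paired observations")
    mean_t = sum(sampling_dates) / n
    mean_d = sum(root_to_tip_distances) / n
    sxx = sum((t - mean_t) ** 2 for t in sampling_dates)
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    sxy = sum(
        (t - mean_t) * (d - mean_d)
        for t, d in zip(sampling_dates, root_to_tip_distances)
    )
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("no positive temporal signal in the data")
    # The fitted line passes through (mean_t, mean_d); solve for distance = 0.
    root_date = mean_t - mean_d / rate
    return rate, root_date


# SYNTHETIC example: points placed exactly on a line with a rate of
# 8.0e-4 s/s/y and a root date of 2019.95, purely for illustration.
dates = [2020.05, 2020.10, 2020.15, 2020.20, 2020.25]
distances = [8.0e-4 * (t - 2019.95) for t in dates]

rate, root_date = root_to_tip_regression(dates, distances)
print(f"rate = {rate:.2e} s/s/y, root date = {root_date:.2f}")
```

Because real root-to-tip data are noisy and the tips are not independent, such a fit is best treated as a rough check on temporal signal rather than as a substitute for the posterior estimates in Supplementary Table 2.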

Treating sampling times as uninformative has been shown to be inferior to including them as dependent on effective population size and other parameters by several previous studies^{30, 31, 34, 78}. Whilst these studies did not consider the estimation of epidemiological parameters, they highlight the potential of systematic biases being introduced into the phylodynamic reconstruction by not using a sampling scheme or by assuming an incorrect model for how sampling schemes introduce information. This was supported by our results as phylodynamic inferences with no sampling strategy applied had the poorest performance for both Hong Kong and the Amazonas region. This implies that sampling has a significant impact on phylodynamic reconstruction, and that exploration of sampling strategies is needed to obtain the most robust parameter estimates.
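To make the contrast between the temporal sampling schemes concrete, the Python sketch below converts weekly case counts into per-week sequence allocations under uniform, proportional and reciprocal-proportional sampling. It is a simplified illustration under stated assumptions: the weekly counts, the sequencing budget, the rounding rule and all function names are hypothetical, and the exact subsampling procedure used in this study is the one described in the Methods, not this code.

```python
def scheme_weights(weekly_cases, scheme):
    """Normalised per-week sampling weights for a given scheme.

    uniform      : every week gets the same weight
    proportional : weight proportional to weekly case incidence
    reciprocal   : weight proportional to 1 / weekly case incidence
    """
    if scheme == "uniform":
        raw = [1.0 for _ in weekly_cases]
    elif scheme == "proportional":
        raw = [float(c) for c in weekly_cases]
    elif scheme == "reciprocal":
        raw = [1.0 / c if c > 0 else 0.0 for c in weekly_cases]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    total = sum(raw)
    if total == 0:
        raise ValueError("no weeks with positive weight")
    return [w / total for w in raw]


def allocate_sequences(weekly_cases, weekly_available, budget, scheme):
    """Target number of sequences per week, capped by availability."""
    weights = scheme_weights(weekly_cases, scheme)
    return [
        min(available, int(round(w * budget)))
        for w, available in zip(weights, weekly_available)
    ]


# HYPOTHETICAL weekly case counts and available sequences.
cases = [10, 30, 60, 20]
available = [8, 12, 15, 9]

for scheme in ("uniform", "proportional", "reciprocal"):
    print(scheme, allocate_sequences(cases, available, budget=24, scheme=scheme))
```

Using all available sequences, as in the unsampled datasets, corresponds to skipping this allocation step entirely, so the temporal distribution of sequences is set by sequencing opportunity rather than by design.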

While our results provide a rigorous underpinning and insight into the dynamics of SARS-CoV-2 and the impact of sampling strategies in the Amazonas region and Hong Kong, there are limitations. The Skygrowth and BDSKY models do not explicitly consider imports into their respective regions. This is particularly relevant for Hong Kong as most initial sequences from the region were sequenced from importation events⁷⁹, which can introduce error into parameter estimation. However, as the epidemic expanded, more infections were attributable to autochthonous transmission⁷⁹, and the risk of error introduced by importation events decreased. Moreover, while sampling strategies can account for temporal variations in genomic sampling fractions, there is currently no way to account for non-random sampling approaches in either the BDSKY or Skygrowth models⁸⁰. It is unclear how network-based sampling may affect parameter estimates obtained through these models⁸¹, presenting a key challenge in molecular and genetic epidemiology. Spatial heterogeneities were also not explored within this work. This represents the next key step in understanding the impact of sampling, as spatial sampling schemes would allow the reconstruction of dispersal dynamics and the estimation of epidemic overdispersion (k), a key epidemiological parameter.

Finally, we compared our phylodynamic estimates against epidemiological inferences derived from case data from Hong Kong and Amazonas state, two settings with very different diagnostic capacity. While Hong Kong has high-quality case data with a high testing rate through⁶⁹, there is substantial underreporting of SARS-CoV-2 cases in Amazonas state^{72, 82}.

Future epidemiological modelling work is needed to compare parameter estimates obtained from case data, death data and excess death data across different settings.

This work has highlighted the impact that temporal sampling strategies can have on phylodynamic reconstruction. Whilst more genomic datasets from a variety of countries and regions with different sampling intensities and proportions are needed to create a more generalisable sampling framework and to dissect any potential confounders, we have shown that genomic datasets with no sampling strategy applied can introduce significant uncertainty and biases in the estimation of epidemiological parameters. This finding identifies the need for more targeted attempts at performing genomic surveillance and epidemic analyses, particularly in resource-poor settings which have limited genomic capability.

Role of the Funding Sources

N.R.F. acknowledges support from Wellcome Trust and Royal Society Sir Henry Dale Fellowship (204311/Z/16/Z), Bill and Melinda Gates Foundation (INV-034540) and Medical Research Council-Sao Paulo Research Foundation (FAPESP) CADDE partnership award (MR/S0195/1 and FAPESP 18/14389-0) (https://caddecentre.org). K.V.P. acknowledges support from grant reference MR/R015600/1, jointly funded by the UK Medical Research Council (MRC) and the UK Department for International Development (DFID) and from the NIHR Health Protection Research Unit in Behavioural Science and Evaluation at University of Bristol.

CRediT authorship contribution statement

R.P.D.I., K.V.P. and N.R.F. conceived and designed the study. R.P.D.I. wrote and performed the analyses. R.P.D.I. wrote the manuscript, which was edited and supervised by K.V.P. and N.R.F. All authors have contributed to and approved the manuscript for submission.

Data Availability

All data produced in the present study are available upon reasonable request to the authors.

Supplementary Figures and Tables

Supplementary Figure 1:

The proportion of P.1 sequences compared to non-P.1 sequences found on GISAID (Shu and McCauley, 2017).

Supplementary Figure 2:

Root-to-tip genetic distances plotted against sample collection dates for the SARS-CoV-2 genome datasets used in this study: A-D represent Hong Kong and E-H represent Amazonas State. Plots are based on the maximum likelihood trees rooted by maximising R². Linear regression trend lines are fitted to the data points, which correspond to the genome sequences (represented with black dots).

Supplementary Figure 3:

Number of sequences for each week and sampling scheme for Hong Kong dataset.

Supplementary Figure 4:

Number of sequences for each week and sampling scheme for Amazonas dataset.

Supplementary Figure 5:

Mean substitution rate (s/s/y) for Hong Kong and Brazil. Figure 1A represents Hong Kong with Figure 1B representing the Amazonas.

Supplementary Table 1:

Key parameters and definitions for SARS-CoV-2

Supplementary Table 2:

TMRCA and mean substitution rate, both with 95% BCI, for each sampling strategy for the Hong Kong and Amazonas datasets, alongside the Jensen-Shannon distance. Full posterior distributions of the TMRCA and substitution rates obtained under the different sampling strategies can be found in Figure 3B and D and Supplementary Figure 5.

Supplementary Table 3:

Accession ID of each Hong Kong sequence for each sampling strategy used within this study

Supplementary Table 4:

Accession ID of each Amazonas State, Brazil sequence for each sampling strategy used within this study

Footnotes

5 Jointly supervised this work
Tidying main text and adding clarity. Supplemental files updated. Figure 6 revised.

References

1. Gorbalenya, A. E. et al. The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. Nature Microbiology 5, 536–544 (2020).
2. Zhu, N. et al. A Novel Coronavirus from Patients with Pneumonia in China, 2019. New England Journal of Medicine 382, 727–733 (2020).
3. World Health Organisation. Public Health Emergency of International Concern (PHEIC). (2020).
4. Verity, R. et al. Estimates of the severity of coronavirus disease 2019: a model-based analysis. The Lancet. Infectious diseases 20, 669–677 (2020).
5. World Health Organisation. Coronavirus disease (COVID-19) Weekly Epidemiological Update and Weekly Operational Update. https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports (2022).
6. European Centre for Disease Prevention and Control. Guidelines for the implementation of non-pharmaceutical interventions against COVID-19 Key messages General considerations on NPI to control COVID-19. (2020).
7. Anderson, Vegari, C., Baggaley, R., Hollingsworth, T. D. D. & Maddren, R. The Royal Society SET-C Reports. Reproduction number (R) and growth rate (r) of the COVID-19 epidemic in the UK: methods of estimation, data sources, causes of heterogeneity, and use as a guide in policy formulation [report unpublished]. The Royal Society 1–86 (2020).
8. UK Health Security Agency. The R value and growth rate. https://www.gov.uk/guidance/the-r-value-and-growth-rate (2022).
9. Parag, K. V., Thompson, R. N. & Donnelly, C. A. Are epidemic growth rates more informative than reproduction numbers? medRxiv 2021.04.15.21255565 (2021) doi:10.1101/2021.04.15.21255565.
10. Dushoff, J. & Park, S. W. Speed and strength of an epidemic intervention. Proceedings of the Royal Society B: Biological Sciences 288, 20201556 (2021).
11. World Health Organisation. Genomic sequencing of SARS-CoV-2 A guide to implementation for maximum impact on public health. (2021).
12. Jombart, T. et al. Bayesian Reconstruction of Disease Outbreaks by Combining Epidemiologic and Genomic Data. PLOS Computational Biology 10, e1003457 (2014).
13. Duchene, S. et al. Temporal signal and the phylodynamic threshold of SARS-CoV-2. Virus Evolution 6, (2020).
14. Faria, N. R. et al. Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus, Brazil. Science 372, 815–821 (2021).
15. Nadeau, S. A., Vaughan, T. G., Scire, J., Huisman, J. S. & Stadler, T. The origin and early spread of SARS-CoV-2 in Europe. Proceedings of the National Academy of Sciences 118, e2012008118 (2021).
16. Romano, C. M. & Melo, F. L. Genomic surveillance of SARS-CoV-2: A race against time. The Lancet Regional Health - Americas 0, 100029 (2021).
17. Volz, E. et al. Evaluating the Effects of SARS-CoV-2 Spike Mutation D614G on Transmissibility and Pathogenicity. Cell 184, 64–75.e11 (2021).
18. Dudas, G. et al. Virus genomes reveal factors that spread and sustained the Ebola epidemic. Nature (2017) doi:10.1038/nature22040.
19. Faria, N. R. et al. Establishment and cryptic transmission of Zika virus in Brazil and the Americas. Nature 546, 406–410 (2017).
20. Grubaugh, N. D. et al. Genomic epidemiology reveals multiple introductions of Zika virus into the United States. Nature 546, 401–405 (2017).
21. Harvey, W. T. et al. SARS-CoV-2 variants, spike mutations and immune escape. Nature Reviews Microbiology 19, 409–424 (2021).
22. Rambaut, A. et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nature Microbiology 5, 1403–1407 (2020).
23. Shu, Y. & McCauley, J. GISAID: Global initiative on sharing all influenza data - from vision to reality. Euro surveillance : bulletin Europeen sur les maladies transmissibles = European communicable disease bulletin 22, 30494 (2017).
24. Tsang, T. K. et al. Effect of changing case definitions for COVID-19 on the epidemic curve and transmission parameters in mainland China: a modelling study. The Lancet. Public health 5, e289–e296 (2020).
25. de Souza, W. M. et al. Epidemiological and clinical characteristics of the COVID-19 epidemic in Brazil. Nature Human Behaviour 4, 856–865 (2020).
26. Dolan, P. T., Whitfield, Z. J. & Andino, R. Mapping the Evolutionary Potential of RNA Viruses. Cell Host and Microbe 23, 435–446 (2018).
27. Drummond, A. J., Rambaut, A., Shapiro, B. & Pybus, O. G. Bayesian Coalescent Inference of Past Population Dynamics from Molecular Sequences. Molecular Biology and Evolution 22, 1185–1192 (2005).
28. Gill, M. S. et al. Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci. Molecular biology and evolution 30, 713–724 (2013).
29. Stadler, T., Kühnert, D., Bonhoeffer, S. & Drummond, A. J. Birth–death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV). Proceedings of the National Academy of Sciences 110, 228–233 (2013).
30. Hall, M. D., Woolhouse, M. E. J. & Rambaut, A. The effects of sampling strategy on the quality of reconstruction of viral population dynamics using Bayesian skyline family coalescent methods: A simulation study. Virus evolution 2, vew003–vew003 (2016).
31. Parag, K. V., du Plessis, L. & Pybus, O. G. Jointly Inferring the Dynamics of Population Size and Sampling Intensity from Molecular Sequences. Molecular Biology and Evolution 37, 2414–2429 (2020).
32. Stack, J. C., Welch, J. D., Ferrari, M. J., Shapiro, B. U. & Grenfell, B. T. Protocols for sampling viral sequences to study epidemic dynamics. Journal of the Royal Society, Interface 7, 1119–1127 (2010).
33. de Silva, E., Ferguson, N. M. & Fraser, C. Inferring pandemic growth rates from sequence data. Journal of The Royal Society Interface 9, 1797–1808 (2012).
34. Karcher, M. D., Palacios, J. A., Bedford, T., Suchard, M. A. & Minin, V. N. Quantifying and Mitigating the Effect of Preferential Sampling on Phylodynamic Inference. PLoS computational biology 12, e1004789–e1004789 (2016).
35. Frost, S. D. W. et al. Eight challenges in phylodynamic inference. Epidemics 10, 88–92 (2015).
36. du Plessis, L. et al. Establishment and lineage dynamics of the SARS-CoV-2 epidemic in the UK. Science 371, 708–712 (2021).
37. Hidano, A. & Gates, M. C. Assessing biases in phylodynamic inferences in the presence of super-spreaders. Veterinary Research 50, 74 (2019).
38. Gostic, K. M. et al. Practical considerations for measuring the effective reproductive number, Rt. PLOS Computational Biology 16, e1008409 (2020).
39. Pullano, G. et al. Underdetection of cases of COVID-19 in France threatens epidemic control. Nature 590, 134–139 (2021).
40. The World Bank. Population, total - Hong Kong SAR, China. https://data.worldbank.org/indicator/SP.POP.TOTL?locations=HK (2021).
41. IBGE. Population Projections. https://www.ibge.gov.br/en/statistics/social/population.html (2020).
42. Byrne, A. W. et al. Inferred duration of infectious period of SARS-CoV-2: Rapid scoping review and analysis of available evidence for asymptomatic and symptomatic COVID-19 cases. BMJ Open 10, 1–16 (2020).
43. McAloon, C. et al. Incubation period of COVID-19: a rapid systematic review and meta-analysis of observational research. BMJ Open 10, e039652 (2020).
44. Parag, K. V. Improved estimation of time-varying reproduction numbers at low case incidence and between epidemic waves. PLOS Computational Biology 17, e1009347 (2021).
45. Fraser, C. Estimating Individual and Household Reproduction Numbers in an Emerging Epidemic. PLoS ONE 2, e758 (2007).
46. Cori, A., Ferguson, N. M., Fraser, C. & Cauchemez, S. A New Framework and Software to Estimate Time-Varying Reproduction Numbers During Epidemics. American Journal of Epidemiology 178, 1505–1512 (2013).
47. Wallinga, J. & Teunis, P. Different Epidemic Curves for Severe Acute Respiratory Syndrome Reveal Similar Impacts of Control Measures. American Journal of Epidemiology 160, 509–516 (2004).
48. Wallinga & Lipsitch. How generation intervals shape the relationship between growth rates and reproductive numbers. Proceedings of the Royal Society B: Biological Sciences 274, 599–604 (2007).
49. Rai, B., Shukla, A. & Dwivedi, L. K. Estimates of serial interval for COVID-19: A systematic review and meta-analysis. Clinical epidemiology and global health 9, 157–161 (2021).
50. Prete, C. A. et al. Serial interval distribution of SARS-CoV-2 infection in Brazil. Journal of travel medicine 28, 1–3 (2021).
51. Katoh, K., Misawa, K., Kuma, K. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30, 3059–3066 (2002).
52. Hadfield, J. et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 34, 4121–4123 (2018).
53. Minh, B. Q. et al. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Molecular Biology and Evolution 37, 1530–1534 (2020).
54. Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nature Methods 14, 587–589 (2017).
55. Tamura, K. & Nei, M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution 10, 512–526 (1993).
56. Anisimova, M., Gil, M., Dufayard, J.-F., Dessimoz, C. & Gascuel, O. Survey of branch support methods demonstrates accuracy, power, and robustness of fast likelihood-based approximation schemes. Systematic biology 60, 685–699 (2011).
57. Rambaut, A., Lam, T. T., Max Carvalho, L. & Pybus, O. G. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus evolution 2, vew007–vew007 (2016).
58. World Health Organisation. Guidance for surveillance of SARS-CoV-2 variants Interim guidance. (2021).
59. Suchard, M. A. et al. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus evolution 4, vey016–vey016 (2018).
60. Ayres, D. L. et al. BEAGLE 3: Improved Performance, Scaling, and Usability for a High-Performance Computing Library for Statistical Phylogenetics. Systematic Biology 68, 1052–1061 (2019).
61. Hasegawa, M., Kishino, H. & Yano, T. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22, 160–174 (1985).
62. Hill, V. & Baele, G. Bayesian Estimation of Past Population Dynamics in BEAST 1.10 Using the Skygrid Coalescent Model. Molecular Biology and Evolution 36, 2620–2628 (2019).
63. Rambaut, A., Drummond, A. J., Xie, D., Baele, G. & Suchard, M. A. Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7. Systematic biology 67, 901–904 (2018).
64. Bouckaert, R. et al. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLOS Computational Biology 15, e1006650 (2019).
65. Volz, E. M. & Didelot, X. Modeling the Growth and Decline of Pathogen Effective Population Size Provides Insight into Epidemic Dynamics and Drivers of Antimicrobial Resistance. Systematic Biology 67, 719–728 (2018).
66. Lin, J. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory 37, 145–151 (1991).
67. Kullback, S. & Leibler, R. A. On Information and Sufficiency. The Annals of Mathematical Statistics 22, 79–86 (1951).
68. Cowling, B. J. et al. Impact assessment of non-pharmaceutical interventions against coronavirus disease 2019 and influenza in Hong Kong: an observational study. The Lancet Public Health 5, e279–e288 (2020).
69. Wu, P. et al. Suppressing COVID-19 Transmission in Hong Kong: An Observational Study of the First Four Months. SSRN (2020) doi:10.21203/rs.3.rs-34047/v1.
70. Nascimento, V. A. do et al. Genomic and phylogenetic characterisation of an imported case of SARS-CoV-2 in Amazonas State, Brazil. Memórias do Instituto Oswaldo Cruz 115, (2020).
71. Faria, N. R. et al. Genomic characterisation of an emergent SARS-CoV-2 lineage in Manaus: preliminary findings. Virological https://virological.org/t/genomic-characterisation-of-an-emergent-sars-cov-2-lineage-in-manaus-preliminary-findings/586 (2021).
72. Sabino, E. C. et al. Resurgence of COVID-19 in Manaus, Brazil, despite high seroprevalence. Lancet (London, England) 397, 452–455 (2021).
73. Drummond, A. J., Pybus, O. G., Rambaut, A., Forsberg, R. & Rodrigo, A. G. Measurably evolving populations. Trends in Ecology & Evolution 18, 481–488 (2003).
74. Zhao, S. et al. Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak. International journal of infectious diseases : IJID : official publication of the International Society for Infectious Diseases 92, 214–217 (2020).
75. Andersen, K. G., Rambaut, A., Lipkin, W. I., Holmes, E. C. & Garry, R. F. The proximal origin of SARS-CoV-2. Nature Medicine (2020) doi:10.1038/s41591-020-0820-9.
76. Naveca, F. G. et al. COVID-19 in Amazonas, Brazil, was driven by the persistence of endemic lineages and P.1 emergence. Nature Medicine 27, 1230–1238 (2021).
77. Boskova, V., Stadler, T. & Magnus, C. The influence of phylodynamic model specifications on parameter estimates of the Zika virus epidemic. Virus Evolution 4, (2018).
78. Liu, Q. et al. Population Genetics of SARS-CoV-2: Disentangling Effects of Sampling Bias and Infection Clusters. Genomics, Proteomics & Bioinformatics 18, 640–647 (2020).
79. Adam, D. C. et al. Clustering and superspreading potential of SARS-CoV-2 infections in Hong Kong. Nature Medicine 26, 1714–1719 (2020).
80. Vasylyeva, T. I. et al. Phylodynamics helps to evaluate the impact of an HIV prevention intervention. Viruses 12, 1–15 (2020).
81. Volz, E. M., Koelle, K. & Bedford, T. Viral phylodynamics. PLoS computational biology 9, e1002947–e1002947 (2013).
82. Buss, L. et al. Three-quarters attack rate of SARS-CoV-2 in the Brazilian Amazon during a largely unmitigated epidemic. Science 371, 288–292 (2021).

Posted March 16, 2022.

Subject Area

Epidemiology

[1] 1.↵
Gorbalenya, A. E. et al. The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. Nature Microbiology 5, 536–544 (2020).
OpenUrl

[2] 2.↵
Zhu, N. et al. A Novel Coronavirus from Patients with Pneumonia in China, 2019. New England Journal of Medicine 382, 727–733 (2020).
OpenUrl CrossRef PubMed

[3] 3.↵
World Health Organisation. Public Health Emergency of International Concern (PHEIC). (2020).

[4] 4.↵
Verity, R. et al. Estimates of the severity of coronavirus disease 2019: a model-based analysis. The Lancet. Infectious diseases 20, 669–677 (2020).
OpenUrl CrossRef PubMed

[5] 5.↵
World Health Organisation. Coronavirus disease (COVID-19) Weekly Epidemiological Update and Weekly Operational Update. https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports (2022).

[6] 6.↵
European Centre for Disease Prevention and Control. Guidelines for the implementation of non-pharmaceutical interventions against COVID-19 Key messages General considerations on NPI to control COVID-19. (2020).

[7] 7.↵
Anderson, Vegari, C., Baggaley, R., Hollingsworth, T. D. D. & Maddren, R. The Royal Society SET-C Reports. Reproduction number (R) and growth rate (r) of the COVID-19 epidemic in the UK: methods of estimation, data sources, causes of heterogeneity, and use as a guide in policy formulation [report unpublished]. The Royal Society 1–86 (2020).

[8] 8.↵
UK Health Security Agency. The R value and growth rate. https://www.gov.uk/guidance/the-r-value-and-growth-rate (2022).

[9] 9.↵
Parag, K. v, Thompson, R. N. & Donnelly, C. A. Are epidemic growth rates more informative than reproduction numbers? medRxiv 2021.04.15.21255565 (2021) doi:10.1101/2021.04.15.21255565.
OpenUrl Abstract/FREE Full Text

[10] 10.↵
Dushoff, J. & Park, S. W. Speed and strength of an epidemic intervention. Proceedings of the Royal Society B: Biological Sciences 288, 20201556 (2021).

[11] 11.↵
World Health Organisation. Genomic sequencing of SARS-CoV-2 A guide to implementation for maximum impact on public health. (2021).

[12] 12.↵
Jombart, T., et al. Bayesian Reconstruction of Disease Outbreaks by Combining Epidemiologic and Genomic Data. PLOS Computational Biology 10, e1003457-(2014).
OpenUrl

[13] 13.↵
Duchene, S. et al. Temporal signal and the phylodynamic threshold of SARS-CoV-2. Virus Evolution 6, (2020).

[14] 14.↵
Faria, N. R. et al. Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus, Brazil. Science 372, 815 LP – 821 (2021).
OpenUrl

[15] 15.↵
Nadeau, S. A., Vaughan, T. G., Scire, J., Huisman, J. S. & Stadler, T. The origin and early spread of SARS-CoV-2 in Europe. Proceedings of the National Academy of Sciences 118, e2012008118 (2021).

[16] 16.
Romano, C. M. & Melo, F. L. Genomic surveillance of SARS-CoV-2: A race against time. The Lancet Regional Health - Americas 0, 100029 (2021).

[17] 17.↵
Volz, E. et al. Evaluating the Effects of SARS-CoV-2 Spike Mutation D614G on Transmissibility and Pathogenicity. Cell 184, 64–75.e11 (2021).
OpenUrl PubMed

[18] 18.↵
Dudas, G. et al. Virus genomes reveal factors that spread and sustained the Ebola epidemic. Nature (2017) doi:10.1038/nature22040.
OpenUrl CrossRef PubMed

[19] 19.↵
Faria, N. R. et al. Establishment and cryptic transmission of Zika virus in Brazil and the Americas. Nature 546, 406–410 (2017).
OpenUrl CrossRef PubMed

[20] 20.↵
Grubaugh, N. D. et al. Genomic epidemiology reveals multiple introductions of Zika virus into the United States. Nature 546, 401–405 (2017).
OpenUrl CrossRef PubMed

[21] 21.↵
Harvey, W. T. et al. SARS-CoV-2 variants, spike mutations and immune escape. Nature Reviews Microbiology 19, 409–424 (2021).
OpenUrl CrossRef

[22] 22.↵
Rambaut, A. et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nature Microbiology 5, 1403–1407 (2020).
OpenUrl

[23] 23.↵
Shu, Y. & McCauley, J. GISAID: Global initiative on sharing all influenza data - from vision to reality. Euro surveillance : bulletin Europeen sur les maladies transmissibles = European communicable disease bulletin 22, 30494 (2017).

[24]
Tsang, T. K. et al. Effect of changing case definitions for COVID-19 on the epidemic curve and transmission parameters in mainland China: a modelling study. The Lancet Public Health 5, e289–e296 (2020).

[25]
de Souza, W. M. et al. Epidemiological and clinical characteristics of the COVID-19 epidemic in Brazil. Nature Human Behaviour 4, 856–865 (2020).

[26]
Dolan, P. T., Whitfield, Z. J. & Andino, R. Mapping the Evolutionary Potential of RNA Viruses. Cell Host and Microbe 23, 435–446 (2018).

[27]
Drummond, A. J., Rambaut, A., Shapiro, B. & Pybus, O. G. Bayesian Coalescent Inference of Past Population Dynamics from Molecular Sequences. Molecular Biology and Evolution 22, 1185–1192 (2005).

[28]
Gill, M. S. et al. Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci. Molecular Biology and Evolution 30, 713–724 (2013).

[29]
Stadler, T., Kühnert, D., Bonhoeffer, S. & Drummond, A. J. Birth–death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV). Proceedings of the National Academy of Sciences 110, 228–233 (2013).

[30]
Hall, M. D., Woolhouse, M. E. J. & Rambaut, A. The effects of sampling strategy on the quality of reconstruction of viral population dynamics using Bayesian skyline family coalescent methods: A simulation study. Virus Evolution 2, vew003 (2016).

[31]
Parag, K. V., du Plessis, L. & Pybus, O. G. Jointly Inferring the Dynamics of Population Size and Sampling Intensity from Molecular Sequences. Molecular Biology and Evolution 37, 2414–2429 (2020).

[32]
Stack, J. C., Welch, J. D., Ferrari, M. J., Shapiro, B. U. & Grenfell, B. T. Protocols for sampling viral sequences to study epidemic dynamics. Journal of the Royal Society Interface 7, 1119–1127 (2010).

[33]
de Silva, E., Ferguson, N. M. & Fraser, C. Inferring pandemic growth rates from sequence data. Journal of The Royal Society Interface 9, 1797–1808 (2012).

[34]
Karcher, M. D., Palacios, J. A., Bedford, T., Suchard, M. A. & Minin, V. N. Quantifying and Mitigating the Effect of Preferential Sampling on Phylodynamic Inference. PLoS Computational Biology 12, e1004789 (2016).

[35]
Frost, S. D. W. et al. Eight challenges in phylodynamic inference. Epidemics 10, 88–92 (2015).

[36]
du Plessis, L. et al. Establishment and lineage dynamics of the SARS-CoV-2 epidemic in the UK. Science 371, 708–712 (2021).

[37]
Hidano, A. & Gates, M. C. Assessing biases in phylodynamic inferences in the presence of super-spreaders. Veterinary Research 50, 74 (2019).

[38]
Gostic, K. M. et al. Practical considerations for measuring the effective reproductive number, Rt. PLOS Computational Biology 16, e1008409 (2020).

[39]
Pullano, G. et al. Underdetection of cases of COVID-19 in France threatens epidemic control. Nature 590, 134–139 (2021).

[40]
The World Bank. Population, total - Hong Kong SAR, China. https://data.worldbank.org/indicator/SP.POP.TOTL?locations=HK (2021).

[41]
IBGE. Population Projections. https://www.ibge.gov.br/en/statistics/social/population.html (2020).

[42]
Byrne, A. W. et al. Inferred duration of infectious period of SARS-CoV-2: Rapid scoping review and analysis of available evidence for asymptomatic and symptomatic COVID-19 cases. BMJ Open 10, 1–16 (2020).

[43]
McAloon, C. et al. Incubation period of COVID-19: a rapid systematic review and meta-analysis of observational research. BMJ Open 10, e039652 (2020).

[44]
Parag, K. V. Improved estimation of time-varying reproduction numbers at low case incidence and between epidemic waves. PLOS Computational Biology 17, e1009347 (2021).

[45]
Fraser, C. Estimating Individual and Household Reproduction Numbers in an Emerging Epidemic. PLoS ONE 2, e758 (2007).

[46]
Cori, A., Ferguson, N. M., Fraser, C. & Cauchemez, S. A New Framework and Software to Estimate Time-Varying Reproduction Numbers During Epidemics. American Journal of Epidemiology 178, 1505–1512 (2013).

[47]
Wallinga, J. & Teunis, P. Different Epidemic Curves for Severe Acute Respiratory Syndrome Reveal Similar Impacts of Control Measures. American Journal of Epidemiology 160, 509–516 (2004).

[48]
Wallinga, J. & Lipsitch, M. How generation intervals shape the relationship between growth rates and reproductive numbers. Proceedings of the Royal Society B: Biological Sciences 274, 599–604 (2007).

[49]
Rai, B., Shukla, A. & Dwivedi, L. K. Estimates of serial interval for COVID-19: A systematic review and meta-analysis. Clinical Epidemiology and Global Health 9, 157–161 (2021).

[50]
Prete, C. A. et al. Serial interval distribution of SARS-CoV-2 infection in Brazil. Journal of Travel Medicine 28, 1–3 (2021).

[51]
Katoh, K., Misawa, K., Kuma, K. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30, 3059–3066 (2002).

[52]
Hadfield, J. et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 34, 4121–4123 (2018).

[53]
Minh, B. Q. et al. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Molecular Biology and Evolution 37, 1530–1534 (2020).

[54]
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nature Methods 14, 587–589 (2017).

[55]
Tamura, K. & Nei, M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution 10, 512–526 (1993).

[56]
Anisimova, M., Gil, M., Dufayard, J.-F., Dessimoz, C. & Gascuel, O. Survey of branch support methods demonstrates accuracy, power, and robustness of fast likelihood-based approximation schemes. Systematic Biology 60, 685–699 (2011).

[57]
Rambaut, A., Lam, T. T., Max Carvalho, L. & Pybus, O. G. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus Evolution 2, vew007 (2016).

[58]
World Health Organisation. Guidance for surveillance of SARS-CoV-2 variants Interim guidance. (2021).

[59]
Suchard, M. A. et al. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evolution 4, vey016 (2018).

[60]
Ayres, D. L. et al. BEAGLE 3: Improved Performance, Scaling, and Usability for a High-Performance Computing Library for Statistical Phylogenetics. Systematic Biology 68, 1052–1061 (2019).

[61]
Hasegawa, M., Kishino, H. & Yano, T. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22, 160–174 (1985).

[62]
Hill, V. & Baele, G. Bayesian Estimation of Past Population Dynamics in BEAST 1.10 Using the Skygrid Coalescent Model. Molecular Biology and Evolution 36, 2620–2628 (2019).

[63]
Rambaut, A., Drummond, A. J., Xie, D., Baele, G. & Suchard, M. A. Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7. Systematic Biology 67, 901–904 (2018).

[64]
Bouckaert, R. et al. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLOS Computational Biology 15, e1006650 (2019).

[65]
Volz, E. M. & Didelot, X. Modeling the Growth and Decline of Pathogen Effective Population Size Provides Insight into Epidemic Dynamics and Drivers of Antimicrobial Resistance. Systematic Biology 67, 719–728 (2018).

[66]
Lin, J. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory 37, 145–151 (1991).

[67]
Kullback, S. & Leibler, R. A. On Information and Sufficiency. The Annals of Mathematical Statistics 22, 79–86 (1951).

[68]
Cowling, B. J. et al. Impact assessment of non-pharmaceutical interventions against coronavirus disease 2019 and influenza in Hong Kong: an observational study. The Lancet Public Health 5, e279–e288 (2020).

[69]
Wu, P. et al. Suppressing COVID-19 Transmission in Hong Kong: An Observational Study of the First Four Months. SSRN (2020) doi:10.21203/rs.3.rs-34047/v1.

[70]
Nascimento, V. A. do et al. Genomic and phylogenetic characterisation of an imported case of SARS-CoV-2 in Amazonas State, Brazil. Memórias do Instituto Oswaldo Cruz 115, (2020).

[71]
Faria, N. R. et al. Genomic characterisation of an emergent SARS-CoV-2 lineage in Manaus: preliminary findings. Virological https://virological.org/t/genomic-characterisation-of-an-emergent-sars-cov-2-lineage-in-manaus-preliminary-findings/586 (2021).

[72]
Sabino, E. C. et al. Resurgence of COVID-19 in Manaus, Brazil, despite high seroprevalence. The Lancet 397, 452–455 (2021).

[73]
Drummond, A. J., Pybus, O. G., Rambaut, A., Forsberg, R. & Rodrigo, A. G. Measurably evolving populations. Trends in Ecology & Evolution 18, 481–488 (2003).

[74]
Zhao, S. et al. Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak. International Journal of Infectious Diseases 92, 214–217 (2020).

[75]
Andersen, K. G., Rambaut, A., Lipkin, W. I., Holmes, E. C. & Garry, R. F. The proximal origin of SARS-CoV-2. Nature Medicine (2020) doi:10.1038/s41591-020-0820-9.

[76]
Naveca, F. G. et al. COVID-19 in Amazonas, Brazil, was driven by the persistence of endemic lineages and P.1 emergence. Nature Medicine 27, 1230–1238 (2021).

[77]
Boskova, V., Stadler, T. & Magnus, C. The influence of phylodynamic model specifications on parameter estimates of the Zika virus epidemic. Virus Evolution 4, (2018).

[78]
Liu, Q. et al. Population Genetics of SARS-CoV-2: Disentangling Effects of Sampling Bias and Infection Clusters. Genomics, Proteomics & Bioinformatics 18, 640–647 (2020).

[79]
Adam, D. C. et al. Clustering and superspreading potential of SARS-CoV-2 infections in Hong Kong. Nature Medicine 26, 1714–1719 (2020).

[80]
Vasylyeva, T. I. et al. Phylodynamics helps to evaluate the impact of an HIV prevention intervention. Viruses 12, 1–15 (2020).

[81]
Volz, E. M., Koelle, K. & Bedford, T. Viral phylodynamics. PLoS Computational Biology 9, e1002947 (2013).

[82]
Buss, L. et al. Three-quarters attack rate of SARS-CoV-2 in the Brazilian Amazon during a largely unmitigated epidemic. Science 371, 288–292 (2021).
