Abstract
In December 2020, the United Kingdom (UK) reported a SARS-CoV-2 Variant of Concern (VoC) which is now coined B.1.1.7. Based on the UK data and later additional data from other countries, a transmission advantage of around 40-80% was estimated for this variant [1, 2, 3].
In Switzerland, since spring 2020, we perform whole genome sequencing of SARS-CoV-2 samples obtained from a large diagnostic lab (Viollier AG) on a weekly basis for genomic surveillance. The lab processes SARS-CoV-2 samples from across Switzerland. Based on a total of 7631 sequences obtained from samples collected between 14.12.2020 and 11.02.2021 at Viollier AG, we determine the relative proportion of the B.1.1.7 variant on a daily basis. In addition, we use data from a second lab (Dr Risch) screening all their samples for the B.1.1.7 variant. These two datasets represent 11.5 % of all SARS-CoV-2 confirmed cases across Switzerland during the considered time period. They allow us to quantify the transmission advantage of the B.1.1.7 variant on a national and a regional scale.
Taking all our data and estimates together, we propose a transmission advantage of 49-65% of B.1.1.7 compared to the other circulating variants. Further, we estimate the effective reproductive number through time for B.1.1.7 and the other variants, again pointing to a higher transmission rate of B.1.1.7. In particular, for the time period 01.01.2021-17.01.2021, we estimate an average reproductive number for B.1.1.7 of 1.28 [1.07-1.49] while the estimate for the other variants is 0.83 [0.63-1.03], based on the total number of confirmed cases and our Viollier sequencing data. Switzerland tightened measures on 18.01.2021. A comparison of the empirical confirmed case numbers up to 20.02.2021 to a very simple model using the estimates of the reproductive number from the first half of January provides indication that the rate of spread of all variants slowed down recently.
In summary, the dynamics of increase in frequency of B.1.1.7 is as expected based on the observations in the UK. Our plots are available online and constantly updated with new data to closely monitor the changes in absolute numbers.
1 Introduction
Reports of an increased transmissibility of SARS-CoV-2 variant B.1.1.7 (501Y.V1) were released in mid-December 2020 [4, 5, 6]. This variant carries the N501Y mutation in the spike protein which may increase the ACE2 receptor affinity [7]. Within only a few months, the variant was able to become the dominant lineage in the UK.
Since these first reports, great efforts were made in Switzerland to detect and trace B.1.1.7 [8]. The first cases of B.1.1.7 were confirmed on 24.12.2020 and retrospective analyses identified B.1.1.7 in samples dating back to October [8]. In total, 1370 infections with B.1.1.7 were confirmed up to 05.02.2021 [8].
An increase in prevalence does not necessarily imply a transmission fitness advantage of the virus. For example, a variant termed 20A.EU1 spread rapidly during summer 2020 across Europe. However, data suggests that extended travel and superspreading events rather than a viral transmission advantage caused that spread [9].
However, the variant B.1.1.7 rapidly increased in frequency in many high-prevalence regions across the UK. Such re-occuring patterns are hard to explain without a transmission advantage. Analyses of the B.1.1.7 variant in the UK suggest that the variant has a transmission fitness advantage anywhere between 40 and 80% [1, 2, 3]. Davies et al [3] further uses data from Denmark obtaining again similar estimates. Quantitative analysis of data on the spread of the 501Y mutation in Switzerland also suggests a transmission advantage in roughly that range [10].
In this paper, we collected whole genome sequencing data based on samples from Viollier AG to determine the proportion of B.1.1.7 through time. Further we used data from the diagnostic laboratory Dr Risch which screens all their samples for the B.1.1.7 variant. We then used this data to quantify the transmission fitness advantage of B.1.1.7 for Switzerland as well as for the seven Swiss economic regions (Grossregionen). We further calculated the reproductive number for B.1.1.7 and show how the B.1.1.7 case numbers developed through time.
The core plots presented here will be updated regularly on [11] and part of it is currently incorporated into the Swiss National COVID-19 Science Task Force website [12]. The code and data used for this paper is publically available on Github [13]. All sequences based on samples fro Viollier are available on GISAID.
2 Method
2.1 Data
The main source of data are whole genome sequences based on samples from Viollier AG. Viollier AG is a large diagnostic lab processing SARS-CoV-2 samples from across Switzerland. Every week, SARS-CoV-2 samples were randomly selected for sequencing among all positively tested samples at the Viollier AG. For each sample, we know the date of the test and the canton in which the test was performed.
We perform whole genome sequencing in three facilities. The sequencing protocol for the samples sequenced at the Genomics Facility Basel and the Functional Genomics Center Zurich is described in [14]. Samples processed at the Health 2030 Genome Center used the Illumina COVIDSeq library preparation reagents following the protocol provided by the supplier [15]. These reagents are based on the ARTIC v3 multiplex PCR amplicon protocol [16]. When sufficient volume was available, 8.5ul of RNA extracted from patient nasopharyngeal samples were used in the cDNA synthesis step; if 8.5ul were not available, the maximum volume possible was used. Pooled libraries were sequenced on the Illumina NovaSeq 6000 using a 50-nucleotide pair-end run configuration. Post-sequencing library read de-multiplexing was done using an in-house developed processing pipeline [17].
The downstream bioinformatics procedure to obtain consensus sequences is described in [14, 18]. In the time frame of interest – from 14.12.2020 to 11.02.2021, a total of 7631 sequences were generated and checked for the B.1.1.7 variant. Thus, per week we produced around 900 sequences. In the same time period, 154’241 people were confirmed with a SARS-CoV-2 infection, thus we provide sequences for 4.9% of all cases. We define a sequence to be a B.1.1.7 sample if at least 80% of the lineage-defining, non-synonymous nucleotide changes according to [4] are present. The dataset used downstream is the number of sequences per day and the number of identified B.1.1.7 variants per day.
Additionally, we have data from the Dr Risch medical laboratories. Dr Risch medical laboratories used the Taqpath assay from Thermofisher for their diagnostic and recorded the S gene target failure (SGTF). SGTF samples are potential B.1.1.7 variants, as the B.1.1.7 variant causes a SGTF due to its deletion at position 69-70 in the spike protein. Further, Dr Risch medical laboratories screen their samples for the 501Y mutation by a variant-specific PCR test. If a sample is identified as a potential VoC by these procedures, it is sent for whole genome sequencing to the University Hospital Basel in order to confirm the B.1.1.7 variant. The sequencing protocol is described in [19]. For the recent samples, the confirmation may still be outstanding. However, since typically a SGTF plus a 501Y mutation corresponds indeed to a B.1.1.7 variant, we consider these samples as B.1.1.7 variants already prior to the results of the whole genome sequencing confirmation.
When the initial screening for B.1.1.7 in Switzerland was started just before Christmas 2020, the first approach to identify the variants was by specific amplification of the indicative Spike region followed by in house Sanger sequencing at the national reference center for emerging viral infections (CRIVE) in Geneva.
Overall, in our considered time period, 10175 samples were screened from Dr Risch. In the final dataset, we record the daily number of positive tests at Dr Risch and the number of B.1.1.7 variants identified.
Taking both datasets together, 11.5% of all SARS-CoV-2 cases during our considered time period (14.12.2020 to 11.02.2021) were screened for the B.1.1.7 variant. While Viollier processes samples from all over Switzerland, the intensity varies across regions. The set of sequenced samples inherits this uneven geographical distribution with the result that, relatively to the number of all cases, over ten times more samples were sequenced from the region Nordwestschweiz than from the region Ticino (Table 1). Dr Risch’s data has a different geographic distribution than Viollier and, for example, a much better coverage of Eastern Switzerland. In summary, both datasets differ in their geographic biases. We present results on the different datasets and compare these results.
2.2 Estimating the transmission fitness advantage of a new variant
In what follows, we define two models describing the dynamics with which a new variant with a transmission fitness advantage spreads in a population. The first model is based on the assumption of discrete time, while the second model is based on the assumption of continuous time. Both models have been considered extensively in the literature to estimate fitness advantage (see e.g. [20, 3]). While in an epidemic, a generation does not end after a fixed time span (discrete time model), generations are typically less variable than modelled under an exponential distribution (continuous time model). Thus we view the two models as two extremes with the actual dynamics being described by a process in between. We provide estimates based on both models and suggest that the true parameter may be anywhere within the ranges spanned by the two models. In the next sections, we provide details of how we estimate the transmission fitness advantage of B.1.1.7 based on daily data of the total number of samples and B.1.1.7 samples under these two models.
2.2.1 Discrete time model
We call X the common (non-B.1.1.7) variants and Y the B.1.1.7 variant. The process starts in generation 0 with x0 cases caused by variant X and y0 cases caused by variant Y. Let the number of cases in generation n be x(n) and y(n) for variants X and Y.
Let the reproductive number Rd of variant X in generation n be Rd(n). Let the transmission advantage of variant Y be fd. Then the reproductive number of Y in generation n is (1 + fd)Rd(n). Thus, we assume a multiplicative fitness advantage.
We have x(n) = x(0)×Rd(0)Rd(1)… Rd(n−1) and y(n) = y(0)×Rd(0)Rd(1)… Rd(n−1)(1+fd)n. If Rd is constant through time, we have
Let the proportion of variant Y at generation n be p(n). We have,
Thus, p(n) is the logistic function. It does not depend on Rd.
If we write time in days t rather than generations n and assume a generation time of g days, we get
We now switch our parameterization to the more common for parameter estimation from daily data. Parameter a is the logistic growth rate and parameter t0 the sigmoid’s midpoint.
The two free parameters, a and t0, are related to the two free parameters in Equation 1, fd and p(0), as follows:
In particular, we get fd = eag − 1.
2.2.2 Continuous time model
In continuous time, instead of Rd and generation time g, we define the transmission rate β and the recovery rate µ. Under this model, the reproductive number is Rc = β/µ. Further, since an individual in the discrete model recovers after a generation of duration g (during which they left Rd offspring), we note that g is related to the expected time to recovery 1/µ in the continuous model, and in fact assume g = 1/µ in what follows. Again our initial numbers of the variants X and Y are x(0) and y(0). Calendar time is denoted by a continuous parameter t. We then have in expectation,
We note that β − µ is coined the Malthusian growth parameter [3].
Further we again assume that variant Y has a transmission fitness advantage of fc, with transmission rate β(1 + fc) and recovery rate µ. The population size of the variant at time t is thus
The proportion of the variant in the population at time t is where we again recognize the logistic function. We turn again to the more common parameterization, where we thus have .
The reproductive number is Rc = β/µ and the mean time to recovery, 1/µ, is equaled to g. Then, β = Rc/g. Thus,
In particular, we have . Note that the estimated fitness advantage under this model depends on the reproductive number Rc and is thus changing if Rc is changing through time.
2.2.3 Connection between discrete and continuous time
The discrete and continuous models are very similar. Both have the intitial conditions x(0) and y(0). For the dynamics, the discrete model has parameters Rd and g while the continuous model has parameters β and µ. We have Rc = β/µ and we further assumed that g = 1/µ (1/µ is the expected time until recovery in the continuous setting while g is the time to recovery in the discrete setting). The different parameterizations of fitness advantage are coined fc and fd. We now determine how fc and fd are related.
To compare the two models, we now assume that their overall dynamics for the variant X are the same. After n generations of duration g, we have for variant X,
Using a Taylor expansion for β/µ close to 1, we obtain that indeed Rd = Rc.
For the two models to produce the same growth also for variant Y, we require,
In the last step, we make use of Rc = β/µ and g = 1/µ.
This is equivalent to . Using a Taylor expansion we get fd = 1+Rcfc+O((Rcfc)2)−1 and thus fd = Rcfc for small Rcfc.
2.2.4 Maximum likelihood parameter estimation
Next we explain how we estimate a and t0 of the logistic functions (Eqn. 2 and 5) from our data using maximum likelihood. We consider that we have data at times t1,…, td. At time ti, we obtained ni samples, where ni is fixed, non-random.
We assume that the true number of B.1.1.7 variants at time ti is a random variable, Ki which is binomially distributed with parameter p(ti), i.e.
In particular we assume here a deterministic logistic growth model for the increase in the proportion of variant Y (Eqn. 2 and 5), on top of which only the drawing process is random. This model simplifies naturally to a very popular logistic regression. This is an instance of a Generalized Linear Model, where the natural parameter of the binomial distribution is a linear function of predictors, the only predictor considered here being the time t.
We use the Python library statsmodel [21] to recover maximum likelihood estimates (MLEs) and confidence intervals based on an asymptotic Gaussian distribution for the parameters of the logistic regression based on our data, i.e. the fixed values t1,…, td, n1,…, nd as well as the numbers of samples at each time point being the variant B.1.1.7, k1,…, kd. Parameters a, t0, fd, fc as well as the proportions of variant B.1.1.7 p(t) through time are simple transformations of the parameters of the logistic regression. Their MLEs are the same transformations applied to the MLEs of the logistic regression parameters. The difference between the MLEs and the true parameters are again Gaussian, with a covariance matrix found by applying the delta method. This is used to construct confidence intervals for all these quantities.
2.3 Estimation of the effective reproductive number
We use the number of confirmed cases per day from the Federal Office of Public Health, Switzerland, for 14.12.2020 to 11.02.2021. Then, for each day, we estimate the number of B.1.1.7 variants by multiplying the total number of confirmed cases by the proportion of B.1.1.7 in our dataset (Viollier or Risch). We then estimate an effective reproductive number of the B.1.1.7 variant and of the non-B.1.1.7 variants using these data. For this estimation, we use the method developed in [22]. This method consists of two main parts: first, the observed case data is related to the corresponding time series of infections. We smooth the observations using LOESS smoothing to remove weekend effects. Then, we deconvolve with the delay of infection to symptom onset (gamma distributed with mean 5.3 and sd 3.2) and the delay from symptom onset to case confirmation (gamma distributed with mean 5.5 and sd 3.8). Second, we estimate the effective reproductive number from the time series of infection incidence using EpiEstim [23]. The reported point estimate is the estimate on the original case data. To account for uncertainty in the observation process, the observed daily case incidences are additionally bootstrapped 1000 times, resulting in an ensemble of alternative case incidence time series and corresponding estimated effective reproductive numbers. These are used to construct the 95% confidence interval around the effective reproductive number, and to calculate the standard deviation of the ratios of effective reproductive number estimates (see below).
We perform the estimation of the reproductive number in two different ways. First, we estimate smooth changes in the reproductive number, by estimating it across the entire time series using a 3-day sliding window. Second, we assume the reproductive number was constant during time intervals in which the non-pharmaceutical interventions did not change. Since 18.01.2021, Switzerland has implemented a set of tighter measures (in particular, shops are closed and the size of gatherings is restricted to five people [24]). Thus we fix the reproductive number to be constant between 01.01.2021 and 17.01.2021. Then the reproductive number is allowed to change and again fixed to be constant from 18.01.2021 onwards.
To compare the effective reproductive number R of the B.1.1.7 variant (Y) to that of non-B.1.1.7 variants (X), we take the ratio at every time point. The standard deviation of this ratio σρ was found through Gaussian error propagation of the standard deviation of the individual R estimates (σX, σY):
2.4 Projections under a simple epidemiological model
We finally display the expected number of confirmed cases in the future under the continuous model; thus case numbers of variant X and Y are changing according to Equations 3 and 4. We initialize the model (x(0) and y(0)) on 01.01.2021 with the estimated number of B.1.1.7 and non-B.1.1.7 cases on that day. We assume a reproductive number R = β/µ for the non-B.1.1.7 as estimated on the national level for 01.01.2021-17.01.2021. Finally, we assume that the expected generation time 1/µ is 4.8 days and the fitness advantage is the estimated fc for the region and dataset of interest (Table 3).
3 Results
We estimate the logistic growth rate a and the sigmoid’s midpoint t0 based on the Viollier and Risch data (Table 3). Taking the estimates of both datasets together, we obtain a growth rate a of around 0.09-0.10 per day for Switzerland. For each economic region, the estimated uncertainty interval of a overlaps with the Swiss-wide uncertainty interval. We have little data for two out of seven regions (Ticino and Central Switzerland; <550 sequences in total) resulting in very wide uncertainty intervals. From the t0 estimates, we observe that the Geneva region is about 1-2 weeks ahead of the rest of Switzerland with respect to the spread of B.1.1.7, which confirms estimates from [10]. We estimate that B.1.1.7 will become dominant in Switzerland in March 2021.
In Fig. 1 and 2, we graphically illustrate the logistic growth in frequency of B.1.1.7 and show the daily data together with an estimate of the proportion of B.1.1.7 under the logistic growth model (parameter p(ti) on each day ti).
Next, we estimate the reproductive number for B.1.1.7 and non-B.1.1.7 on a national scale (Fig. 3). During 01.01.2021-17.01.2021, the reproductive number for B.1.1.7 was significantly above 1 (Viollier: 1.28 [1.07-1.49], Risch: 1.46 [1.23,1.68]) while the reproductive number of non-B.1.1.7 tended to be below 1 (Viollier: 0.83 [0.63-1.03], Risch: 0.82 [0.61,1.03]).
Based on the Viollier data, the reproductive number did not change much after 18.01.2021, however the confidence intervals are wide. As expected, the ratio of the reproductive numbers for B.1.1.7 and non-B.1.1.7 stays roughly constant throughout January. Our data does not allow us to estimate the reproductive number for February yet.
For the Risch data, the reproductive number of non-B.1.1.7 remained roughly constant throughout January. Based on our discrete and continuous model described above, we would then expect also a roughly constant reproduction number for B.1.1.7. However, the reproduction number for B.1.1.7 dropped throughout January, which means that the ratio of the reproductive numbers for B.1.1.7 and non-B.1.1.7 dropped. This is not expected under our models. We will monitor this pattern over the next weeks to see if there is bias in the recent Risch data or if this phenomena in the data is not covered by our model.
We use the average reproductive number during 01.01.2021-17.01.2021 to calculate the transmission fitness advantage fc of B.1.1.7 under our continuous time model. Further, we calculate the transmission fitness advantage fd under a discrete time model. In both cases, we assume a generation time g of 4.8 days (which is the mean generation time assumed in the estimation of the reproductive number). The fitness values are shown in Table 3. On a national level, we estimate a fitness advantage of 49-65% across methods and datasets. The regional estimates have an overlap with this interval. We note that we use the national reproductive number for the regional fc estimate. Since Ticino had a lower and the Lake Geneva region a higher reproductive number averaged over all variants [25], the fc for Ticino may be an underestimate and the fc for Geneva an overestimate.
Next, we show the expected dynamics of the epidemic under the continuous model using parameter values based on epidemic conditions in the first half of January (Fig. 4 and 5). We show the development of confirmed case numbers based on this model in the blue (B.1.1.7) and green (non-B.1.1.7) areas. In particular, in January, the model projects a decline in overall case numbers due to a decline in non-B.1.1.7 variants. However, the B.1.1.7 variant is increasing. Under this model, once B.1.1.7 becomes dominant, the total number of cases will be increasing again. It is important to note that this simple model assumes that the transmission dynamics do not change and in particular does not include the effects of vaccination, immunity after infection, and population heterogeneity.
We investigate if our empirical data follows this trend. The dark blue line shows the estimated number of B.1.1.7 cases. Again, this number is the product of the total number of confirmed cases for a day by the proportion of the B.1.1.7 variant for that day. In dark green, we plot the total number of confirmed cases. If the empirical data develops as the model, the dark blue line follows the upper end of the blue area and the dark green line follows the upper end of the green area.
Throughout January, the model and the empirical data follow the same trends across datasets and regions with the exception of Ticino and the Lage Geneva region. For the national level and the five regions with a good match in January, we observe that the empirical total case numbers drop faster compared to the model starting early February. Data over the next week about the B.1.1.7 variant will reveal if this pattern will be confirmed for the variant and if the reproductive number for both the B.1.1.7 variant and the non-B.1.1.7 variants is declining in February.
The discrepancy for Ticino and Lake Geneva is not surprising: they had a reproductive number which was different from the national reproductive number in the first half of January. For Ticino, the empirical case numbers drop faster than the model, which is in line with a lower reported reproductive number compared to the national level. For the Lake Geneva region, the empirical case numbers drop slower than the model, which is in line with a higher reported reproductive number compared to the national level[25]. For all regions but Ticino, we have enough data to estimate a reproductive number for the non-B.1.1.7 variants for 01.01.2021-17.01.2021. While for Switzerland, we obtained a point estimate of 0.83, the point estimates for all regions but Lake Geneva are between 0.81-0.83. Thus using the point estimate for all of Switzerland for the regional plots in Fig. 5 is justified. For Geneva, we obtain a point estimate of 0.88. We use this point estimate estimate in a Lake Geneva specific model (Fig. 6). Again we observe that the total number of confirmed cases dropped recently faster compared to the model.
4 Discussion
We have quantified the transmission fitness advantage of B.1.1.7 for two Swiss datasets. One dataset also allows us to obtain estimates for the seven Swiss economic regions. Swiss-wide estimates point towards a transmission fitness advantage of 49-65%. Our estimates predict that in March, B.1.1.7 will be dominant in Switzerland. This is in line with predictions by [10] based on Swiss data tracking 501Y mutations. Cases are confirmed 8-11 days after the time of infection in Switzerland, thus actually the majority of new infections may already be caused by B.1.1.7 as of today – the 24.02.2021.
The increase in frequency of B.1.1.7 in Switzerland occurred as expected given the large transmission advantage. In fact, unless measures such as contact tracing target in particular B.1.1.7, this increase in relative frequency will occur given a transmission advantage. However, variables such as implemented measures, the adherence to measures, or levels of immunity in the population determine how fast the variants increase in absolute numbers.
We show that in the first half of January, the absolute numbers of B.1.1.7 increased (R-value 1.28 [1.07-1.49] for 01.01.2021-17.01.2021) based on our sequence dataset) while the absolute numbers of all other cases decreased (R-value 0.83 [0.63-1.03]). For the rest of January, the reproductive number for B.1.1.7 and non-B.1.1.7 did not change much according to the Viollier data. The Risch data suggests a drop in only the B.1.1.7 reproductive number. However, based on our models on the transmission advantage, we do not expect a change in the ratio of the R-values of the two variants. Future data will reveal if this drop reflects a bias in the data or if this phenomena in the data is not covered by our model.
We observe that the number of confirmed cases in February is lower compared to our model scenario using parameter estimates from the first half of January. This may reflect an overall slowdown of transmission. Data on B.1.1.7 over the next week will reveal if the same slow-down is also observed for the B.1.1.7 variant.
Our data reveals that different regions from Switzerland were sampled with different intensity. Thus, we also performed analyses for the seven economic regions in Switzerland. Here, we expect that the homogeneous sampling intensity is met better compared to the national level. We may observe cases though which are imports from other economic regions compared to local transmission (the same problem of course exists on a national level, though the smaller the scale, the more imports we expect). However, the economic regions represent well-defined regions where we expect a lot of mixing within and less mixing across regions. The estimates of the transmission fitness advantage for the economic regions are largely in line with the national estimates, though of course with larger uncertainty. Two of the regions (Ticino, Central Switzerland) have too little data to make precise statements. Geneva is estimated to be around 1-2 weeks ahead in the B.1.1.7 dynamics compared to Switzerland as a whole. One explanation may be a large number of UK travelers in ski resorts in the Valais in December 2020.
We characterized 4.9% of all confirmed cases over our study period through whole genome sequencing. Our sequencing efforts will continue on a weekly basis with all data being submitted to GISAID. These efforts, where we aim to sequence a random subset of cases throughout Switzerland, are envisioned to facilitate the global tracking of SARS-CoV-2, to allow monitoring the spread of known VoCs, and to enable rapid identification of new VoCs. Such whole genome sequencing efforts are complemented by Swiss-wide VoC-specific tracing efforts [8]. In particular, in the Dr Risch medical laboratories, samples are checked for S dropouts and screened for the 501Y mutation. If a VoC is suspected based on these analyses, whole genome sequencing is performed. In that way, we enrich our whole genome dataset, yielding to an overall characterization of 11.5% of all confirmed cases.
Overall we see consistent signal for a large transmission advantage of B.1.1.7 in Switzerland. This confirms estimates from studies from the UK [1, 2, 3], Denmark [3], and Switzerland ([10]; looking at 501Y mutations). In February, the epidemic appears to have slowed down overall. This may be due to effects of changes in measures on 18.01.2021 and improvement in compliance and adherence. Our core plots are available on [11] and [12] for real-time monitoring. Further, we will add new Variants of Concern to our webpage once they spread in Switzerland.
Data Availability
All used data and code are publicly available.
https://github.com/cevo-public/Quantification-of-the-spread-of-a-SARS-CoV-2-variant
Contribution
CC: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing - original draft; SN: Data curation, Writing - review & editing; IT: Data Curation, Resources, Software, Validation; MM: Formal Analysis, Writing - review & editing; JSH: Formal Analysis, Writing - review & editing; KPJ: Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing - review & editing; LF: Software; DD: Data curation, Software, Validation; KJ: Formal analysis, Validation; CA: Methodology, Writing - review & editing; NB: Conceptualization, Funding acquisition, Project administration, Supervision, Writing - review & editing; MA: Conceptualization, Investigation, Writing - review & editing; TS: Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Writing - original draft; Everyone else: Resources, Writing - review & editing.
Acknowledgement
We thank Richard Neher for useful discussions on the maximum likelihood inference. TS acknowledges funding from the Swiss National Science foundation (Special Call on Coronaviruses; 31CA30 196267 and 31CA30 196348). CA received funding from the European Union Horizon 2020 research and innovation programme - project EpiPose (No 101003688).