Abstract
(1) Background The estimation of daily reproduction rates throughout the infectivity period is rarely considered and only their sum Ro is calculated to quantify the level of virulence of an infectious agent;
(2) Methods We give the equation of the discrete dynamics of epidemic growth and we obtain an estimation of the daily reproduction rates, by using a technique of deconvolution of the series of observed new cases of Covid-19;
(3) Results We give both simulation results as well as estimations for several countries for the Covid-19 outbreak;
(4) Conclusions We discuss the role of the noise on the precision of the estimation and we open on perspectives of forecasting methods to predict the distribution of daily reproduction rates along the infectivity period.
1 Introduction
After the Severe Acute Respiratory Syndrome outbreak by coronavirus SARS CoV in 2002 and the Middle East Respiratory Syndrome outbreak by coronavirus MERS CoV in 2012, the Covid-19 disease by coronavirus SARS CoV-2 is the third coronavirus outbreak in the past two decades. Human coronaviruses including 229E, OC43, NL63, and HKU1 are a group of viruses that cause a significant percentage of all common colds in humans. The SARS CoV-2 can be transmitted from person to person by respiratory droplets and through contact and fomites. Therefore, the severity of disease symptoms such as cough, sputum, and their viral load are most often important factors in the virus’s ability to spread, and these factors can change rapidly in just a few days during the period of infectivity of an individual. This ability to spread is quantified by the classic basic reproduction number Ro (also called basic reproduction ratio or rate), an epidemiology parameter which describes the transmissibility of an infectious agent and is equal to the mean number of susceptible individuals, an infected individual can contaminate during his infectivity period. Ro is not a biological constant for a pathogen: in fact, Ro is affected by numerous exogenous factors like geoclimate, demographic and socio-behavioural factors that govern pathogen transmission [1,2,3,4,5,6].
Because of these environmental factors, Ro might change seasonally, but these factors variations are not significant if a very short time in considered. Ro depends also on endogenous factors like the viral load of the infected individuals during their infectivity period; the variations of this viral load [8,9,10,11,12,13,14] is often neglected in the theoretical and applied studies on the Covid-19 outbreak, in which the authors estimate a unique reproduction number Ro linked to the Malthusian growth parameter of the exponential phase of the epidemic during which Ro is greater than 1 (Figure 1) and they rarely consider the distribution of partial daily reproduction numbers at day j of the infectivity period, denoted [1]. When this distribution is considered, it is more convenient i) to define for each age class the distribution of the marginal daily reproduction numbers, ii) to estimate in each case its entropy and simulate the dynamics either using a Markovian model like that defined in Delbrück’s approach [2] or an ODE SIR model. In the Markovian case, Ro has to be replaced by the evolutionary entropy defined by L. Demetrius as the Kolmogorov-Sinaï entropy of the corresponding random process, which has properties concerning the stability of its Markovian invariant measure, analogue to the properties of the Malthusian parameter for the stability of the ODE’s steady state [3,4]. We compare these two modelling approaches in Section 2, then present in Section 3 as illustration results of estimation of daily reproduction rates in different countries at different periods. In Section 4, we discuss our method in the framework of the classical population dynamics (Leslie model) and eventually, in Section 5, we conclude and open the approach to the multi-age model.
2 Materials and Methods
2.1 Relationship between Markovian and ODE SIR approach
2.1.1 Obtaining SIR equation from a discrete mechanism
Let consider the classical reproduction rate Ro. If the model is deterministic, if Xj denotes the number of new cases at day j, and the contagious period is made of r consecutive days, with Rk the marginal reproduction rate at day k of the contagious period, we have:
It is easy to show that, if Xo = 1 and r = 5 (inside the estimated interval of the duration of the maximal contagion period for the COVID-19 [5,6,7,8,9,10,11,12,13,14,15,16,17], we obtain:
If R2 and R3 are dominant and equal to R/2, then X5 behaves as 2R2R3 = R2/2, which shows the difficulty to estimate Ro, which is the mean value of the on the contagious period. The length of this period can be estimated from the ARIMA series of the stationary random variables , equal to the without their trend: one can take the interval on which the auto-correlation function remains more than a certain threshold, e.g., 0.1 [15]. More generally if R1 = a, R2 = b and R3 = c, we get:
If R1 and R2 equal respectively a and b, and if a = b = R/2, c = 0, then X5 behaves like:
If R = 1, {Xj}j=1,∞ is the Fibonacci sequence, and more generally, for R > 1, the generalised Fibonacci sequence. Let suppose now that b = 0 and a depends on the day j such as aj = νC(j), where C(j) represents the number of possible susceptible individuals, which can be recruitable by one infectious individual at day j. If cumulated infected individuals (supposed to be all infectious) at day j are denoted by Ij, we have:
Suppose that the first infected is recruited at the centre of his sphere of influence and that the secondary infected individuals remain in this sphere, by widening the radius on day j, therefore the susceptible individuals C(j), where each infectious on day j − 1 can recruit are located on a decreasing part of the sphere of influence of the first infected 0 (Figure 2).
The function C(j) decreases due to the bulk on the successive spheres and we can consider the following functional form C(j) = S(j)/(c + S(j)), where S(j) is the number of susceptible individuals at day j. Then, we can write the following equation taking into account the mortality:
The corresponding continuous equation is close to the SIR equation, if c is greater before S:
2.1.2 Second obtention of the SIR equation from a discrete mechanism
Another way to derive the SIR equation from a probabilistic approach is to start from the microscopic equation of molecular shocks by Delbrück [2] which corresponds to a classical birth-and-death process: if at least one event (contact ν, birth f, death μ or recovering ρ) occurs in (t, t + dt), we have, if births compensate deaths, leaving constant the total size N of the population:
Hence, we have, if Pk(t) denotes P (S(t) = k, I(t) = N − k): and we obtain:
Then, by multiplying by sk and summing over k, we obtain the characteristic function of the random variable S, which is proven to be a Poisson random variable if the coefficients ν, f, μ and ρ are sufficiently small. If births do not compensate deaths, we have:
If S and I are independent and if the coefficients ν, f, μ and ρ are sufficiently small, they are Poisson random variables, whose expectations E(S) and E(I) verify: which leads to the SIR equations for the variables S, I and R considered as being deterministic:
2.2 Distribution of the rate of transmission along the infectiousness period of an individual
Figure 3 gives the average transmission rate Ro calculated on the 25th of October 2020 just before the second lockdown (re-confinement) in France [16], but because at this day the second wave of the epidemic is still in its exponential phase, it is more convenient i) to consider the distribution of the marginal daily reproduction numbers, and ii) to calculate its entropy and simulate the epidemic dynamics using a Markovian model [2]. If Ro denotes the average transmission rate (or mean reproduction number) among the studied population, we can estimate the distribution V (whose coefficients are Vj = Rj/Ro) of the daily reproduction numbers Rj along the infectiousness period, by remarking that the number Xj of new infectious cases at day j, equal to Xj = I(j)–I(j − 1), where I(j) is the number of infectious at day j, verifies the discrete convolution equation: where r is the duration of the contagion period, estimated by 1/(k +μ), where k is the recovering rate and μ the death rate in SIR equations: where S and I are respectively the size of the susceptible and infectious populations.
If r and S can be considered as constant during the first exponential phase of the pandemic, we can also assume that the distribution V is constant and then, V can be estimated by solving the linear system: which can be written as X = MR, hence giving: and the equation (13) can be solved numerically, if the pandemic is observed during a time greater than 1/(k + μ). Then, the entropy of V = R/Ro can be calculated, as the Kolmogorov-Sinaï entropy of the Markovian Delbrück scheme [2] ruling the and giving new parameters for characterising pandemic dynamics, namely for quantifying its robustness and stability [3,4].
2.3 The biphasic pattern of the virulence curve in coronaviruses
Mostly, the clinical course of patients with seasonal influenza shows a biphasic occurrence of symptoms with two distinct peaks. Patients have a classic influenza disease followed by an improvement period and a recurrence of the symptoms [1,2,3]. The influenza RNA virus shedding (the time during which a person might be infectious to another person) increases sharply one-half to one day after infection, peaks on day 2 and persists for an average total duration of 4.5 days, between 3 and 6 days, which explains that we will choose in the following as infectivity duration these extreme values, i.e., either 3 or 6 days depending on the positivity of the estimated daily reproduction rates. It is common to see this biphasic influenza clinically: after incubation of one day, there is a high fever, then a drop in temperature before rising, hence the term “V” fever. The other symptoms like coughing often also have this improvement on the second day of the flu attack: after a first feverish rise (39 − 40°C), the temperature drops to 38°C on 2nd day, then rises before disappearing on the 5th day, the fever being accompanied by respiratory signs (coughing, sneezing, clear rhinorrhea, etc.). By looking the shape of virulence curves observed in coronaviruses patients [5,6,7,8,9,10,11,12], we often see this biphasic pattern.
3 Results
3.1 Distribution of the daily reproduction numbers Rj’s along the infectiousness period of an individual. A theoretical deterministic example
Deterministic Toy Model
Let’s suppose that a priori are equal to: then the inverse of the matrix M is given by: and the deconvolution gives the a posteriori : thanks to the following calculation: and we obtain for the a posteriori distribution of the daily reproduction rates the exact replica of the a priori distribution.
3.2 Distribution of the daily reproduction numbers Rj’s. A simulated stochastic example
Let’s consider a stochastic version of the deterministic toy model given in Section 3.1., by introducing an increasing noise on the , e.g., a uniform distribution on the three following intervals:[2−a, 2+a], [1−a/2, 1+a/2], [2− a, 2 + a], with increasing values of a, from 0.1 to 1, in order to see when the deconvolution would give negative a posteriori , with conservation of the average of their sum Ro, if you repeat the random choice of the values of the at each generation?
We will give the results of the random simulations for increasing values of a from a = 0.1 to a = 1.
For a= 0.1, let’s choose the a priori distribution of the daily reproduction numbers R1 in the interval [1.9, 2.1], R2 in [0.95, 1.05] and R3 in [1.9, 2.1] as R1 = 2.1, R2 = 0.95, R3 = 2.1. Then, transition matrix M1 is equal to: and we have:
From X6 = 113.491, X5 = 41.7391, X4 = 15.351, a posteriori can be calculated: R1 = 2.1, R2 = 0.95, R3 = 2.1
The next a priori are chosen as: R1 = 2, R2 = 0.95, R3 = 1.9 and we have:
Then, we get the matrices M2 and
The a posteriori equal: R1 = 2.0279, R2 = 7.6158, R3 = −16.426
The following a priori are: R1 = 2, R2 = 1.05, R3 = 1.9 and we have:
Then, we get the matrices M3 and
The a posteriori equal: R1 = 2.486, R2 = −2.33, R3 = 7.38769
The next a priori are: R1 = 1.9, R2 = 1.05, R3 = 1.9 and we have:
Then, we get the matrices M4 and
The a posteriori equal: R1 = 2.69, R2 = −16.72, R3 = 43.809
For a=1, let’s choose the a priori R1 in [1.0, 3.0], R2 in [0.5, 1.5] and R3 in [1, 3] e.g., R1 = 1, R2 = 1.355, R3 = 1.1. Then, transition matrix is equal to: and its inverse is given by:
The new cases are: X6 = 18.209, X5 = 9.101, X4 = 4.81, X3 = 2.355, X2 = 1, X1 = 1, and by deconvolution, we get a posteriori equal: R1 = 1, R2 = 1.355, R3 = 1.1, i.e., the exact a priori distribution.
Let’s consider now a new a priori R1 = 1, R2 = 1, R3 = 1. That gives a new matrix M2, with new X7 and X8 calculated from the new a priori , by using the former values of X6, &, X2 :
Hence, we get:
Then, by inverting M2, we can calculate the a posteriori : and a posteriori equal: R1 = 2.90, R2 = 5.4888, R3 = −14.696
We continue the process by calculating X9 and X10 using new a priori R1 = 3, R2 = 0.5, R3 = 2.9 :
Hence, we get: and a posteriori equal: R1 = 3.66898, R2 = −33.857, R3 = 61.32
Let’s choose now a new a priori R1 = 2.6, R2 = 0.7, R3 = 2.6
Then, we have: and we get the matrices M4 and :
Then, a posteriori equal: R1 = 2.99859, R2 = −1.785, R3 = 7.139
More precise simulation results are given in Table 1, which summarises computations made for random choices of a priori distributions, for a = 0.1 and a = 1. These simulations show a great sensitivity to the noise, but a qualitative conservation of the down biphasic (D-B) shape of their distribution along the infectivity period of individuals.
3.3 Distribution of the daily reproduction numbers Rj’s. The real example of France
The Figure 3 gives the average transmission rate Ro calculated the 25th of October 2020 just before the second lockdown (re-confinement) in France [18]. Because the second wave of the epidemic is still in its exponential phase, it is more convenient i) to consider the distribution of the marginal daily reproduction numbers, and ii) to calculate its entropy and simulate the epidemic dynamics using a Markovian model [2].
By using the daily new infectious cases given by [6], we can calculate M −1 for the period from 20th to 25th October 2020, by choosing 3 days for the duration of the infectiousness period and the following raw data for the new infected cases (Figure 3):
Oct 25 : 52010, 45422, 42032, 41622, 26676, 20468 : Oct 20
We have:
Hence, we can deduce the daily i.e., the vector (R1, R2, R3) :
The average transmission rate is equal to Ro ≈ 1, 174, value close to that calculated directly (Figure 3), giving V = (0.7, 0.085, 0.215), with a maximal daily reproduction rate the first day of the infectiousness period. The entropy H of V is equal to: H = −∑k=1,r VkLog(Vk) = 0.25 + 0.21 + 0.33 = 0.79.
3.4 Calculations of the Rj’s for different countries
3.4.1 Chile
By using the daily new infectious cases given by [6], we can calculate M −1 for the period from 1st to 12th November 2020, by choosing 6 days for the duration of the infectiousness period and the following 7-day moving average data for the new infected cases (Figure 4):
Nov 12 : 1408, 1394, 1387, 1384, 1385, 1389, 1405, 1362, 1359, 1382, 1370, 1400 : Nov1
We have:
Hence, after deconvolution we have:
The average transmission rate is equal to Ro ≈ 1, 011, value close to that calculated directly, with a maximal daily reproduction rate the last day of the infectiousness period.
Because of the negativity of R1, we cannot derive the distribution V and therefore calculate its entropy. The quasi-endemic situation in Chile since the end of August, which corresponds to the increase of temperature and drought at this period of the year [15], gives a constancy of new cases of infected and a periodicity of their occurrence corresponding to the length of the infectious period of about 6 days, analogue to the cyclic phenomenon observed in simulated stochastic data of Section 3.2. with a similar M-shaped distribution of the .
3.4.2 Russia
By using the daily new infectious cases given by [6], we can calculate M −1 for the period from September 30 to October 5, 2020, by choosing 3 days for the duration of the infectiousness period and the following 7-day moving average data for the new infected cases (Figure 5):
Oct 05 : 9473, 9081, 8704, 8371, 8056, 7721 : Sept 30
We have: and where:
The average transmission rate is equal to Ro ≈ 1, 073, value close to that calculated directly, with a maximal daily reproduction rate the first day of the infectiousness period.
Because of the negativity of R2, we cannot derive the distribution V and therefore calculate its entropy. The period studied corresponds to a local slow increase of new infectious cases at the start of the second wave in Russia, which seems to have a succession of slightly inclined 4-day plateaus followed by a step.
3.4.3 Nigeria
By using the daily new infectious cases given by [6], we can calculate M −1 for the period from November 5 to November 10 2020, by choosing 3 days for the duration of the infectiousness period and the following raw data for the new infected cases (Figure 6) :
Nov 10 : 166, 164, 161, 133, 149, 141 : Nov5
We have:
After deconvolution, we get:
The average transmission rate is equal to Ro ≈ 1.129, value close to that calculated directly, with a maximal daily reproduction rate the last day of the infectiousness period. V = (0.143, 0.342, 0.515) and the entropy H of V is equal to : H = −∑k=1,r VkLog(Vk) = 0.29 + 0.37 + 0.34 = 1.
3.5 Weekly patterns in daily infection cases
Daily new infectious cases are highly affected by weekdays, such that case numbers are lowest at the start of the week, and increase afterwards. This pattern is observed at the global level, as well as at the level of almost each single country or USA state. Hence, in order to estimate biologically meaningful infection rates, clean of weekly patterns due to administrative constraints, analyses have to be restricted to specific periods shorter than a week, or at the rare occasions when patterns escape the administrative constraints on weekly patterns. This weekly phenomenon occurs during exponential increase as well as decrease phases of the pandemic, and during stagnation periods in numbers of daily cases. In addition, the daily new infection case record is discontinuous for many countries/regions, which frequently publish on Monday or Tuesday a cumulative count for that day and the weekend days. For example, Sweden typically publishes over one week only four numbers, the one on Tuesday cumulating cases for Saturday, Sunday and the two first weekdays. Discontinuity in records further limit the availability of data enabling detailed analyses of daily infection rates and can be considered as extreme weekday effects on new case records due to various administrative constraints.
We calculated Pearson correlation coefficients r between a running window of daily new case numbers of 20 consecutive days, and a running window of identical duration with different intervals between the two running windows. There is typically a peak with a time-lag of seven days between the two running windows. This could reflect a biological phenomenon of seven infection days. However, examinations of frequency distributions of time-lags for r maxima reveals, besides the median at time-lag 7 days, local maxima for multiples of 7 (14, 21, 28, 35, etc). About 50 percent of all local maxima in r involve time-lags that are multiples of seven (seven included). This excludes a biological causation. We tried to control for week days using two methods, and combinations thereof.
For the first method, we calculated z-scores for each weekday, considering the mean number of cases for each weekday, and subtracted that mean from the observed number for a day. This delta was then divided by the standard deviation for the number of cases for that day. This was done for each weekday.
The second method implies data smoothing using a running window of 5 consecutive days, where the mean number of cases calculated across the five days is subtracted from the observed number of cases for the third day. Hence data for a given day are compared to a mean including two previous, and two later days.
We constructed two further datasets, one where the second data-smoothing method is applied to the z-scores from the first method, and the other, where the z-scores from the first method are applied to the data after data-smoothing from the second method (Figures 10 and 11).
These four datasets transformed according to different methods and combinations thereof designed to control for weekday were analysed using the running window method. Despite attempts at controlling for weekday effects, the median time-lag was always seven days across all four transformed datasets, and local maxima in time-lag distributions were multiples of seven. Despite data transformations, about 50 percent of all local maxima were time-lags that are multiples of seven, seven included.
Visual inspection of plots of these transformed data versus time for daily new infection cases from the whole world systematically show systematic local biases in daily new infection cases (after transformations) on Sundays and Mondays, for all four transformed datasets, with Sundays and/or Mondays as local minima and/or local maxima, along which method or combination thereof. Hence, the methods we used failed to neutralise the weekly patterns in daily new cases due to administrative constraints. This issue highly limits the data available for detailed analyses of daily new cases aimed at estimating biologically relevant estimates of infection rates at the level of short temporal scales, but it reinforces the hypotheses done in previous sections on the existence of an heterogeneity of the virulence along the infectiousness period, independent of administrative constraints observed in collecting new daily cases.
4 Discussion
Rhodes and Demetrius have pointed out the interest of the distribution V of the daily reproduction numbers [5]. In particular, they found that this distribution was generally not uniform, which we have confirmed here by showing many cases with the biphasic form of the virulence observed in respiratory viruses, such as those of influenza. The entropy of the distribution V makes it possible to evaluate the intensity of this biphasic character. By taking into account the mortality due to the Covid-19, the discrete dynamics of new cases can be considered as a Leslie dynamics governed by the matrix equation Xj = LXj−1, where Xj is the vector of the new cases living at day j and L is the Leslie matrix given by: and where bj = 1 − μj ≤ 1, ∀i = 1, &, r, is the survival recovered probability between days j and j + 1.
The dynamical Xk stability for L2 distance to the stationary infection age pyramid P= limjXj/∑ k=j,j−r+1 Xk is related to |λ − λ,|, the modulus of the difference between the dominant and sub-dominant eigenvalues of L, namely λ = eR and λ, (where R is the Malthusian growth rate), where P is the left eigenvector of L corresponding to λ. The dynamical stability for the distance (or symmetrized divergence) of Kullback-Leibler to P considered as stationary distribution is related to the population entropy H [4,19,20,21,22,23,24], which is defined if Ij = ∏ i=1,j−1 bi and pj = IjRj/λj, as follows:
The mathematical characterisation by population entropy defined in (18) of the stochastic stability of the dynamics described by equation (17) has its origin in the theory of large deviations [25,26,27]. This notion of stability pertains to the rate at which the system returns to its steady state conditions after a random exogenous and/or endogenous perturbation and it could be useful to quantify the variations of the distribution of the daily reproduction rates observed for many countries.
5 Conclusions and Perspectives
The article has used a discrete approach for describing the dynamics of the Covid-19 outbreak. In the ODE SIR case [28,29,30], marginal can be estimated in the same way, but by using continuous deconvolution equation. An improvement in this framework could be the introduction of the demographic notion of age class, because the daily reproduction rates distribution is depending on the state of the immune defences both of the infected and susceptible individuals, and on the mode of transmission, this mode being associated in social settings recording age and infection schedule with more secondary cases for young and adult than for elderly at home [31,32]. Then, for each age class k a reproduction number Rok can be calculated by summing the marginal , which correspond to this age class k, and eventually a global Ro can be estimated from the , weighted by the proportion of their age class in the whole population. Both global Ro and dedicated to age classes could serve to follow the effects of public health measures as lockdowns and quarantines.
Data Availability
All data comes from public data bases.
6 Author Contributions
Conceptualisation, J.D.; methodology, J.D., H.S.; K.O. and F.T. have performed calculations and Figures; all authors have equally participated to the other steps of the article elaboration.
8 Conflicts of Interest
The authors declare no conflict of interest
7 Acknowledgements
The authors hereby give their thanks to the International Excellence Fellowship of Karlsruhe Institute of Technology (KIT) funded within the Framework of the University of Excellence Concept “The Research University in the Helmholtz Association I Living the Change”.
Footnotes
Jacques.Demongeot{at}univ-grenoble-alpes.fr, Kayode.Oshinubi{at}univ-grenoble-alpes.fr, florence.thuderoz{at}gmail.com, varanuseremius{at}gmail.com