Abstract
We analyze the early data on COVID-19 expansion in selected European countries using an analytical parametric model. A description of the time dependence of the disease expansion and a method to evaluate trends of the expansion are proposed. Several features are observed in the data, namely a high predictability of the expansion of disease in Italy and a convergence of the “pushback” parameter towards a limiting value in all the countries where restrictive measures have been adopted. Basic predictions for the evolution of the disease expansion are made for selected countries with a stable evolution in the parametric space of the model. The findings presented here should contribute to the understanding of the behavior of the disease expansion and the role of the restrictive measures on the evolution of the expansion.
1. Introduction
The outbreak of the new coronavirus SARS-CoV-2, which causes a severe respiratory tract infection in humans known as COVID-19, is a global health concern. Restrictive measures have been adopted in many countries in order to mitigate the impact of the spread of the disease on public health systems [1]. In this paper we propose a straightforward analytic description of the time dependence of the disease expansion under the restrictive measures, and a method that allows one to identify trends in the expansion and make predictions. The analytic parametric modeling can be used as an alternative to complex models of the disease expansion such as those reviewed e.g. in Ref. [2].
In Section 2 of this paper, we introduce the model, analyze the data available as of 31 March 2020 from selected European countries, and discuss several features seen in the data. In Section 3, we explain the calculations behind the predictions and make predictions for three countries which appear to be on a predictable trajectory in the parametric space of the model. Data used in this work are taken from [3], cross-checked using the information system described in Ref. [4], and from [5].
2. Analytic parametric model of COVID-19 expansion
The number of people newly infected by one infected person has a power-law dependence on time. In general, the total number of infected people at time t can be fully characterized by a function
where αi and ai are unknown parameters. The sum runs over the individual centers of the disease expansion. These centers carry constant parameters αi and ai only over a limited time interval, which is characterized by a step function χi(t). The full description of the outbreak by function (1) can be replaced by an exponential dependence with a time-dependent parameter a(t), that is, by the function
where the right-hand side of (2) represents a Taylor expansion of the time-dependent parameter a. If one assumes that a(t) is constant or monotonically decreasing with time, one may assume that |bi| > |bi+1|, so that higher-order coefficients of the Taylor expansion contribute less to F(t). One may therefore start the description of the total number of infected people at a given time t with three approximations,
Approximation (3) is a pure exponential distribution, which should be observed if no human measures are taken to control the spread of the disease and no immunization of the population is assumed. Approximation (4) is a modified exponential expansion in which the parameter b2, if negative, characterizes the “pushback” resulting from human measures to control the disease. Approximation (5) allows one to check the validity of the |bi| > |bi+1| assumption and the need for higher-order terms in modeling the spread of the disease.
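For reference, the forms described above can be written out explicitly. The following is a reconstruction from the surrounding text, not a verbatim copy of the original displays: it assumes that the exponent in (2) is a(t)·t, expanded as a power series in t with coefficients bi, and that N0 is the overall normalization.

```latex
\begin{align}
F(t) &= N_0 \exp\big(a(t)\,t\big)
      = N_0 \exp\big(b_1 t + b_2 t^2 + b_3 t^3 + \dots\big), \tag{2}\\
F(t) &\approx N_0 \exp\big(b_1 t\big), \tag{3}\\
F(t) &\approx N_0 \exp\big(b_1 t + b_2 t^2\big), \tag{4}\\
F(t) &\approx N_0 \exp\big(b_1 t + b_2 t^2 + b_3 t^3\big). \tag{5}
\end{align}
```

In this form, b1 > 0 with b2 = 0 gives unrestricted exponential growth, while a negative b2 bends the curve downward at late times.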
Data can be fitted by (3), (4), and (5). The quality of the fits then decides which of the approximations serves as the best description of the data. When fitting the data over the full available time range from three representative European countries, namely Italy, France, and Czechia, we found that indeed |b1| ≫ |b2| ≫ |b3| and that approximation (5) has a slightly worse χ2/NDOF than approximation (4) (typically by 10%). We therefore conclude that approximation (5) and higher-order terms are not needed for parameterizing the data.
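The comparison of approximations can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the fitting procedure used for the results above: it assumes the exponent of F(t) is a polynomial in t, fits ln F by ordinary (unweighted) least squares instead of a χ2 fit to the counts, and runs on synthetic data. The function names and the data are hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_log_poly(t, counts, order):
    """Fit ln F = ln N0 + b1 t + ... + b_order t^order via the normal equations.

    Returns (N0, [b1, ..., b_order], s2), where s2 is the unweighted sum of
    squared log-residuals divided by the number of degrees of freedom
    (a stand-in for chi2/NDOF, assumed here for simplicity).
    """
    y = [math.log(c) for c in counts]
    k = order + 1
    ata = [[sum(ti ** (i + j) for ti in t) for j in range(k)] for i in range(k)]
    aty = [sum(yi * ti ** i for ti, yi in zip(t, y)) for i in range(k)]
    coef = solve_linear(ata, aty)
    resid = [yi - sum(c * ti ** p for p, c in enumerate(coef))
             for ti, yi in zip(t, y)]
    ndof = len(t) - k
    return math.exp(coef[0]), coef[1:], sum(r * r for r in resid) / ndof

# Synthetic example: form (4) with hypothetical N0 = 50, b2 = -0.004 and a small
# deterministic wobble standing in for day-to-day fluctuations.
t = list(range(5, 31))
counts = [50.0 * math.exp(0.32 * x - 0.004 * x * x) * (1.0 + 0.02 * math.sin(1.7 * x))
          for x in t]

for order in (1, 2, 3):
    n0, b, s2 = fit_log_poly(t, counts, order)
    print(order, round(n0, 2), [round(v, 5) for v in b], s2)
```

Fitting in log space keeps the problem linear, so no starting values are needed. A weighted fit to the counts themselves would treat the statistical uncertainties differently and may give somewhat different parameter values.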
To evaluate the evolution of the spread of the disease as a function of time, one can study the time dependence of the parameters b1, b2, and N0. This can be done either by fitting the data in time windows or by fitting the data between the beginning and a given time t. The former method is more susceptible to fluctuations of the data within the time windows, while the latter may be biased by the change of the parameters in time. We tested both methods, as well as various sizes of the time window, and concluded that fitting from the beginning leads to less fluctuating parameter values which exhibit the same trends and tend to converge to the same values as those obtained from fits in time windows. From now on, we therefore evaluate the evolution of b1, b2, and N0 by fitting from the beginning up to the time t indicated on the x-axis of the plots. The evolution of the parameters N0, b1, and b2 is shown in the upper right and lower panels of Figure 1. The parameters oscillate strongly at the beginning, but for times greater than 20 days after the first registered cases there is a hint of convergence. Parameter b1 first oscillates, then decreases, and at later times tends to converge for Italy and France towards a value in the interval 0.3–0.35. Parameter b2 typically starts at zero, which indicates a purely exponential spread of the disease at the beginning, when no measures are taken. At later times, b2 tends to certain negative values. The study of correlations among the parameters shows that they are very strongly correlated; the correlation coefficient may exceed 0.95 at certain times.
Figure 1. Upper left: Number of COVID-19 cases as a function of time since the first detection in a given country. The distribution is fitted by the function defined in Equation (4). Fits ignore data from the first five days and data with fewer than 100 cases. Parameters of the fit: N0 (upper right), b1 (lower left), and b2 (lower right), evaluated as a function of time since the first detection in a given country (i.e. the fit stops at the time indicated on the x-axis).
The parameter b1 can be interpreted as quantifying the uncontrolled spread of the disease. As such, it is assumed to be universal for all the European countries, since they share approximately the same default density of interactions among inhabitants (cities of similar density, a similar level of public transportation, similar default cultural behavior). Consequently, for subsequent studies, the parameter b1 was fixed to 0.32, the value preferred by the minimum-χ2 condition for the fit in the country with the most stable evolution, which is Italy.
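With b1 fixed, tracking the evolution of the remaining parameters becomes a simple exercise, sketched below. The sketch assumes form (4) with exponent b1 t + b2 t², so that ln F − b1 t is linear in t², and it uses an unweighted straight-line fit in log space rather than a χ2 fit. The function names and the synthetic data are hypothetical. It covers both strategies discussed above: fitting from the beginning up to a given time, and fitting in windows of fixed width.

```python
import math

B1_FIXED = 0.32  # value quoted in the text for the fixed b1

def fit_fixed_b1(t, counts, b1=B1_FIXED):
    """Fit ln F - b1 t = ln N0 + b2 t^2 by a straight-line fit in x = t^2.

    Returns (N0, b2).
    """
    x = [ti * ti for ti in t]
    z = [math.log(c) - b1 * ti for ti, c in zip(t, counts)]
    n = len(t)
    mx = sum(x) / n
    mz = sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    b2 = sxz / sxx
    return math.exp(mz - b2 * mx), b2

def expanding_fits(t, counts, min_points=7):
    """Fit from the beginning up to each end time; returns (t_end, N0, b2) tuples."""
    return [(t[end - 1],) + fit_fixed_b1(t[:end], counts[:end])
            for end in range(min_points, len(t) + 1)]

def windowed_fits(t, counts, width=7):
    """Fit in sliding windows of fixed width; returns (t_end, N0, b2) tuples."""
    return [(t[s + width - 1],) + fit_fixed_b1(t[s:s + width], counts[s:s + width])
            for s in range(len(t) - width + 1)]

# Synthetic example with hypothetical N0 = 50 and b2 = -0.004.
t = list(range(5, 36))
counts = [50.0 * math.exp(B1_FIXED * x - 0.004 * x * x) for x in t]

print(expanding_fits(t, counts)[-1])
print(windowed_fits(t, counts)[-1])
```

On real data the windowed fits would be expected to fluctuate more than the expanding fits, since each of them uses fewer points.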
Figure 2. Parameters of the fit of the distribution of the number of COVID-19 cases by the function defined in Equation (4) with the b1 parameter fixed. N0 (left) and b2 (right), evaluated as a function of time since the first detection in a given country (i.e. the fit stops at the time indicated on the x-axis).
The time evolution of the N0 and b2 parameters in the case of fixed b1 is shown in Figure 2. One may identify several features in the behavior of the parameters:
1. Parameter b2 starts from zero (no restrictive measures), then decreases as expected for all the countries, and then becomes constant for countries where restrictive measures have been in place for a longer time (namely Italy and France).
2. All the countries seem to tend to the same value of the parameter b2.
3. Parameters b2 and N0 are consistent with a constant over the last ∼20 days in Italy. This means that the situation in Italy is highly predictable, as further discussed in Section 3.
4. Parameter N0 is much higher for Italy than for other countries.
Features 1–3, if confirmed by other studies, open questions for a detailed epidemiological analysis: What is the reason for the existence of a limiting efficiency of the applied measures, as indicated by the data? Is it e.g. due to the high efficiency with which the disease spreads inside closed communities? Or is it e.g. due to a certain fraction of the society failing to follow the measures correctly? The analysis presented here obviously cannot answer these questions. At the same time, it allows us to assess the efficiency of applying the existing measures. For example, if we observe that the parameter b2 reaches the limiting value observed for Italy even in countries where more restrictive measures have been applied, such as the mandatory use of masks in Czechia, then this indicates that these additional measures do not bring a further reduction of the spread of the disease (see footnote 1).
Since N0 reflects the number of cases at time zero, feature 4 implies that the infection was likely present in Italy even before time zero. This may also explain the excessive values of parameter b1 observed for the first ∼5 days.
We should note that when fixing b1, the values of parameters N0 and b2 still remain correlated. In some cases the correlation coefficient is greater than 0.9. This may limit the predictive power of the model and a straightforward interpretation of the parameter N0 as the number of cases at time zero. A way to reduce these correlations needs to be further studied. So far we have tested alternatives to (3)–(5), e.g.
and
with bi, ti, and c being free parameters and k = 1, 2. None of these more ad-hoc functional forms provided a better description of the data than (4).
While N0 and b2 remain correlated, one can notice that the values of N0 and b2 become constant in time for Italy and France even though the number of cases evolves with time. This builds confidence in the predictive power of the model, which is discussed in the next section.
3. Predicting further evolution
When all the parameters tend to constant values, one can predict the behavior of the spread of the disease using simple analytic formulae. An estimate of the time tmax at which the restrictive measures (b2) outperform the uncontrolled spread (b1), that is, the time when no new cases of the disease should be registered, is obtained by setting the derivative of (4) with respect to time to zero,
which also allows one to calculate the total number of cases,
An estimate of the time when the number of new cases should stop growing, tΔ, is an inflection point of F(t), and it is obtained by setting the second derivative of (4) with respect to time to zero,
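The three estimates follow from elementary calculus once a form for (4) is written down. The derivation below is a reconstruction under the assumption that (4) reads F(t) = N0 exp(b1 t + b2 t²) with b2 < 0. The equation numbers follow the references in the text, and the exact form of the original displays may differ.

```latex
\begin{align}
F'(t) &= (b_1 + 2 b_2 t)\,F(t) = 0
  \;\Rightarrow\; t_{\max} = -\frac{b_1}{2 b_2}, \tag{8}\\
N(t_{\max}) &= F(t_{\max}) = N_0 \exp\!\left(-\frac{b_1^2}{4 b_2}\right), \tag{9}\\
F''(t) &= \big[(b_1 + 2 b_2 t)^2 + 2 b_2\big]\,F(t) = 0
  \;\Rightarrow\; t_{\Delta} = t_{\max} - \frac{1}{\sqrt{-2 b_2}}. \tag{10}
\end{align}
```

Of the two roots of F''(t) = 0, the one preceding tmax is taken in (10), since it marks the point where the rate of new cases stops growing.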
The results of the predictions using Equations (8)–(10) are shown in Figure 3 for countries with constant parameters of the fit, namely Italy and France, and for Czechia, which seems to be converging to constant values of the parameters as well. We do not attempt to introduce systematic uncertainties on the predictions at this point. Instead, we evaluate the predictions using two methods: fits in time windows of seven days and fits over the full distribution. As already mentioned, the latter method gives results that are less susceptible to fluctuations.
Figure 3. Left: Estimates of the time when no new cases should occur, tmax. Middle: Total number of infected people at tmax, N(tmax). Both tmax and N(tmax) are shown as a function of the last day since the beginning of the spread for which the data were included in the fit. Right: Prediction for the number of new cases as a function of day in April 2020.
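Once the fit parameters are stable, the predicted quantities can be evaluated directly. The sketch below assumes form (4) as F(t) = N0 exp(b1 t + b2 t²) with b2 < 0, for which setting the first and second derivatives to zero gives tmax = −b1/(2 b2), N(tmax) = N0 exp(−b1²/(4 b2)), and tΔ = tmax − 1/√(−2 b2). The function names and the values of N0 and b2 are hypothetical and chosen only for illustration; they are not fit results.

```python
import math

def total_cases(t, n0, b1, b2):
    """Total number of cases at time t for the assumed form N0 exp(b1 t + b2 t^2)."""
    return n0 * math.exp(b1 * t + b2 * t * t)

def predict(n0, b1, b2):
    """Return (t_max, N(t_max), t_delta) for the assumed form, requiring b2 < 0."""
    if b2 >= 0:
        raise ValueError("b2 must be negative for the curve to reach a maximum")
    t_max = -b1 / (2.0 * b2)                       # first derivative vanishes
    n_max = n0 * math.exp(-b1 * b1 / (4.0 * b2))   # F evaluated at t_max
    t_delta = t_max - 1.0 / math.sqrt(-2.0 * b2)   # inflection point before t_max
    return t_max, n_max, t_delta

def daily_new_cases(day, n0, b1, b2):
    """New cases registered between day - 1 and day."""
    return total_cases(day, n0, b1, b2) - total_cases(day - 1, n0, b1, b2)

# Hypothetical parameters: b1 as quoted in the text, N0 and b2 made up.
N0, B1, B2 = 50.0, 0.32, -0.004
t_max, n_max, t_delta = predict(N0, B1, B2)

# Day with the largest daily increment, which should lie close to t_delta.
peak_day = max(range(1, int(t_max) + 1),
               key=lambda d: daily_new_cases(d, N0, B1, B2))

print(t_max, n_max, t_delta, peak_day)
```

Because tmax and N(tmax) depend on the ratio of b1 to b2 and, for the latter, exponentially on b1²/b2, small shifts in b2 translate into large shifts in the predictions. This is one reason to check the stability of the fitted parameters before quoting them.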
The left and middle panels of Figure 3 show the predictions for tmax and N(tmax), respectively, as a function of time since the first registered case. One can therefore see what prediction one would have made in the past. The current prediction is represented by the last point of the distribution. One can again see that the prediction has not changed over the last ∼20 and ∼10 days in the case of Italy and France, respectively. As said before, these countries seem to be on a highly predictable trajectory. This can be illustrated for Italy by evaluating the prediction for today's total number of cases which one would have made 20 days ago. The prediction would be 100970 cases, while data for today (30th of March) show 101740 cases, a difference of less than 1%. For Czechia, one can see an evolution towards a more optimistic scenario as time increases. At the same time, the prediction has remained relatively stable over the last few days.
The prediction for tΔ is given in terms of the distribution of new cases as a function of time from today, shown in the right panel of Figure 3. One can see that the maximum of new cases should already have been reached in Italy, while in France and Czechia it is expected in about 4 and 1–4 days, respectively.
4. Summary and conclusions
We proposed a straightforward method for evaluating the temporal dependence of the spread of the COVID-19 disease. The analysis of the data indicates several features, namely the high predictability of the expansion of the disease in Italy and a convergence of the “pushback” parameter towards a limiting value in all the countries where restrictive measures are applied. Predictions of the evolution of the spread of the disease are made for three countries that appear to be on a predictable trajectory in the parametric space of the model, namely Italy, France, and Czechia.
The proposed model and analysis method represent an alternative to complex modeling which is simple to implement and independently verify, and which can be quickly extended towards a more complete description of the situation in the many different countries affected by the COVID-19 outbreak.
Data Availability
This work uses external data available from links included in the work or below.
Acknowledgment
I would like to thank Jiří Dolejší for useful discussions and suggestions and for a careful reading of the manuscript.
Footnotes
Email address: martin.spousta{at}mff.cuni.cz (Martin Spousta)
1 At the same time, it would obviously not prove that this particular measure is not very efficient in the case when no other measures are taken. Additionally, if the b2 value for Czechia goes below the b2 value for Italy, it would indicate that the more restrictive measures applied in Czechia may increase the efficiency in reducing the spread of the disease.