Abstract
Here we present an improved mathematical analysis of the time evolution of the Covid-19 pandemic in Italy and a statistical error analyses of its evolution, including Monte Carlo simulations with a very large number of runs to evaluate the uncertainties in its evolution. A previous analysis was based on the assumption that the number of nasopharyngeal swabs would be constant, however the number of daily swabs has been increasing with an average factor of about five with respect to our previous analysis, Therefore, here we consider the time evolution of the ratio of the diagnosed positive cases to number of swabs, which is more representative of the evolution of the pandemic when the number of swabs is increasing or changing in time. We consider a number of possible distributions representing the evolution of the pandemic in Italy and we test their prediction capability over a period of up to four weeks. The results show that a distribution of the type of Planck’s black body radiation law provides very good forecasting. The use of different distributions provides an independent possible estimate of the uncertainty. We then consider five possible cases for the number of daily swabs and we estimate the potential dates of a substantial reduction in the number of diagnosed positive cases. We then estimate the spread in a substantial reduction, below a certain threshold, of the daily cases per swab among the Italian regions. We finally perform Monte Carlo simulations with 25000 runs to evaluate a random uncertainty in the prediction of the date of a substantial reduction in the number of diagnosed daily cases per swab.
1. Introduction
In a previous paper, we estimated the possible dates of a substantial reduction in the daily number of diagnosed positive cases of the Covid-19 pandemic based on the assumption that the number of nasopharyngeal swabs would remain roughly constant1,2. At the time of our previous analysis (March 26), the average daily number of swabs from February 15 was about 9000 per day. However, from March 27 up to April 25, the average number of daily swabs was about 45000 (recently on May 1, the number of daily swabs reached the maximum of 74208). Therefore, to study the evolution of the Covid-19 pandemic, we have to consider the analysis of the ratio of daily diagnosed positive cases per swab. To possibly mathematically predict the evolution of the pandemic in Italy we can fit the ratio of cases per swab using several different distributions, however some distributions are more suitable than others for forecasting the future behavior of the pandemic. We consider the following distributions: Gauss, Beta, Gamma, Weibull, Lognormal and two of the type of the Planck’s black body radiation law. The number of parameters chosen in some distributions is between two and three. It turns out that Planck’s law types distribution with three independent parameters provides the best predictions.
Furthermore, since the number of daily swabs depends on factors that are unknown to us, such as the daily availability of reagents and specialized personnel, we consider five possible cases for the daily number of swabs, we have also assumed some possible time evolution in the number of daily swabs which for brevity we do not report here. We fit the time evolution of the positive cases per unit swab up to April 25, using the two best-fit-prediction distributions, i.e., Planck with three parameters and Gamma, along with a Gauss distribution. After analyzing the time evolution of the positive cases per unit swab using these three distributions and five conceivable number of daily swabs, we estimate the evolution in the number of diagnosed positive cases and the dates of a substantial reduction in such a daily number. Furthermore, in section 3.1 we estimate the spread in the dates of a substantial reduction in the number of daily cases per swab among the regions of Italy, where the conditions are quite different from each other, including the number of swabs per person.
The different distributions that we use provide a possible independent way to estimate the uncertainty. Indeed, a basic problem is to mathematically estimate the uncertainty in the date of a substantial reduction of daily cases. For the purpose of estimating the random uncertainties, in section 3 we report the results of a Monte Carlo simulation with 25000 runs.
2 A mathematical analysis of the ratio of daily positive cases per swab
After analyzing the time trend of the ratio of daily positive cases to the number of daily swabs3–5, we found1,2 that this trend can be modeled by a Gauss distribution, however the time trend has also a certain amount of skewness that can better be fitted by choosing a skewed distribution such as the Weibull, Log-normal, Beta and Gamma distributions, and also other distributions of the type of the Planck’s black body law. This last one, reported in Fig. 1, shows the best prediction capabilities, with three parameters a and b and c: where t is the time. In Figs. 2a to 2e are reported the fits of the data with a distributions of the type of the Planck’s black body law with two parameters (i.e., with the exponent of the time variable t equal to 3 as in the Planck’s black body law), and with distributions of the type of the Gamma, Beta, Weibull, Lognormal respectively. The data can also be approximated by a function of the type of a Gauss function with three parameters a, b and c: as shown in Fig. 2f.
2.1 Prediction capabilities of the distributions fitting the data
As a relevant test to select the distribution which is more suitable to predict the evolution in time of the daily positive cases per swab, we consider for each of the above distributions, three different intervals of daily ratios: from February 25 until April 18, from February 25 until April 11 and from February 25 until March 28, i.e., one week before the date of April 25, two weeks before April 25 and four weeks before April 25, respectively. We then test which one gives the best predictions for the measured ratios during the last five days (from April 21 until April 25) and the lowest absolute value of the mean and Root Mean Square (RMS) of the residuals (difference between the measured daily data and the fitting function) up to April 25, from April 18, April 11 and March 28, respectively. As shown in Tables 1 and 2, over a long period of about one month the distribution providing the best prediction is of the type of the Planck’s black body law with three parameters. Over a short period of up to two weeks the Gamma distribution shows a slightly better prediction behavior but less stable than the Plank’s law with three parameters. The Beta and Plank with two parameters are also distributions with good prediction capability. The Weibull distribution does not fit well the data.
3. Analysis of daily diagnosed cases per swab of Covid-19 in Italy and Monte Carlo simulations
In the present section, we first analyze the expected dates of a substantial reduction in the ratio of the daily diagnosed cases per swab in Italy and we then perform a Monte Carlo simulation with 25000 runs to evaluate the random uncertainties in these dates.
Using the best fitting distribution, i.e., a Planck-type distribution with three parameters (see previous section 2) and Gamma-type and Gauss-type distributions, we find the dates when we expect that the ratio of daily diagnosed cases per swab will be below certain thresholds. We choose these thresholds to be equal to the minimum measured ratio of cases per swab (in the range February 25 to April 30), equal to 0.02163, incidentally corresponding to the first available datum of such ratio occurred on February 25; one fifth of it and one tenth of it. These dates are summarized in Table 4.
However, several uncertainties can influence the number of the diagnosed positive cases of the Covid-19 pandemic, in addition to the number of nasopharyngeal swabs that have increased with time as explained in the previous section. Another source of uncertainty is related to the fact that the number of swabs is associated both to new cases and to the repeated ones of persons already tested as positive. However if we assume that the percentage of re-tested people is approximately constant such error should be small.
To possibly estimate the uncertainties in the number of positive cases, we use a Monte Carlo simulations7–9 with 25000 runs, similarly to what done in previous works1,2, and the differences in the predictions among the best distributions considered previously. The Monte Carlo simulation should account mainly for random uncertainties. The uncertainty we consider in the Monte Carlo simulation does not take into account the difference between the real number of cases (which is unknown) and the daily diagnosed ones which can be one order of magnitude higher, or even more, than the actual cases. However, it is usual in statistics to use a sample as being representative of the population under study and the present Monte Carlo simulation is performed for the number of diagnosed daily cases per swab.
For convenience to the reader, we summarize here the procedure used previously1,2 for the total number of cases where instead here is applied to the ratio of cases per swab, furthermore the number of runs have been largely increased from 150 to 25000. We assume a measurement uncertainty in the daily number of diagnosed cases per swab equal to 20% of each daily ratio (Gaussian distributed). Then, a random matrix (m×n) is generated, where n (columns) is the number of observed days and m (rows) is the number of random outcomes, which we have chosen to be 25000. Each number in the matrix is part of a Gaussian distribution with mean equal to 1 and sigma equal to 0.2 (i.e., 20% of 1), either row-wise and column-wise. So, starting from the n nominal values of the daily data per swab, we generated n Gaussian distributions with 25000 outcomes, with mean equal to the n nominal values and with 20% standard deviation. Then, for each of the 25000 simulations, those n values (corresponding to the daily cases per swab of n days) are fitted with a three parameter function of the type of a Planck function (see section 2) and we then determine the date at which the number of daily positive cases will be less than a certain threshold that, for example, we choose to be equal to the minimum measured ratio of cases per swab (in the range February 25 to April 30), equal to 0.02163, one fifth of it and one tenth of it.
Finally, we calculate the mean and the standard deviation of the 25000 simulations for these three cases, the results are reported in Table 5.
In Fig. 3, is reported the histogram of the frequencies versus the day of a substantial reduction in the number of daily cases per swab corresponding to less than 0.004326, i.e., to less than one fifth of the minimum value of cases per swab measured during the period February 25 to April 30. The histogram approaches a Gaussian with mean equal to day 89, approximately corresponding to what reported in Fig. 1. The standard deviation is approximately 3 days.
3.1 Analysis of each region of Italy
We independently analyze each of the twenty Italian regions from February 25, 2020 (included) until April 25, 2020 (included). As in the previous sections also here we consider the ratio of daily cases over the number of daily swabs. Indeed, the number of daily swabs and other relevant conditions vary quite differently from one region to the other so that we can get an indication of the spread between different regions. To calculate such spread, we evaluate the date of the reduction of the daily cases per swab in each region below a certain threshold which is chosen to be the minimum value of cases per swab in each region in the range under study (25 February-25 April). We then fit the daily ratios of each region using a function of the type of the Planck’s law with three parameters and finally, for each region, we obtain the date at which the number of daily cases per swab reduces below the given threshold for each region. In Fig. 4 we report the 20 dates corresponding to the 20 regions. The calculation of the spread of the 20 dates provides a 1-sigma standard deviation of 11.4 days.
This result may also be displayed using the Central Limit Theorem. We choose a sample of 30 regions, with repetition, out of the 20 regions, for 1200000 times. We then calculate the mean of each sample and we report the frequencies of the mean of the 1200000 samples in the histogram of Fig. 5. We obtain a Gaussian with a standard deviation of 2.03 days, that, multiplied for the square root of 30, gives back a standard deviation of approximately 11.4 days as shown in Fig. 4. This represents the fact that each region has quite different conditions for what regards the intensity and phase of the pandemic, the number of daily swabs and other factors.
4. Modelling the daily swabs
Since the actual number of positive cases is much higher than the measured ones6, by increasing the number of daily swabs (which depends on factors that are difficult to predict, such as the daily availability of reagents and specialized personnel), the number of diagnosed positive daily cases would also increase. Therefore, since the number of daily swabs is significantly changing in time, the ratio of daily positive cases per swab is more meaningful than the number of daily cases. Nevertheless, in this section we consider five possible cases for the daily number of swabs corresponding to some relevant situations. We also tried to model the number of daily swabs using a Gaussian distribution, a distribution of the type of the Planck’s law and a linear monotonic increasing distribution, however for the sake of brevity we do not report this part here. The five cases we consider are: 9000, i.e., about the number of daily swabs equal to the mean of daily swabs between February 15 and March 26 (the date of our first analysis1,2); 24000, i.e., about the mean between February 15 and April 25; 45000, i.e., about the mean between March 26 and April 25; 67000, i.e., about the maximum number of swabs up to April 25 and 100000, i.e., an estimated possible upper bound to the number of daily swabs. In Table 6 are reported those five cases for distributions of the type of Planck’s law and the Gamma one, i.e., the best distributions for what regards the prediction capability as reported in Tables 1–3, and for a Gauss distribution as a lower bound for the predicted dates of a substantial reduction in the number of daily cases. In the case of the Planck-type distribution, we have also obtained the result of Table 6, using a Monte Carlo simulation with 25000 runs, as described in the previous section 3, with the same mean (corresponding to May 23) with a standard deviation of 2.25.
In our previous work1,2, a substantial reduction of the daily cases was chosen to occur when the number of cases reduced to about 100. However, since the number of daily swabs is significantly changing and thus the number of cases increases as the number of swabs, we normalize 100 by multiplying it by the ratio: “number of swabs divided by 9000”. The number 9000, as mentioned earlier, is approximately the average number of swabs up to March 26. Table 6 shows the dates of a substantial reduction of positive cases predicted with the three different distributions. The date of a substantial reduction of cases in the population should be independent of the number of swabs since it describes the real evolution of the pandemic at a certain stage. For this reason we introduce the normalization just described. In other words, the different numbers of daily cases reported in the first column of Table 6 represent the same actual condition of the pandemic independently on the number of daily swabs.
Since the number of daily swabs was increasing after March 26 by a factor of about five with respect to our previous analysis, a substantial decrease in the number of daily positive cases is reached later with respect to our previous mathematical prediction1,2. As mentioned at the beginning of this section, a better indication of the evolution of the pandemic is provided by the behavior of positive cases per swab as reported in Figs. 1 and 2 and in section 3. Indeed. the maximum value of the number of positive cases per swab was 0.462 reached on March 9, whereas the minimum measured value (in the range February 25 to April 30) was equal to 0.02163. According to our previous work1,2, the pandemic should have significantly reduced during the end of April. Indeed, on May 1, the ratio of daily cases per swab was 0.0265 (i.e., nearly its measured value at the beginning of the pandemic in Italy) and on May 5 the ratio was even smaller, reaching the historical minimum of 0.0195.
In Fig. 6 we report a 3D representation of the number of daily positive cases as a function of the number of daily swabs, from 0 to 100000, and of the day, from February 25, using a distribution of the type of Planck’s law.
5. Conclusions
Since the number of daily swabs was in Italy highly increasing from March 26 to April 25, we fit the ratio of the positive cases to daily swabs using several functions, including the Gaussian, Weibull, Lognormal, Beta and Gamma distributions and a function of the type of the Planck’s law, incidentally this last one fits well the number of daily positive cases in China too. Indeed, using the best fitting distribution, i.e., the Planck-type distribution with three parameters, and depending on the chosen threshold for the daily cases per swab (historical minimum from February 25 to April 30, or one fifth of it, or one tenth of it), we found that a substantial reduction in the daily cases per swab is between May 2 and May 31.
Furthermore, considering that is practically impossible to predict of the evolution of the number of daily swabs, we consider five possible relevant cases for the number of daily swabs. By considering these five cases and the Gauss, Gamma and Planck-type distributions, the range of a substantial decrease in the number of daily positive cases goes from April 26 (in agreement with our previous findings) to May 15, depending on the chosen distribution. To characterize a substantial decrease of the pandemic, we have assumed a threshold of 100 cases per day when an average of 9000 swabs per day are used. However, if for instance, the number of daily swabs is instead 67000 (the maximum number of swabs per day so far reached), the corresponding indication of that substantial decrease in the pandemic is given by a much higher number of cases, i.e., 744. Indeed, since the actual number of positive cases is much higher than the measured ones, by increasing the number of daily swabs, the number of diagnosed positive daily cases would also increase.
To estimate the random uncertainty in the dates of a substantial reduction of the pandemic in Italy, we used a Monte Carlo simulation, with 25000 runs, which provides a 1-sigma random uncertainty of approximately three days (calculated for a threshold of one fifth and one tenth of its historical minimum between February 25 and April 25). In addition, we also estimated the spread in a substantial reduction below a certain threshold of the daily cases per swab among the Italian regions.
Data Availability
All data referred in the manuscript are publicily available. The sources are provided in the References of the paper
Acknowledgments
We gratefully thank Richard Matzner (University of Texas at Austin) for helpful suggestions, Alessandro Paolozzi, and Claudio Paris (Centro Fermi).