Abstract
Results of a comparative statistical analysis of the daily data associated with New coronavirus COVID-19 infection of confirmed cases (Č) of the population in Georgia (GEO), Armenia (ARM), Azerbaijan (AZE), Turkey (TUR) and Russia (RUS) amid a global pandemic (WLD) in the period from March 14 to July 31, 2020 are presented.
The analysis of data is carried out with the use of the standard statistical analysis methods of random events and methods of mathematical statistics for the non-accidental time-series of observations.
In particular, a correlation and autocorrelation analysis of the observational data was carried out, the periodicity in the time- series of Č were revealed, the calculation of the interval prediction values of Č taking into account the periodicity in the time-series of observations from August 1 to 31, 2020 (ARM, AZE) and from August 1 to September 11, 2020 (WLD, GEO, TUR, RUS) were carried out.
Comparison of real and calculated predictions data on Č in the study sites from August 1 to August 31, 2020 is carried out. It was found that daily, monthly and mean weekly real values of Č for all the studied locations practically fall into the 99% confidence interval of the predicted values of Č for the specified time period.
A dangerous situation with the spread of coronavirus infection may arise when the mean weekly values of Č of the 99% upper level of the forecast confidence interval are exceeded within 1–2 weeks. Favorable – when the mean weekly values of Č decrease below 99% of the lower level of the forecast confidence interval.
Introduction
The New coronavirus (COVID-19) epidemic began in China at the end of 2019 and spread rapidly around the world. On March 11, 2020, a pandemic of this virus was recognized [1].
Despite the taken measures, in general, the growth of morbidity and mortality continues. Traditionally, the main burden in the fight against the pandemic falls on the doctors-epidemiologists. Together with them, scientists and specialists of various disciplines from different countries of the world joined in intensive research of an unprecedented phenomenon (including Georgia and neighboring countries - Armenia, Azerbaijan, Turkey, Russia [2-5]).
For example, experts from the field of physical and mathematical sciences together with doctors make an important contribution to research on the spread of the new coronavirus COVID-19. Despite the fact that scientists are dealing with extremely unstable time series of observations, depending on many factors (medical, methodology for identifying infected, natural, economic, social, political, emotional, religious, mental, etc.), which greatly complicates their systematization and forecasting, they proposed a variety of mathematical and statistical spatial-temporary models of the spread of the new coronavirus [6-17].
Georgia takes the fight against coronavirus seriously, closely monitoring the trends of the pandemic’s variability in the world and especially in its neighboring countries (Armenia, Azerbaijan, Turkey, Russia). In particular, in the paper [5] the preliminary results of a comparative statistical analysis of the mean weekly data associated with new coronavirus infection of confirmed, recovered and fatal cases in the population of the above-mentioned countries amid a global pandemic in the period from March 14 to July 31, 2020 are presented.
This work is a continuation of the above research [5]. This paper presents a detailed statistical analysis of data on daily values of confirmed cases of coronavirus infection in Georgia and neighboring countries in the context of a global pandemic, and also considers the possibility of using one of the simple statistical models to predict the variability of this infection, taking into account the periodicity of its changeability.
We used standard methods of statistical analysis of random events and methods of mathematical statistics for non-random time series of observations.
Study areas, Material and Methods
The study areas: World (WLD), Georgia (GEO), Armenia (ARM), Azerbaijan (AZE), Turkey (TUR) and Russia (RUS).
Data of John Hopkins COVID-19 Time Series Historical Data (with US State and County data) [https://www.soothsawyer.com/john-hopkins-time-series-data-with-us-state-and-county-city-detail-historical/] about daily values of confirmed coronavirus-related cases, from March 14 to August 21, 2020 are used.
In the proposed work the analysis of data is carried out with the use of the standard statistical analysis methods of random events and methods of mathematical statistics for the non-accidental time-series of observations [18-19].
The following designations will be used below: Mean – average values; Min – minimal values; Max – maximal values; Range – Max-Min; St Dev – standard deviation; σm – standard error; CV = 100·St Dev/Mean – coefficient of variation, %; Median – median value; R2– coefficient of determination; R – coefficient of linear correlation; RK – Kendall rank correlation coefficient; RS – Spearman rank correlation coefficient; Ra – coefficient of autocorrelation, Lag = 1 Day; KDW – Durbin-Watson statistic; Res – residual components (RC); Calc – calculated data; Real – measured data from March 14 to July 31 for statistical analysis (twenty weeks, 140 days); Real_1 – measured data for predictions calculation (84 or 42 days); Real_2 – measured data for comparison with calculated predictions data (from August 1 to August 31, 2020); 99%(+/−) – 99% confidence interval of average (the calculations were carried out without taking into account autocorrelation in the time-series of observations); 99%_Low – 99% lower level of confidence interval; 99%_Upp – 99% upper level of confidence interval; Δ = 100· (1-Parameter/Real_2),% - relative difference between measured and calculated values, (forecast relative error, %); α – the level of significance; Daily Values of Confirmed Coronavirus-Related Cases – Č.
The statistical programs Data Fit 7, Mesosaur and Excel 16 were used for calculations.
The curve of trend is equation of the regression of the connection of the investigated parameter with the time at the significant value of the determination coefficient and such values of KDW, where the residual values are accidental. If the residual values are not accidental the connection of the investigated parameter with the time we will consider as simply regression.
The calculation of the interval prognostic values of Č taking into account the periodicity in the time-series of observations was carried out using Excel 16.
The following rule of thumb for interpreting the size of a correlation coefficient (CC) is used [20]: 0 ≤ CC < 0.3 – Negligible correlation, 0.3 ≤ CC < 0.5 – Low correlation, 0.5 ≤ CC < 0.7 – Moderate correlation, 0.7 ≤ CC < 0.9 – High correlation, 0.9 ≤ CC ≤ 1.0 – Very high correlation.
Results and Discussion
The results in the Fig. 1-8 and Table 1-7 are presented.
In Fig. 1 and Table 1 data about statistical characteristics of daily values of confirmed corona-virusrelated cases, from March 14 to July 31, 2020 in Georgia and neighboring countries (Armenia, Azerbaijan, Turkey, Russia) amid a global pandemic are presented.
In particular, analysis of this table reveals the following:
Over the entire observation period, the largest number of infected per 1 million populations was on average observed in Armenia (93.1, range of change 2.0–260.7), the smallest – in Georgia (2.2, range of change 0–11.3). In the World, these values are respectively 16.2, 1.4 and 43.8.
The largest variations in Č values are observed in Azerbaijan (CV = 86.8%), the smallest in Russia (CV = 55.8%). Worldwide CV = 54.9%.
Dynamics of time variability of the values of Č is as follows:
World. Continuous growth, high (R, RS) and very high (RK) positive correlation with time.
Georgia. Small variability, negligible (R, RK, RS) negative correlation with time.
Armenia. On the whole moderate (RK) and high (R, RS) positive correlation with time.
Azerbaijan. Generally, very high (RS) and high (R, RS) positive correlation with time.
Turkey. Low (R, RK, RS) negative correlation with time.
Russia. Moderate (R), low (RS) and negligible (RK) positive correlation with time.
Changeability of the real daily values of new coronavirus-related confirmed cases from March 14 to July 31, 2020 in the study sites.
The time dependence of measured values of Č have fairly complicated behavior (Table 2, Fig. 2).
World – fourth order polynomial; Georgia and Armenia - seven order polynomial; Azerbaijan - six order polynomial; Turkey – tenth order polynomial; Russia - Č = exp(a+b/X+c·log(x)), where x are days.
At the same time, autocorrelation of residual components of Č for World, Georgia and Armenia is absent, and for Azerbaijan, Turkey and Russia it is positive. That is, for the first three locations, the calculated curve of dependence Č from time can be considered a trend, for the second three locations – a regression.
Changeability of the calculated daily values of new coronavirus-related confirmed cases from March 14 to July 31, 2020 in the study sites.
The smoothed out curves of variability of calculated values of Č have the following features (Fig. 2).
World. Continuous growth, min on 3/14/2020 (0.9 infected per 1 mln population), max on 7/31/2020 (34.7 infected per 1 mln population).
Georgia. Small variability in time with two periodic bursts: 4/12/2020–4/16/2020 (4.3 infected per 1 mln population) and 7/23/2020–7/26/2020 (3.1 infected per 1 mln population)
Armenia. Unimodal distribution with right skewness. The peak falls on 6/23/2020 of the study period (197.8 infected per 1 mln population). Then – decrease.
Azerbaijan. Growth up to 7/4/2020–7/5/2020 of the studied observation period (53.7 infected per 1 mln population), then – decrease.
Turkey. Unimodal distribution with left asymmetry. The peak falls on 4/13/2020 of the studied observation period (51.6 infected per 1 mln population). Then – a more or less monotonous decline with a slight surges.
Russia. A vaguely pronounced peak on 5/19/2020 (66.7 infected per 1 mln population) with a further monotonous decrease.
Thus, the nature of the temporal variability of Č values for the studied locations differs significantly from each other. This indicates that the spread of coronavirus infection in these countries during the study period was isolated in nature, associated with the closure of borders and the practically cessation of the movement of people from country to country.
This fact is clearly demonstrated by the data in Table 3. As follows from this table, there is practically no linear correlation between the studied sites by random components of Č values (with the exception of the Armenia-World pair, – Low correlation). The linear correlation between the investigated sites in terms of real Č values basically is significant. However, this correlation is most likely indirect in nature and indicates a similarity or difference in the development trends of a new coronavirus infection in these countries, which was noted in [5]. Otherwise, there would be a correlation for the random components of Č values.
The degree of expandability of coronavirus infection can be judged using autocorrelation analysis of a time-series of observations of Č. As follows from Table 4, in all studied locations (with the exception of Georgia), there is a fairly strong autocorrelation in the Č time-series (connectivity in the series, the influence of previous events on subsequent events).
Note, that Ra values close to 1 in the first lag could determine the presence of a positive autocorrelation of the residual components of the Č time-series in Azerbaijan, Turkey, and Russia (Table 2).
In table 4 data on values of peaks of periodicity of the C in the study sites are presented also (World, Armenia – 7 days; Georgia – 14 days; Azerbaijan – 28 days; Turkey, Russia – 20 days).
As follows from Fig. 1,2 and Table 2 in the period from 3/14/2020 to 31/7/2020 changeability in time of the daily values of new coronavirus-related confirmed cases in all locations have a nonlinear nature. Therefore, for the interval forecast of the Č values, we selected time-series with a linear tendency of variability of the daily values of new coronavirus-related confirmed cases after their peak values. The forecast period is half of the previous time-series of the Č for World, Georgia, Turkey and Russia, and about 75% – for Armenia and Azerbaijan (Table 5).
In Fig. 3-8 the results of calculating the forecast of the values of Č for the studied locations (taking into account on the periodicity in the time-series of observations) and their 99% of the upper and lower confidence intervals from August 1 to 31, 2020 are shown. For comparison with the calculated values, the same figures show data on the real values of Č.
Six-week prediction of the daily values of new coronavirus-related confirmed cases for world
Six-week prediction of the daily values of new coronavirus-related confirmed cases for Georgia
Monthly prediction of the daily values of new coronavirus-related confirmed cases for Armenia
Monthly prediction of the daily values of new coronavirus-related confirmed cases for Azerbaijan
Six-week prediction of the daily values of new coronavirus-related confirmed cases for Turkey
Six-week prediction of the daily values of new coronavirus-related confirmed cases for Russia
As follows from Fig. 3-8 and tables 6,7 daily, monthly and average weekly real values of Č for all the studied locations practically fall into the 99% confidence interval of the predicted values of Č for the specified time period.
Tables 6 and 7 also provide data on the coincidence of real and forecasted monthly and weekly average values of Č.
The best match between the calculated and real monthly mean values of Č (values of Δ) is as follows.
World: 5% with 99%_Low level; Georgia: 6% with calculated values within the confidence interval; Armenia: 21% with calculated values within the confidence interval; Azerbaijan: 57% with 99%_Low level; Turkey: 18% with calculated values within the confidence interval; Russia: 8% with calculated values within the confidence interval (Table 6).
The best match between the calculated and real weekly mean values of Č Č (range of absolute values of Δ) is as follows.
World: from 8 to 1 % with 99%_Low level; Georgia: from 15 to 26 % with calculated values within the confidence interval; Armenia: from 9 to 51 % with calculated values within the confidence interval; Azerbaijan: from 6 to 19 % with 99%_Low level and from 80 to 49 % with calculated values within the confidence interval; Turkey: from 1 to 28 % with calculated values within the confidence interval; Russia: from 0 to 16 % with calculated values within the confidence interval (Table 7).
It should be noted that the period of the best forecast of the weekly mean values of Č within the confidence interval is different for different locations. World: 2 weeks (abs Δ = 10–13%); Georgia: 2 weeks (abs Δ = 25–15%); Armenia: 2 weeks (abs Δ = 9–12%); Azerbaijan: – low accuracy within the confidence interval; Turkey: 3 weeks (abs Δ = 1–20%); Russia: 4 weeks (abs Δ = 0–16%).
It should also be noted that, in our opinion, a dangerous situation with the spread of coronavirus infection may arise when the mean weekly values of Č of the 99% upper level of the forecast confidence interval are exceeded within 1–2 weeks. Favorable – when the mean weekly values of Č decrease below 99% of the lower level of the forecast confidence interval.
It also should be paid attention to the increase or decrease in the degree of autocorrelation in the time-series of observations.
Conclusion
It is planned to continue similar studies for Georgia and neighboring countries in the future, in particular, clarifying data on the periodicity of processes, checking the temporal representativeness of short-term interval statistical forecasting of the new coronavirus infection COVID-19, taking into account the data on periodicity, etc.
Data Availability
https://www.soothsawyer.com/john-hopkins-time-series-data-with-us-state-and-county-city-detail-historical/
Source(s) of support
Nil.
Conflicting Interest
No.