Abstract
Scientists across many areas of knowledge have worked to provide information and methods that support the adoption of the most appropriate measures to curb the spread of COVID-19. The data analysis community has been particularly active, publishing a large number of papers, many of which concern the prediction of epidemic variables (numbers of cases and deaths) over different time horizons. Sigmoidal growth functions occupy an important place in COVID-19 prediction, as they have often been used successfully in previous epidemic outbreaks. The objective of this work was to investigate, on a statistical basis, the ability of classical growth functions to model data from the COVID-19 pandemic. To do so, it was first necessary to establish a clear classification of the 5 types of problems that data analysis techniques can address in this specific context, and to define a methodology based on quantitative metrics to measure performance on each type of problem. The basic concept used was that of an epidemic wave, consisting of an initial increasing phase and a final decreasing phase. COVID-19 waves were classified into 4 types by mining data from all available countries. This made it possible to determine the resolvability of each type of problem depending on the stage of the epidemic wave. The main conclusion was that long-term forecasting problems (problem 5: estimating the total value of an epidemic wave) cannot be solved with data from the first phase only.
Within this theoretical and methodological framework, and using metrics specifically designed for these types of problems, we evaluated the performance of 3 classic growth functions, the logistic, Gompertz, and Richards functions (the last a generalization of the other two), on 2 types of problems: (1) description of the trajectory of the epidemic and (2) prediction of the total numbers of cases and deaths. We used data from 10 countries, 7 of them with more than 100 daily deaths on the peak day. The results show that the logistic function underperforms in all aspects and place the Gompertz function as the best cost-benefit alternative: its performance is comparable to that of the Richards function, but it has one fewer parameter to be fitted when regressing the model to the observed data.
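As an illustration of how the three growth functions relate, the sketch below writes them in one common parametrization. This parametrization is an assumption, not necessarily the one used in the paper: K is the final size of the wave, r the growth rate, tau the time of the inflection point, and nu the extra Richards shape parameter. Under this form, the Richards function reduces to the logistic function at nu = 1 and approaches the Gompertz function as nu tends to 0, which is why Gompertz has one fewer parameter to fit.

```python
import math


def logistic(t, K, r, tau):
    """Logistic curve: symmetric, reaches K/2 at the inflection time tau."""
    return K / (1.0 + math.exp(-r * (t - tau)))


def gompertz(t, K, r, tau):
    """Gompertz curve: asymmetric, reaches K/e at the inflection time tau."""
    return K * math.exp(-math.exp(-r * (t - tau)))


def richards(t, K, r, tau, nu):
    """Richards curve with shape parameter nu > 0 (assumed parametrization).

    nu = 1 gives the logistic curve; nu -> 0 gives the Gompertz curve.
    """
    return K / (1.0 + nu * math.exp(-r * (t - tau))) ** (1.0 / nu)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    K, r, tau = 10000.0, 0.1, 50.0
    for t in (20, 50, 80, 200):
        print(
            t,
            round(logistic(t, K, r, tau)),
            round(gompertz(t, K, r, tau)),
            round(richards(t, K, r, tau, nu=0.5)),
        )
```

All three curves tend to K for large t, so K plays the role of the total number of cases or deaths in a wave, the quantity targeted by the long-term forecasting problem.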
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
VF is supported by CAPES (88882.349290/2019-01) and a Flagship grant from the South African Medical Research Council (MRC-RFA-UFSP-01-2013/UKZN-HIVEPI).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study used public data available on the WHO website.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
We used only public data available on the WHO website.