Abstract
In the present work is used non-linear fitting of the “Gompert” and “Logistic” growth models to the number of total COVID-19 cases from the United States as a country and individually by states. The methodology allowed us to estimate that the maximum limit for the total number of cases of COVID-19 patients such as those registered with the World Health Organization will be approximately one million and one hundred thousand cases to the United States. Up to 04/19/20 the models indicate that United States reached 70% of this maximum number of “total cases” and the United States will reach 95% of this limit by 05/14/2020. The application of the nonlinear fitting of growth curves to the individual data of each American state showed that only 25% of them did not reach, on 04/19/20, the percentage of 59% of the maximum limit of “total cases” and that 17 of the 50 states still will not have reached 95% of that limit on 05/14/20.
Introduction
The first case of SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) infection appeared in the Hubei Province in China on December 31, 2019, as reported by the World Health Organization - WHO (SR1, 2020). According to Chan et al. (2020) in february WHO officially called the disease caused by SARS-CoV-2 of COVID-19 (Coronavirus Dease 2019) and on 11 march the COVID-19 was classified as a Pandemic.
In the United States, the first case of COVID19 was registered on January 10, 2020 and by April 20, 2020, 792,759 total cases, +28,123 new cases and 42,514 deaths had been registered (https://www.worldometers.info/coronavirus/). For the United States according to the “…CDC case counts and death counts include boh confirmed and probable cases and deaths.” (CDC, 2000). Daily statistics on the number of new cases, total cases and deaths have been the main variables for controlling the impact of COVID19 on society and especially on the capacity of the hospital health system to take into account the number of critical cases. The real impact of COVID19 in the world will hardly be known as there are many asymptomatic cases and we have limitations in the ability to carry out tests to confirm this disease, although the United States was the country that performed the most tests 4,026,360 however it reached 12,164 per 1 million people, as data from April 20, 2020 (https://www.worldometers.info/coronavirus/country/us/).
Several researchers have presented models to predict the evolution of the number of cases of infected by COVID19. Bliznashki (2020) applied a Bayesian Logistic Growth model to the COVID-19 data cases in the period between 4th of March and 31th of March and found that the data in this interval have small number of available data points without upper asymptote to can be well fitted by this model, but it will be useful with more actualizated data. Mondal and Ghosh (2020) studied the scenarios of the exponential and sigmoid growth of COVID-19 total cases data for 15 states of India considering a initial exponential growth and a extrapolation with a sigmoid-type profile. Peirlinck et al. (2020) estimated to the United States the nationwide peak of the outbreak on May 10, 2020 with 3 million infections across the United States.
In the present work we propose the use of non-linear adjustment of sigmoidal growth functions together with the total case data to estimate maximum limits for the number of total cases of COVID19 in the United States.
Metodology and Results
In order to predicted the maximum limit to the total cases COVID19 in the states of US was applyed to the data of cummulative cases a non-linear fitting of the Sigmoidal growth curves that are mathematical solutions to the infected cases growing models. According to Teleken et al. (2017):
The growth functions can be grouped into three models main categories: those with a diminishing returns behavior (Brody model), those with sigmoidal shape and a fixed inflection point (Gompertz, Logistic and von Bertalanffy models) and those with a flexible inflection point (Richards model). The Logistic, Gompertz and von Bertalanffy models exhibit inflection points at about 50, 37 and 30% of the mature weight (asymptote), respectively. On the other hand, the Brody model does not exhibit an inflection point.
Sigmoid growth curves have been used to estimate the time evolution of total cases, daily cases and deaths for countries like Iran (Ahmadi et al. 2020) and China, Italy and Germany (Mimkes & Janssen 2020). This work will be carried out the non-linear fitting to the cummulative cases COVID-19 using Gompertz and Logistic models. The matemathical functions used were:
Gompertz Model:
Logistic Model:
The parameter “a” gives the maximum limit to the total cases, the parameter “k” is related to the growth rate and “ti” corresponds to the inflection time coordinate.
The performance of each model was evaluated by Residual Sum Square (RSS) and two additional criteria based on the information theory were applied to compare the goodness of fit of the models (Burnham & Anderson, 2002): the Akaike information criterion (AIC) and the Bayesian information criterion (BIC).
Results and discussion
First was performed non-linear fitting of Gompertz and Logistic models to the data of cummulative cases COVID19 of Hubei (China), South Korea and US, available in the site https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases. and was obtain the results shown in Table 1. The Logistic Model give better goodness fit to the Hubei(China), while Gompertz model adjusted better the data from South Korea and United States. In sequence, Table 1 presents the fitting results for the states of the United States, which the data were obtained in the site https://github.com/nytimes/covid-19-data/blob/master/us-states.csv. Although for the US data the best fit model was the Gompertz, considering individual fits for 14 of the 50 american states the best fit model were the logistic. The data considered to fit were up to 04/17/20, and the best fit models have the parameters in bold.
Defined the best model to adjust the total case data, the model predictions were made using the parameters “a” to define the maximum of the total cases and the other empirical parameters and the mathematical functions to make temporal predictions for the data observed on 4/19/20 and the determination of the dates on which 95% and 99% of the maximum total cases will be reached, according to Table 2. Model forecasts were made for Hubei (China) and South Korea, which are already in the stable control phase of COVID19 and the results of the total number of cases in the model adjusted well to the data of total cases observed on April 19, 2020 (comparing the predicted cases with those observed and verifying the low relative error value). The maximum percentage reached according to the model for that date was also estimated, as well as the forecast of dates when according to the model a percentage of 95% and 99%, respectively, of the maximum value of the total cases predicted in the model are reached. For the total case data from the United States, we had the Gompertz model as the best fit, from which we have an estimated maximum for the total MAX_Cases cases of 1,122,702.72. For April 19, 2020, the comparison between the number of cases predicted by the model and those observed results in a relative error of −0.62% and on this day the total number of cases predicted by the model corresponds to a percentage of 67, 19% in relation to the maximum of MAX_Cases total cases. Finally, the model predicts that the United States will reach 95% of the maximum total number of cases on 5/14/20 and 99% on 6/4/20.
Considering the COVID19 total case data from the states (plus Columbia, Guam and Puerto Rico) in the United States, the following results were summarized in Table 2: (a) limit for the maximum total cases for the United States of 1,085,939.78, which is approximately 3% less than the predicted total considering the data for the United States as a whole; (b) Only 25% of the states had a relative error greater than 6%, the average relative error in module was 4.90% for the value of “total cases” on 04/19/20; (c) Only 25% of the states reached less than 59% of the maximum value of “total cases” on 04/19/20; (d) Until 05/14/2020, only 17 of the 50 American states will not have reached 95% of the maximum limit of the “total cases”.
Final considerations
The non-linear fitting of growth functions to the data on the total number of COVID-19 cases is a useful tool to estimate the duration and impact of this pandemic, and may become increasingly accurate as the history of the disease evolves. This study demonstrates the importance of carrying out the COVID-19 tests for understanding the future of the development of this pandemic, only with precise and actualizated statistics will be possible to prepare the health system to the demand.
Data Availability
The data are all available.