ABSTRACT
This work presents a time series analysis and forecast of COVID-19. Decision makers and medical providers may find the work useful for improving care for disadvantaged demographics, reducing the spread of the coronavirus and improving mitigation strategies to combat the impact of the disease. Our analysis of COVID-19 cases spans March 2020 to December 2020. Forecasting models for COVID-19 cases and deaths were built for the total population and for blacks in eight states in the USA. States with medium to large black populations were considered for the experiment. We define COVID-19 Health Care Disparity (CHCD) as the difference between the percentage of Black Cases to Total Cases and the percentage of Black Deaths to Total Deaths within a period. We hypothesize that a disparity exists if the ratio of black cases to total COVID-19 cases in a state is less than the ratio of black deaths to total deaths in the same state. The outcome of our experiment shows that there is a COVID-19 Health Care Disparity in the black community. Furthermore, all things being equal, our forecast suggests that the disparity will continue at least until the end of the first quarter of 2021.
1. INTRODUCTION
As of December 28, 2020, the coronavirus pandemic had more than 80 million confirmed cases globally, with more than 1.7 million fatalities. The United States tops the list of confirmed cases with more than 19 million. India, Brazil and Russia are a distant second, third and fourth, with approximately 10.2 million, 7.48 million and 3.047 million cases respectively. The United States also tops the fatality count with 333,129 deaths, followed by Brazil, India and Mexico with 191 thousand, 148 thousand and 122 thousand deaths respectively. South Africa tops the list in Africa with 1.012 million confirmed cases and 27 thousand fatalities [1]. The spread of the coronavirus has been controlled in part with measures such as border closures, travel restrictions and airport screening. However, research shows that these measures only slow the spread of the disease without a far-reaching effect on its impact [2]. To improve mitigation, most governments encouraged handwashing [3], social distancing [4] and mask wearing [5]. We argue that health care disparity is a major setback to an effective universal strategy for combating the pandemic. Therefore, forecasting COVID-19 health care disparity is the major focus of this study.
In the United States, preliminary statistical data shows that the black population has been disproportionately affected by the pandemic [6]. Time series analysis indicates that the disease has had a devastating and disruptive impact on the black community [7]. Since the world woke up to the emergence of this disease in late 2019 [8], fatality and mortality rates have been on an upward trajectory in this community [9]. Counties and zip codes with large black populations have become synonymous with high COVID-19 fatalities. For example, blacks in Prince George's, Montgomery and Baltimore counties are 32%, 20.1% and 30.3% respectively of the total population of the state of Maryland in the United States. As of January 7, 2021, these three counties ranked first, second and third respectively in the number of coronavirus cases in the state. On the fatality count, Montgomery County displaced Prince George's County for the top spot, Prince George's County came second and Baltimore County retained the third position [10].
Scientific evidence has connected the disproportionate number of coronavirus cases in the black community to the poor health care and socioeconomic disadvantage of the community [12]. Before the outbreak of the COVID-19 pandemic, researchers found that the life expectancy of blacks is lower than that of other ethnic groups. One study shows that the average life expectancies of white women and black women are 81.0 years and 78.1 years respectively, while those of white men and black men are 76.1 years and 71.5 years respectively [13].
The disparity in life expectancy between blacks and whites is not surprising, because several studies have shown that African Americans bear the burden of most diseases in the US. For example, research on diabetes shows that blacks have a more than 60% higher chance of getting the disease than whites [14]. They also have a 42% chance of being a new victim of HIV [15]. Furthermore, heart attack data shows that the population is 20% more susceptible than whites [16]. The same narrative has been found to be true for obesity [17] and asthma [18]. Several studies have also shown that African Americans are more likely than whites to die prematurely from any disease [19].
The prevalent poor health care system, coupled with long-standing pre-existing conditions, has been found to have a significant and far-reaching impact on the mortality and fatality rates of African American COVID-19 patients [20]. Patients with diseases such as hypertension, diabetes, congestive heart failure, chronic kidney disease and cancer have been found to have higher mortality and fatality rates when they contract COVID-19 [21]. Ferdinand and Nasser argued that the prevalence of cardiovascular disease (CVD) among African Americans, which is directly linked to the poor health care condition of the community, is to blame for the disproportionate number of coronavirus cases in the African American and other minority communities [22].
Socio-economic factors have also been found to contribute to the disproportionate number of coronavirus cases in the African American community. Studies have shown that African Americans and whites have poverty rates of 22% and 9% respectively [23]. Furthermore, the median household income of whites is found to be 10 times that of blacks [24]. During the pandemic, only 20% of the African American population could work from home, compared with 30% of whites. Furthermore, 34% of African Americans and 14% of whites are likely to use public transportation [25]. Unhealthy diet [26] and population density [27] are other contributing factors.
Since the outbreak of the coronavirus pandemic, there have been many studies on COVID-19. However, there seems to be a gap in the literature on time series analysis that specifically forecasts COVID-19 health care disparity and disproportionality in the African American community. The vulnerability of the community to the coronavirus pandemic has been the subject of different studies [28]. We believe that all stakeholders should have an answer to the question: is there a COVID-19 health care disparity in the African American community? If the answer is yes, the next question is: how do we forecast COVID-19 health care disparity? Such an answer will help in improving the quality of care for the affected population. Therefore, using scientific principles, the goal of this study is to analyze COVID-19 datasets from April 12, 2020 to December 25, 2020, understand hidden patterns and discover knowledge. The coronavirus pandemic appears to have sent the same waves of panic across the world, but data suggests that it affects different demographics disproportionately [29].
The paper is organized as follows: COVID-19 time series analysis, COVID-19 health care disparity and results are discussed in sections 2, 3 and 4 respectively. We conclude our study in section 5, with the implications of the study in section 6.
2. COVID-19 TIME SERIES ANALYSIS
We began our investigation by exploring the dataset to see underlying patterns, trends and seasonalities at different timelines. The dataset for this experiment was obtained from The COVID Racial Data Tracker. The repository is a collaboration between the COVID Tracking Project and the Boston University Center for Antiracist Research [30]. COVID-19 datasets for the states of FL, GA, MD, MS, NC, PA, SC and VA were extracted. Data cleansing was done at the preprocessing stage to make the data suitable for analysis.
We plotted area graphs to give a visual representation of the underlying patterns of the dataset. Total cases and deaths were plotted for each of the states, as were cases and deaths of blacks. Research has shown that graphical data visualization has a direct impact on the effectiveness of data analysis. Thus a visual methodology of data exploration and representation has a unique way of making information noticeable, salient and memorable [31]. Most humans are visual learners. An area graph combines the attributes of line and bar charts. In our analysis, states are represented with shaded areas stacked on top of one another. This arrangement shows how the impact of the virus changes over time in each of the states. Our graph is broken down further into yearly quarters. At the end of each quarter, we show the state of the virus in each of the states. Figures 1, 2, 3 and 4 show the area graphs of total cases, black cases, total deaths and black deaths respectively.
As shown in figure 1, COVID-19 case counts followed different trajectories across the states from one quarter to the next. For example, at the end of the first quarter, FL had approximately one hundred and forty thousand total cases. However, by the end of the second quarter, cases in FL had skyrocketed to over seven hundred thousand, a fivefold increase. As shown in the graph, the virus timeline was divided into three quarters. This is because the pandemic began to affect the United States in March 2020 [32], although it had begun in China in 2019 [33] and the World Health Organization declared it a pandemic on March 11, 2020 [34].
The trajectory of the graph shows that the first quarter of COVID-19 started at the beginning of April. Therefore, April to June is the first COVID-19 quarter. As shown in the graphs, in the first quarter the virus had little impact in all the states. The major impact of the virus became obvious in the second quarter (July to September). On the other hand, MD closed the first quarter with sixty-six thousand cases and the third quarter with one hundred and twenty-four thousand, slightly less than a twofold increase. Compared with FL, MD appears to have been more effective at reducing the spread of the disease.
The virus slowed down early in the second quarter but gained momentum towards the end of the quarter. The brief period when the virus slowed down coincides with the beginning of summer in the US, which suggests that temperature might have played a role. However, a further look at the graph shows that any effect of high summer temperatures did not last. By August, cases in FL were surging; the virus seemed to have found an abode in the Sunshine State.
As the total number of COVID-19 cases was rising in each of these states, the fatality rate was also increasing. Figure 3 shows that at the end of the first quarter, PA was the hardest hit by coronavirus deaths, recording approximately 6,500 deaths. NC had a little above 1,000 COVID-19 deaths and FL had 3,400. However, at the end of the third quarter, FL took the lead with more than 14,000 deaths (an increase of around 300%). PA appears to have found a way to slow the rate at which COVID-19 was claiming victims; its death count increased to only 8,000 (an increase of around 23%). Except for GA, which was close to FL and PA in death counts, the other states closed the second quarter with approximately 3,000 COVID-19 deaths. By December 16, FL, PA and GA led the death toll with more than 20 thousand, 13 thousand and 10 thousand deaths respectively. The other states were at around 4,000 to 5,000.
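The percentage increases quoted above can be checked with a short calculation. This is a minimal sketch that uses the rounded death counts from the text; the helper name `percent_increase` is ours and is introduced only for illustration.

```python
def percent_increase(start: float, end: float) -> float:
    """Relative growth from `start` to `end`, expressed in percent."""
    if start == 0:
        raise ValueError("start value must be non-zero")
    return (end - start) / start * 100.0


# Rounded death counts quoted in the text (end of first vs. third quarter).
fl_growth = percent_increase(3_400, 14_000)
pa_growth = percent_increase(6_500, 8_000)

print(f"FL deaths grew by about {fl_growth:.0f}%")
print(f"PA deaths grew by about {pa_growth:.0f}%")
```

With these rounded inputs the FL figure comes out slightly above 300% and the PA figure at about 23%, in line with the approximate values given in the text.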
3. COVID-19 HEALTH CARE DISPARITY (CHCD) AND AFRICAN AMERICAN COMMUNITY
The time series analysis shows the underlying patterns of the waves of COVID-19 in the states under investigation. Figures 1 to 4 suggest that there is a disproportionality between the percentage of blacks who contracted COVID-19 and the percentage who died of it. For example, as of December 13, 2020, in FL, the total cases and black cases were 1,155,335 and 146,128 respectively, so black cases were 12.65% of total cases. However, the total COVID-19 deaths and black deaths were 20,490 and 3,461 respectively, so black deaths were 16.89% of total deaths. Similarly, in GA the total cases and black cases were 488,338 and 132,709 respectively, so black cases were 27.18% of total cases, while the total deaths and black deaths were 10,228 and 3,567 respectively, so black deaths were 34.87% of total deaths. The results suggest a disproportionality of 4.24 and 7.70 percentage points in FL and GA respectively. Table 1 was created for all eight states in our study.
We define COVID-19 Health Care Disparity (CHCD) as the difference between the percentage of Black Cases to Total Cases and the percentage of Black Deaths to Total Deaths. The intuition here is that if the former is lower than the latter, blacks account for a larger share of deaths than of cases, which suggests a disproportionality in the impact of COVID-19. We computed CHCD as of December 13, 2020. Table 1 shows the result.
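As a worked example, the FL and GA figures quoted above can be run through the definition. This is a minimal sketch; the function names are ours, and the sign convention (death share minus case share, so that a positive value indicates disparity) is an assumption chosen to match the 4.24 and 7.70 figures in the text.

```python
from typing import Dict, Tuple


def percent(part: float, whole: float) -> float:
    """Share of `part` in `whole`, in percent."""
    return 100.0 * part / whole


def chcd(black_cases: int, total_cases: int,
         black_deaths: int, total_deaths: int) -> Tuple[float, float, float]:
    """Return (BTCR, BTDR, CHCD), with CHCD assumed to be BTDR - BTCR."""
    btcr = percent(black_cases, total_cases)
    btdr = percent(black_deaths, total_deaths)
    return btcr, btdr, btdr - btcr


# Counts as of December 13, 2020, as quoted in the text:
# (black cases, total cases, black deaths, total deaths)
counts: Dict[str, Tuple[int, int, int, int]] = {
    "FL": (146_128, 1_155_335, 3_461, 20_490),
    "GA": (132_709, 488_338, 3_567, 10_228),
}

results = {state: chcd(*values) for state, values in counts.items()}
for state, (btcr, btdr, gap) in results.items():
    print(f"{state}: BTCR={btcr:.2f}%  BTDR={btdr:.2f}%  CHCD={gap:.2f}")
```

Both states give a positive CHCD under this convention, meaning the black share of deaths exceeds the black share of cases.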
3.3.4 Experiment
As stated in section 2, the states considered, all with medium to large black populations, are Florida (FL), Georgia (GA), Maryland (MD), Mississippi (MS), North Carolina (NC), Pennsylvania (PA), South Carolina (SC) and Virginia (VA). For each state in our study we will: 1) analyze the disparity in COVID-19 fatalities between the total population and blacks, 2) build models capable of forecasting total COVID-19 cases and the number of blacks who will likely contract COVID-19, 3) build models to forecast total COVID-19 deaths and black deaths, and 4) compute a COVID-19 Health Care Disparity table.
We experimented to determine CHCD for the first quarter of 2021 in the states under investigation. As shown in figure 5, the dataset went through a pre-processing stage to make it suitable for the experiment. At the design and development stage, forecasting models were designed and developed using the Holt and Holt-Winters methodologies. We evaluated the performance of each model at the evaluation stage. The best models were chosen at the model selection stage. The Black/Total Death Ratio (BTDR) and Black/Total Case Ratio (BTCR) were computed at the compute BTDR and BTCR stage. Finally, a CHCD table was created at the CHCD table stage.
3.1 Hypothesis
All things being equal, we made the following hypotheses:
Null Hypothesis H0 - COVID-19 Health Care Disparity will continue during the pandemic.
Alternative Hypothesis HA - COVID-19 Health Care Disparity will NOT continue during the pandemic.
We computed the number of black COVID-19 deaths as a percentage of the total number of COVID-19 deaths. This is the Black/Total Death Ratio (BTDR):

$$\mathrm{BTDR} = \frac{\text{Black deaths}}{\text{Total deaths}} \times 100 \tag{1}$$

The number of black COVID-19 cases was likewise computed as a percentage of the total number of COVID-19 cases. This is the Black/Total Case Ratio (BTCR):

$$\mathrm{BTCR} = \frac{\text{Black cases}}{\text{Total cases}} \times 100 \tag{2}$$

COVID-19 Health Care Disparity (CHCD) is the difference between BTDR and BTCR:

$$\mathrm{CHCD} = \mathrm{BTDR} - \mathrm{BTCR} \tag{3}$$

If the result is positive, it suggests that there is a disparity between the proportion of blacks who contracted COVID-19 and the proportion who survived it. Table 1 suggests that MD, NC and SC are doing worse than other states in the proportion of blacks who survived COVID-19 when compared with the number who contracted it. Using equations 1 to 3, we forecasted CHCD for the first quarter of 2021.
3.2 Forecasting CHCD
The next stage of our investigation was the design, development and evaluation of forecasting models. The proposed models forecast total cases, total deaths, black cases and black deaths up to the end of the first quarter of 2021. Exponential smoothing is used to build our forecasting models. Exponential smoothing is a univariate time series forecasting methodology. Its distinguishing feature is that the forecast is based on an exponentially decaying weighted average of past observations. Thus, the most recent observations are given more weight than older observations. This approach makes it more reliable and accurate than the moving average in forecasting a wider range of time series. The methodology is grouped into simple, Holt (double exponential) and Holt-Winters (triple exponential) smoothing.
3.2.1 Simple Exponential Smoothing
Simple exponential smoothing does not consider trend or seasonality in forecasting, which limits its applicability. In the simplest form, the forecasted value $\hat{y}_{T+h|T}$ is equal to the last observed value $y_T$ for $h = 1, 2, \ldots$:

$$\hat{y}_{T+h|T} = y_T \tag{4}$$

Alternatively, the forecast can be taken as the average of all past observations, each weighted equally:

$$\hat{y}_{T+h|T} = \frac{1}{T}\sum_{t=1}^{T} y_t \tag{5}$$

We can improve on equation 5 by applying decaying weights to past observations. This is the intuition of exponential smoothing:

$$\hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2} + \cdots \tag{6}$$

where $y_1, \ldots, y_T$ are the $T$ observations and the decay rate is represented by the parameter α, with 0 ≤ α ≤ 1. As α moves towards 1, the most recent observations are given more weight, so the model learns faster from new data. On the other hand, a value close to 0 slows learning because more weight is given to older observations.
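The effect of α can be seen in a few lines of code. The sketch below implements the usual recursive form of simple exponential smoothing, level = α·observation + (1 − α)·previous level; the function name and the choice to initialise the level with the first observation are our assumptions, not details taken from the study.

```python
from typing import List, Sequence


def ses_levels(y: Sequence[float], alpha: float) -> List[float]:
    """Smoothed level after each observation; the last level is the flat forecast."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not y:
        raise ValueError("need at least one observation")
    level = y[0]  # assumption: start the level at the first observation
    levels = [level]
    for obs in y[1:]:
        # New level blends the latest observation with the previous level.
        level = alpha * obs + (1 - alpha) * level
        levels.append(level)
    return levels


series = [10.0, 20.0, 30.0]
balanced = ses_levels(series, alpha=0.5)  # weights recent and older data equally
naive = ses_levels(series, alpha=1.0)     # follows the last observation exactly
frozen = ses_levels(series, alpha=0.0)    # never moves from the initial level
print(balanced, naive, frozen)
```

With α = 1 the method reduces to the naive forecast (the last observed value), while α = 0 ignores new data entirely, which matches the description of the decay rate above.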
3.2.2 Holt (Double Exponential Smoothing)
Holt exponential smoothing is an extension of the simple exponential smoothing methodology. In addition to the smoothing parameter α for the level $l_t$, it includes a smoothing parameter β for the trend $b_t$. This improves the effectiveness and accuracy of its forecasts for series with a trend.
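The Holt forecasting equation can be defined as:

$$\hat{y}_{t+h|t} = l_t + h b_t \tag{7}$$

The estimated forecast $\hat{y}_{t+h|t}$ consists of the level $l_t$ and the trend $b_t$, for $h = 1, 2, \ldots$, with $t$ observations.

The level $l_t$ can be expressed as:

$$l_t = \alpha y_t + (1-\alpha)(l_{t-1} + b_{t-1}) \tag{8}$$

The trend $b_t$ can also be expressed as:

$$b_t = \beta (l_t - l_{t-1}) + (1-\beta) b_{t-1} \tag{9}$$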
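The level and trend recursions can be sketched in plain Python as follows. This is a minimal illustration of the standard Holt linear-trend method, not the implementation used in the study; the function name and the initialisation (level from the first observation, trend from the first difference) are our assumptions.

```python
from typing import List, Sequence


def holt_forecast(y: Sequence[float], alpha: float, beta: float,
                  horizon: int) -> List[float]:
    """Forecast `horizon` steps ahead with Holt's linear-trend recursions."""
    if len(y) < 2:
        raise ValueError("need at least two observations")
    # Assumed initial state: first observation and first difference.
    level, trend = y[0], y[1] - y[0]
    for obs in y[1:]:
        prev_level = level
        # Level: blend the observation with the previous one-step projection.
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # h-step-ahead forecast extends the last level along the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]


# A perfectly linear synthetic series is extrapolated along the same line.
line_forecast = holt_forecast([2.0, 4.0, 6.0, 8.0], alpha=0.5, beta=0.5, horizon=3)
print(line_forecast)
```

On cumulative case or death counts, which rise steadily, the trend term is what lets the forecast keep climbing instead of flattening at the last level.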
3.2.3 Holt-Winters (Triple Exponential Smoothing)
There are two variations of the Holt-Winters seasonal method: the additive and the multiplicative. Each variant consists of equations for the forecast $\hat{y}_{t+h|t}$, the level $l_t$, the trend $b_t$ and the seasonal component $s_t$. The smoothing parameters α, β and γ apply to $l_t$, $b_t$ and $s_t$ respectively.
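For time $t$ with seasonal frequency $m$, the Holt-Winters additive method is:

$$\hat{y}_{t+h|t} = l_t + h b_t + s_{t+h-m(k+1)} \tag{10}$$

where $k$ is the integer part of $(h-1)/m$. The level $l_t$, trend $b_t$ and seasonal component $s_t$ can be represented as equations 11, 12 and 13 respectively:

$$l_t = \alpha (y_t - s_{t-m}) + (1-\alpha)(l_{t-1} + b_{t-1}) \tag{11}$$

$$b_t = \beta (l_t - l_{t-1}) + (1-\beta) b_{t-1} \tag{12}$$

$$s_t = \gamma (y_t - l_{t-1} - b_{t-1}) + (1-\gamma) s_{t-m} \tag{13}$$

The Holt-Winters multiplicative variant can be represented as:

$$\hat{y}_{t+h|t} = (l_t + h b_t)\, s_{t+h-m(k+1)} \tag{14}$$

$$l_t = \alpha \frac{y_t}{s_{t-m}} + (1-\alpha)(l_{t-1} + b_{t-1}) \tag{15}$$

$$b_t = \beta (l_t - l_{t-1}) + (1-\beta) b_{t-1} \tag{16}$$

$$s_t = \gamma \frac{y_t}{l_{t-1} + b_{t-1}} + (1-\gamma) s_{t-m} \tag{17}$$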
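The additive variant can be sketched in plain Python. This is a minimal illustration of the standard additive Holt-Winters recursions, not the implementation used in the study; the function name and the heuristic initialisation from the first two seasons are our assumptions, and the example series is synthetic.

```python
from typing import List, Sequence


def holt_winters_additive(y: Sequence[float], m: int, alpha: float,
                          beta: float, gamma: float, horizon: int) -> List[float]:
    """Forecast `horizon` steps ahead with additive Holt-Winters recursions."""
    if m < 1 or len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Assumed heuristic initialisation from the first two seasons.
    first_mean = sum(y[:m]) / m
    second_mean = sum(y[m:2 * m]) / m
    trend = (second_mean - first_mean) / m
    # A season mean sits mid-season; shift it to the end of the first season.
    level = first_mean + trend * (m - 1) / 2
    # Seasonal offsets: first-season observations minus the fitted trend line.
    seasonals = [y[i] - (first_mean + trend * (i - (m - 1) / 2)) for i in range(m)]

    for t in range(m, len(y)):
        prev_level, prev_trend = level, trend
        s_prev = seasonals[t % m]  # seasonal estimate from one period earlier
        level = alpha * (y[t] - s_prev) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        seasonals[t % m] = (gamma * (y[t] - prev_level - prev_trend)
                            + (1 - gamma) * s_prev)

    n = len(y)
    # Each forecast reuses the latest seasonal estimate for its position in the cycle.
    return [level + h * trend + seasonals[(n - 1 + h) % m]
            for h in range(1, horizon + 1)]


# Synthetic series: unit trend plus a repeating (-1, 0, +1) pattern of period 3.
series = [t + (-1, 0, 1)[t % 3] for t in range(9)]
hw_forecast = holt_winters_additive(series, m=3, alpha=0.5, beta=0.3,
                                    gamma=0.2, horizon=3)
print(hw_forecast)
```

Because the synthetic series is exactly trend plus seasonal pattern, the forecast continues both components; real case and death counts would also carry noise, which the smoothing parameters trade off against responsiveness.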
3.4 Evaluation
For each state, we forecasted the total number of people who may contract COVID-19 as well as the number of them who will be black. We also forecasted, for each state, the total number of COVID-19 deaths and the number of black COVID-19 deaths. Holt and Holt-Winters were used as our forecasting models. Performance evaluation was done with the Mean Absolute Percentage Error (MAPE). The forecasting error $e_t$ is the difference between the estimated value $\hat{y}_t$ and the actual value $y_t$:

$$e_t = \hat{y}_t - y_t \tag{18}$$

$p_t$ is the error of the model expressed as a percentage of the actual value:

$$p_t = 100 \, \frac{e_t}{y_t} \tag{19}$$

The MAPE is the mean of the absolute values of $p_t$:

$$\mathrm{MAPE} = \operatorname{mean}\left(|p_t|\right) \tag{20}$$

The MAPE has been used effectively in evaluating the accuracy of forecasting models. In predicting infant mortality rate, Purwanto et al. compared the effectiveness of ARIMA, Neural Network and Linear Regression using MAPE [35].
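The metric itself is only a few lines of code. The following is a minimal sketch of MAPE as described above; the function name is ours and the numbers in the example are synthetic, chosen only to make the arithmetic easy to follow.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error between actual and forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    # p_t = 100 * (forecast - actual) / actual; MAPE averages |p_t|.
    pct_errors = [abs(100.0 * (f - a) / a) for a, f in zip(actual, forecast)]
    return sum(pct_errors) / len(pct_errors)


# Synthetic check: each forecast is off by exactly 10% of the actual value.
score = mape(actual=[100.0, 200.0], forecast=[110.0, 180.0])
print(f"MAPE = {score:.1f}%")
```

Because every error is scaled by the actual value, MAPE allows states with very different case counts to be compared on the same footing.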
4 RESULT
The results of our experiment are shown graphically in figures 5 to 12. As shown, the graphs are divided into 2020 and 2021. The forecast is for the first quarter of 2021. As proposed in section 3, we experimented with the Holt and Holt-Winters exponential smoothing forecast methodologies. For each state in our study, we forecasted total cases and deaths as well as black cases and deaths. We compared the performance of the two forecasting models in each state. Performance evaluation was based on MAPE.
The performance comparison is shown in table 2. The table shows that in most of the states, Holt-Winters exponential smoothing outperformed Holt exponential smoothing. This suggests that seasonality is a factor in most of the states.
As shown in table 2, in most of the states, except for a few outliers, the Holt-Winters (Holt_W) models have a lower MAPE than the Holt models. Notably in MD, the fact that Holt outperformed Holt-Winters in forecasting deaths suggests that seasonality did not factor into the forecast of deaths in this state. The MAPE values for SC are in double digits for both models, suggesting that our models did not completely capture the underlying time series patterns of the state. The better model is defined as the one with the lower MAPE. Therefore, model selection was based on MAPE performance. Using this approach, we obtained table 3.
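The selection rule amounts to taking the minimum over MAPE values. The sketch below shows the idea; the function name is ours, and the MAPE numbers are made-up placeholders for illustration only, not values from table 2.

```python
from typing import Dict


def select_best_model(mape_by_model: Dict[str, float]) -> str:
    """Return the name of the model with the lowest MAPE."""
    if not mape_by_model:
        raise ValueError("no candidate models supplied")
    return min(mape_by_model, key=mape_by_model.__getitem__)


# Hypothetical MAPE values for one state and one series (NOT from table 2).
candidates = {"Holt": 4.2, "Holt_W": 3.1}
best = select_best_model(candidates)
print(f"Selected model: {best}")
```

Applying this rule once per state and per series (total cases, black cases, total deaths, black deaths) yields a selection table of the kind described above.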
As shown in table 3, Holt-Winters outperformed Holt for most of the case and death series. Table 4 shows the forecasted values from the selected models. For example, Holt-Winters was selected as the better model for total cases in FL because it has a lower MAPE than Holt. The forecasted value for FL is 2,399,349. In SC, Holt-Winters was also selected as the better model for total cases, with a forecasted value of 509,004.
As shown in table 4, SC is the only state with a negative CHCD result. This result is not surprising because table 2 already indicates that its forecast may not be accurate: the MAPE values of both the Holt and Holt-Winters models for the state are in double digits.
5 CONCLUSION
In this study we have analyzed the impact of COVID-19 in the African American community. The states considered are FL, GA, MD, MS, NC, PA, SC and VA. These eight states have large populations of African Americans. Time series analysis was used to show the disparity in COVID-19 health care. The trajectory of the coronavirus pandemic was visualized using area graphs. For a better understanding of the time series, timelines were shown in months as well as in quarters of 2020. We studied the trajectory of total cases, black cases, total deaths and black deaths. The time frame of our work spans March 13 to December 16, 2020. We computed the COVID-19 Health Care Disparity for the time frame.
Furthermore, we designed, developed and evaluated COVID-19 forecasting models using the Holt and Holt-Winters exponential smoothing methodologies. Forecasts were made for total cases, black cases, total deaths and black deaths. Using MAPE, we built a model selection table containing the best forecasting results. A forecast table was then built for total cases, black cases, total deaths and black deaths. Finally, we computed the COVID-19 health care disparity for each row.
The results of our experiment suggest that we do not have enough evidence to reject our null hypothesis. Therefore, we contend that COVID-19 Health Care Disparity will continue to the end of the first quarter of 2021.
LIMITATION OF STUDY
This study has the following limitations.
The study was conducted in December 2020, before the introduction of vaccines. The rate of vaccination in each state will have a major effect on the accuracy of our models.
Since the study was limited to eight states, CHCD may be different in states that do not have a large population of African Americans.
Data Availability
The dataset is available from the COVID Racial Data Tracker.
6 ACKNOWLEDGMENTS
This work is funded by the National Science Foundation grant number 2032345.
Footnotes
1timothy.oladunni{at}udc.edu, 2max.denis{at}udc.edu, 3eososanya{at}udc.edu, 4joseph.adesina{at}nwu.ac.za