Abstract
Background: Understanding the impact of non-pharmaceutical interventions remains a critical epidemiological problem in South Africa, which has reported the largest number of confirmed COVID-19 cases and deaths on the African continent.
Methods: In this study, we applied two existing epidemiological models, an extension of the Susceptible-Infected-Removed model (eSIR) and SAPHIRE, to fit the daily ascertained infected (and removed) cases from March 15 to July 31, 2020 in South Africa. To combine the desirable features of the two models, we further extended the eSIR model to an eSEIRD model.
Results: Using the eSEIRD model, the COVID-19 transmission dynamics in South Africa were characterized by an estimated basic reproduction number (R0) of 2.10 (95% CI: [2.09, 2.10]). The decrease of the effective reproduction number over time implied that the interventions were effective. The estimated ascertained rate was low, at 2.17% (95% CI: [2.15%, 2.19%]) in the eSEIRD model. The overall infection fatality ratio (IFR) was estimated as 0.04% (95% CI: [0.02%, 0.06%]) while the reported case fatality ratio was 4.40% (95% CI: [<0.01%, 11.81%]). According to the long-term forecast, the cumulative numbers of ascertained cases and total infected would reach roughly 801 thousand and 36.9 million, respectively, by December 31, 2020.
Conclusions: The dynamics based on our models suggested a decline of COVID-19 infection and that the severity of the epidemic might be largely mitigated through strict interventions. Besides providing insights into the COVID-19 dynamics in South Africa, we developed powerful forecasting tools that incorporate estimation of the ascertained rate and IFR and allow the effect of intervention measures on COVID-19 spread to be examined.
Key Messages
This study delineated the COVID-19 dynamics in South Africa from March 15 to July 31 and, using epidemiological models, confirmed the effectiveness of the main non-pharmaceutical interventions, namely lockdown and mandatory wearing of face masks in public places;
COVID-19 spread in South Africa was found to be associated with both low ascertained rate and low infection fatality ratio;
According to the long-term forecast, by December 31, 2020, the cumulative number of ascertained cases and total infected would reach roughly 801 thousand and 36.9 million respectively.
1 Introduction
The coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was first detected in early December 2019 in Wuhan, China. The first case in South Africa was confirmed on March 5, 2020. As of October 20, 2020, there were 1,262,476 confirmed cases (cumulative total) and 28,601 confirmed deaths in Africa [1]. South Africa remains the ‘epicenter of the outbreak in the African continent’ [1], with the largest number of confirmed cases (800,872) and deaths (21,803), contributing 53% of the total confirmed cases and 89% of deaths while accounting for only 5% of the population of Africa as of December 4, 2020 [2]. Although, to our knowledge, no seroprevalence survey of the South African population has been published, an antibody survey of 3,000 blood donors in Kenya, another Sub-Saharan African country, estimated that 1.6 million people had SARS-CoV-2 antibodies by the end of July 2020 [3], implying the possibility of substantial underreporting or non-detection of cases in Africa, including South Africa. Thus, understanding the key epidemiological constructs of the COVID-19 outbreak is paramount for containing the spread of COVID-19 in South Africa, as well as for explaining the disparity between seroprevalence estimates and the reported number of cases.
1.1 Interventions
With a universal goal to ‘flatten the curve’, the South African government implemented a series of non-pharmaceutical interventions, which have been gradually lifted since early May 2020 [4]. On March 27, 2020, South Africa adopted a three-week nationwide hard lockdown (level 5) along with closure of its international borders; the lockdown was later extended to April 30, 2020. Thereafter, to balance the positive health effects of strict interventions against their economic costs [5], South Africa began a gradual, phased recovery of economic activities, with the lockdown restriction eased to level 4 [4], allowing inter-provincial travel only for essential services. From June 1, national restrictions were lowered to level 3, allowing inter-provincial travel and school opening (Table 1). Face-mask wearing was mandatory in public places at all times, gatherings were limited, and the sale of alcohol and cigarettes was restricted [6]. Although interventions implemented at such an early stage had a higher potential for pandemic containment, previous studies [6–9] consistently reported large estimates of the basic reproduction number (R0) in South Africa, ranging from 2.2 to 3.2, from models trained with data in relatively early time windows. Using data observed under various intervention scenarios over a longer period of time, we carry out a thorough investigation to assess the current COVID-19 spread and the effect of these interventions, which will provide valuable insights into the transition dynamics of COVID-19 and intervention deployment in South Africa and beyond.
1.2 Unascertained cases and deaths
Based on the clinical characteristics of COVID-19, a majority of patients are symptomatic (roughly 84% according to a recent study [10]), most of whom have mild symptoms [11] and tend not to seek testing or medical care. While private hospitals have reached maximum capacity, public and field hospital beds still have some margin left, with additional challenges due to scarcity of staff [12]. Several recent studies [13–15] reported that a non-negligible proportion of unascertained cases contributed to the quick spread of COVID-19 [7]. It has been suggested that only 1 in 4 mildly ill cases would be detected in South Africa [16]. The relatively low testing rate in South Africa (Table 1; Figure 1(b)), coupled with a very high test positivity rate, especially in July and August [17], suggests inadequate testing as well as the possibility of a large number of unascertained cases [18]. The WHO situation report dated October 20 reports an addition of 429 retrospective deaths over just 7 days from mortality audits in South Africa, further calling into question the reliability of COVID-19 mortality data [19]. Thus, modeling both ascertained and unascertained cases and deaths makes it possible to estimate the infection fatality ratio (IFR, the proportion of deaths among all infected individuals [20]) of COVID-19, leading to a better understanding of the clinical severity of the disease.
1.3 Epidemiological models
The Susceptible-Infectious-Removed (SIR) model [21] is arguably the most commonly used epidemiological model for the trajectory of an infectious disease. A recent extension of SIR, called extended-SIR or eSIR [22], was developed to incorporate user-specified non-pharmaceutical interventions and quarantine protocols into a Bayesian hierarchical Beta-Dirichlet state-space model, and was successfully applied to model COVID-19 dynamics in India [23]. One major advantage of this Bayesian hierarchical structure is that the uncertainty associated with all parameters and functions of parameters can be calculated from posterior draws without relying on large-sample approximations [23]. Extending the simple compartment structure of the eSIR model, the SAPHIRE model [24] delineated the full COVID-19 transmission dynamics in Wuhan, China, with additional compartments for unobserved categories [13]. In this article, we extended the eSIR approach to the eSEIRD model to combine the advantages of the two existing models, using a Bayesian hierarchical structure to introduce additional unobserved compartments and characterize uncertainty in critical epidemiological parameters, including the basic reproduction number, ascertained rate and IFR, with observed counts of cases, recoveries and deaths as input data. Furthermore, we applied all three models and compared the results of the eSEIRD model with those of the eSIR and SAPHIRE models, with the following primary objectives: (i) characterizing the COVID-19 dynamics from March 15 to July 31; (ii) evaluating the effectiveness of the main non-pharmaceutical interventions, namely lockdown and mandatory wearing of face masks in public places; (iii) capturing the uncertainty in estimating the ascertained rate and IFR; and (iv) forecasting the future spread of COVID-19 in South Africa.
2 Methods
2.1 Study Design and Data Source
COVID-19 data for South Africa were extracted from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University [25], from the date on which the first 50 cases had been confirmed (March 15) to November 29, 2020. We fitted the models using data up to July 31 and predicted the state of COVID-19 infection in South Africa over a short-term window, from August 1 to August 31, and a relatively long-term window up to December 31. To compare the short-term prediction performance of the different models, we used the symmetric mean absolute percentage error (SMAPE), computed from the observed values At and the forecast values Ft for each day t from August 1 to 31. This design enabled us to select an optimal modeling strategy for the South Africa data and to check the robustness of prediction performance across models.
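As an illustration of the comparison metric, the sketch below computes one common SMAPE variant, 100%/n × Σ |Ft − At| / ((|At| + |Ft|)/2). The choice of this variant (including the factor of 2 in the denominator), the function name `smape`, and the example counts are assumptions made for illustration; they are not taken from the study.

```python
from typing import Sequence


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, in percent (assumed variant)."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        # Convention chosen here: a day where both values are zero adds no error.
        total += 0.0 if denom == 0 else abs(f - a) / denom
    return 100.0 * total / len(actual)


# Hypothetical cumulative counts, for illustration only (not study data).
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
print(round(smape(observed, predicted), 3))
```

Because the metric is scale-free, it can be compared across models whose forecasts differ by orders of magnitude, which is convenient when ascertained and total infections are both of interest.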
2.2 Statistical Methodology
We considered two existing epidemiological methods, the eSIR and SAPHIRE models, and an extension of eSIR, viz., the eSEIRD model, as described in Section 1.3. The infection transition schematic diagrams for the three models are shown in Figure 2.
Parameter settings
Table S.1.1 summarizes the notation and assumptions. We assumed a constant population size (N = 57,779,622) for all models and fixed the following transition parameters in the SAPHIRE and eSEIRD models. First, we set an equal number of daily inbound and outbound travelers (n), with n = 4 × 10^-4 N from March 15 to 25, estimated from the number of international travelers to South Africa in 2018 [26], and n = 0 once the borders were closed, i.e. after March 26. We fixed the transmissibility ratio between unascertained and ascertained cases at α = 0.55, assuming lower transmissibility for unascertained cases [27], and assumed an incubation period of 5.2 days and a pre-symptomatic infectious period of Dp = 2.3 days [28,29], implying a latent period of De = 2.9 days. The mean total infectious period was Di + Dp = 5.2 days [28], assuming constant infectiousness across the pre-symptomatic and symptomatic phases of ascertained cases [30]; thus, the mean symptomatic infectious period was Di = 2.9 days. We set the period from reporting to hospitalization for ascertained cases to Dq = 7 days, the same as the reported median interval from symptom onset to admission [31,32]. The period from hospital admission to discharge or death was assumed to be Dh = 8.6 days [33].
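The derived quantities in this paragraph follow from simple subtraction, which the following sketch makes explicit. The constant and function names are hypothetical, and treating March 25 as the last day with travel is an assumption, since the text leaves March 26 itself ambiguous.

```python
from datetime import date

N = 57_779_622                 # constant population size used for all models

INCUBATION_DAYS = 5.2          # incubation period
D_P = 2.3                      # pre-symptomatic infectious period
D_E = INCUBATION_DAYS - D_P    # latent period implied by the two values above

TOTAL_INFECTIOUS_DAYS = 5.2    # D_i + D_p
D_I = TOTAL_INFECTIOUS_DAYS - D_P  # symptomatic infectious period

# Assumption: travel is counted up to and including March 25, 2020.
LAST_DAY_WITH_TRAVEL = date(2020, 3, 25)


def daily_travelers(day: date) -> float:
    """Daily inbound (= outbound) travelers n: 4e-4 * N before closure, else 0."""
    return 4e-4 * N if day <= LAST_DAY_WITH_TRAVEL else 0.0


print(round(D_E, 1), round(D_I, 1))
print(round(daily_travelers(date(2020, 3, 20))), daily_travelers(date(2020, 4, 1)))
```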
Choice of Initial states
For the eSIR model, the prior mean for the initial infected/removed proportion was set at the observed infected/removed proportion on March 15, and that for the susceptible proportion was one minus the infected and removed proportions [22].
For the SAPHIRE model, rather than setting prior parameters for the initial states, we set the number of initial latent cases E(0) as the sum of ascertained and unascertained cases with onset during March 15-17, since De = 2.9 days [13], and the number of initial pre-symptomatic cases P(0) as that with onset during March 18-19, since Dp = 2.3 days [13]. The number of ascertained symptomatic cases I(0) was taken as the number of observed infected cases on March 15 excluding H(0), R(0) and D(0) (the initial numbers of hospitalized, recovered, and deceased cases). The initial ascertainment rate (r0) was assumed to be 0.10, as reported in the literature [15,34], implying A(0) = I(0)(1 − r0)/r0 unascertained cases; a sensitivity analysis with r0 = 0.25 was conducted because little information on r0 is available for South Africa and r0 may vary across scenarios. H(0) was assumed to be 50% of the observed ascertained cases on March 9 (assuming that the period from reporting to hospitalization was 7 days [31,32] at the early stage of the pandemic). In addition, we denoted by R(0) the sum of observed recovered and death cases on March 15. The number of initially susceptible individuals S(0) was calculated as the total population (N) minus E(0), P(0), I(0), A(0) and R(0).
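A minimal sketch of how the initial states relate to each other is given below. It assumes that the ascertainment rate is the share of ascertained cases among all symptomatic infections, r0 = I(0)/(I(0) + A(0)), so that A(0) = I(0)(1 − r0)/r0; the function names and all counts are hypothetical and are not study data.

```python
def initial_unascertained(i0: float, r0: float) -> float:
    """A(0) implied by I(0) and the initial ascertainment rate r0 (assumed relation)."""
    if not 0.0 < r0 <= 1.0:
        raise ValueError("r0 must lie in (0, 1]")
    return i0 * (1.0 - r0) / r0


def initial_susceptible(n: float, e0: float, p0: float, i0: float,
                        a0: float, removed0: float) -> float:
    """S(0) = N minus E(0), P(0), I(0), A(0) and R(0), as stated in the text."""
    return n - (e0 + p0 + i0 + a0 + removed0)


# Hypothetical counts, for illustration only.
N = 57_779_622
i0 = 40.0
for r0 in (0.10, 0.25):          # main setting and sensitivity analysis
    a0 = initial_unascertained(i0, r0)
    s0 = initial_susceptible(N, e0=150.0, p0=90.0, i0=i0, a0=a0, removed0=1.0)
    print(r0, round(a0), round(s0))
```

Raising r0 from 0.10 to 0.25 cuts the implied number of initially unascertained cases by a factor of three, which is why the choice of r0 is examined in a sensitivity analysis.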
In the eSEIRD model, we set the prior means of the initial ascertained, unascertained and hospitalized cases to I(0), A(0) and H(0) as described above. However, since the latent compartment incorporates the pre-symptomatic cases, the mean of the initial latent cases was set as the sum of ascertained and unascertained cases with onset during March 15-19, since De + Dp = 5.2 days [13]. The prior means of initial recoveries and deaths were fixed at the numbers of observed recovered and death cases on March 15, respectively. Therefore, the prior mean of the initial susceptible compartment was set as the total population minus the means of the other compartments.
Prior distributions
In the eSIR model, log-normal priors were used for the removed rate ν and the basic reproduction number R0: ν ∼ LogN(−2.955, 0.910), with E(ν) = 0.082 and SD(ν) = 0.1 [22], and a log-normal prior for R0 with E(R0) = 3.2 and SD(R0) = 1 [23]. Flat Gamma priors were used for the scale parameters of the Beta-Dirichlet distributions as follows: ω ∼ Gamma(2, 0.0001), λI ∼ Gamma(2, 0.0001) and λR ∼ Gamma(2, 0.0001) [23]. In the eSEIRD model, apart from the prior carried over from the eSIR model, we used r ∼ Beta(10, 90) for the ascertained rate [35], and κ1 ∼ Beta(0.03, 2.93) and κ2 ∼ Beta(0.44, 1.76) for the IFR of non-hospitalized and hospitalized cases, with means equal to 0.1% and 20%, respectively [33]. In addition, to account for the effect of a time-varying contact rate on the transmission rate, we set a time-varying contact rate modifier π(t) in the eSIR and eSEIRD models: π(t) = 1 before lockdown, π(t) = 0.75 during strict lockdown and π(t) = 0.90 [16,23,36] after September 20, when the interventions were largely eased. Note that the modifier π(t) is a conjectural quantity and hence must be guided by empirical studies [23]. For MCMC sampling in the eSIR and eSEIRD models, we set the adaptation number to 10^4, thinned the chains by keeping every 10th draw to reduce autocorrelation, and used a burn-in period of 5 × 10^4 draws out of 1 × 10^5 iterations for each of 4 parallel chains.
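The prior parameters quoted above can be checked by moment matching. The sketch below assumes that the second argument of LogN(·, ·) is the variance on the log scale, an assumption that reproduces the quoted values for ν to within rounding, and uses the standard Beta mean a/(a + b). The function names are hypothetical.

```python
import math


def lognormal_params(mean: float, sd: float) -> tuple:
    """Log-scale (mu, variance) of a log-normal with the given mean and SD."""
    var_log = math.log(1.0 + (sd / mean) ** 2)
    mu_log = math.log(mean) - var_log / 2.0
    return mu_log, var_log


def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Removed rate nu: E = 0.082, SD = 0.1 (compare with LogN(-2.955, 0.910)).
print([round(x, 3) for x in lognormal_params(0.082, 0.1)])
# Basic reproduction number R0: E = 3.2, SD = 1.
print([round(x, 3) for x in lognormal_params(3.2, 1.0)])
# Ascertained rate and IFR priors.
for a, b in [(10, 90), (0.03, 2.93), (0.44, 1.76)]:
    print((a, b), round(beta_mean(a, b), 4))
```

Under this formula Beta(10, 90) and Beta(0.44, 1.76) have means of 10% and 20%, while Beta(0.03, 2.93) has a mean of about 1%, so readers reproducing the κ1 prior may want to check the stated 0.1% against the supplementary material.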
We fit the SAPHIRE model in four time periods: March 15-March 26, March 27-April 30, May 1-May 31 and June 1-July 31, separated by the change-points of the lockdown strictness level, and denote the ascertained rates and transmission rates in these periods by r1, r2, r3, r4 and β1, β2, β3, β4, respectively. We used r1 ∼ Beta(10,90) and reparameterized r2, r3 and r4 by
logit(r2) = logit(r1) + δ1, logit(r3) = logit(r2) + δ2, logit(r4) = logit(r3) + δ3,
where logit(x) = log(x/(1 − x)). We assumed δ1, δ2 and δ3 ∼ N(0,1), and non-informative priors β1, β2, β3 and β4 ∼ Unif(0,2) for all transmission rates, to reflect the lack of information about these parameters [13]. Therefore, β and r were allowed to differ across the four time periods. Finally, the effective reproduction number Re was computed for each time period from the fitted parameters. Posterior samples were drawn using the delayed rejection adaptive Metropolis algorithm implemented in the R package BayesianTools (version 0.1.7). We set a burn-in period of 10^5 iterations and continued to run a further 10^5 iterations with a sampling step size of 10 iterations.
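To show how a reparameterization of this kind keeps every period-specific ascertained rate inside (0, 1), the sketch below chains increments on the logit scale, logit(r_k+1) = logit(r_k) + δ_k. The chained form, the function names, and the δ values are assumptions for illustration; the fitted model draws δ from its posterior rather than fixing it.

```python
import math
from typing import List, Sequence


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def ascertained_rates(r1: float, deltas: Sequence[float]) -> List[float]:
    """Return [r1, r2, ...], each obtained by adding a delta on the logit scale."""
    rates = [r1]
    for d in deltas:
        rates.append(expit(logit(rates[-1]) + d))
    return rates


# Hypothetical increments, for illustration only.
print([round(r, 4) for r in ascertained_rates(0.10, [-1.0, 0.2, -0.2])])
```

A zero increment leaves the rate unchanged between periods, so a N(0,1) prior on each δ centers the model on no change while still allowing large shifts in either direction.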
Methodological implementation details are given in Supplementary Section S.1, with a comparison between the four models in Supplementary Section S.2. All analyses were conducted in R (version 4.0.0), and the source code is available at https://github.com/umich-cphds/south_africa_modeling. Posterior means and corresponding 95% credible intervals (95% CI) are reported for the parameters of interest.
3 Results
3.1 Reproduction number and intervention evaluation
The estimated posterior mean of R0 was similar in the eSIR (2.05 (95% CI: [1.81, 2.31])) and eSEIRD (2.10 (95% CI: [2.09, 2.10])) models, and was robust when r0 = 0.25 (Table 2). To evaluate the time-varying effect of non-pharmaceutical interventions, we estimated the effective reproduction number (Re) in the different lockdown periods using the SAPHIRE model. Re decreased dramatically from 3.47 (95% CI: [3.32, 3.61]) before lockdown to 1.39 (95% CI: [1.36, 1.41]) after lockdown implementation, though it remained significantly above 1, suggesting that the effective contact rate decreased by about 60% during the lockdown period. When lockdown was eased to less strict levels in the last two time periods, Re increased slightly to 1.43 (95% CI: [1.42, 1.45]) under lockdown level 4 and 1.58 (95% CI: [1.57, 1.58]) under lockdown level 3 (Table 2; Figure 3(e)).
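The 60% figure can be recovered from the two Re estimates under the simplifying assumption that Re is proportional to the effective contact rate, with all other factors held fixed. The function name below is hypothetical.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional decrease from `before` to `after`."""
    return 1.0 - after / before


re_before_lockdown = 3.47   # posterior mean, March 15-26
re_level5 = 1.39            # posterior mean, March 27-April 30

print(round(relative_reduction(re_before_lockdown, re_level5), 2))
```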
3.2 Short-term and long-term forecasts
We forecasted the total cumulative number of infections, including unascertained cases, in the SAPHIRE model up to August 31 under the trend estimated from each time period. The estimated cumulative number of infections was (a) 24.8 million if the trend of the strict lockdown (level 5) was assumed, (b) 28.2 million if the trend of lockdown level 4 was assumed, and (c) 35.2 million if the trend of lockdown level 3 was assumed. All the short-term forecasts in the SAPHIRE model were robust to the r0 setting, which to some degree contradicts the intuition that different situations in the early stage may lead to different pandemic trajectories (Table 3). The eSEIRD model also predicted the total cumulative number of cases, which was 32.0 million under r0 = 0.10 or 28.9 million under r0 = 0.25, and total death counts of 22 thousand or 19 thousand, respectively, by August 31 (Table 3). Furthermore, we used the eSEIRD model to forecast the epidemic trajectory over a longer time period and found that by December 31, the cumulative numbers of ascertained cases and total infected would reach roughly 801 thousand and 36.9 million (around 60% of the total population of South Africa), respectively. The total number of deaths was forecasted as 28 thousand by the same date.
3.3 Fitting and prediction performance
All three main models fitted the COVID-19 data in South Africa with high accuracy, as the estimated daily new cases were close to the observed numbers (Figure 3(a)-(c)). However, the SAPHIRE model performed best in predicting cumulative infected cases, with the smallest SMAPE (1.81% for 15 days and 2.96% for 31 days when r0 = 0.10), while the eSEIRD model had the second smallest SMAPE (4.78% for 15 days and 6.02% for 31 days when r0 = 0.10) (Table 4). Accordingly, at selected important time points, the predicted numbers of cumulative ascertained infected cases from the SAPHIRE and eSEIRD models were closer to the observed numbers than those from the eSIR model (Table 3). The predictive accuracy of the three candidate methods substantiates their credibility in capturing the transmission dynamics over the time period considered in this study.
3.4 Unascertained cases and deaths
As demonstrated by the SAPHIRE modeling results in Figure 3(d), the large number of unascertained and pre-symptomatic cases contributed to the rapid spread of the disease. The estimated ascertained rates were very low: 9.53% (95% CI: [8.70%, 10.40%]), 1.85% (95% CI: [1.74%, 1.98%]), 2.21% (95% CI: [2.16%, 2.26%]), and 1.84% (95% CI: [1.82%, 1.86%]) in the four time periods evaluated, respectively (Table 2; Figure 3(f)). Specifically, in the last three time periods, after lockdown began, the estimated ascertained rates were nearly constant over time. Similarly, in the eSEIRD model, the estimated ascertained rate was also very low, at 2.17% (95% CI: [2.15%, 2.19%]) (Table 2). As of August 31, the overall under-reporting factor for infected cases was estimated as 46 and 54 in the eSEIRD and SAPHIRE models, respectively.
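One way to read the under-reporting factors is as the reciprocal of the ascertained rate, that is, the number of total infections per ascertained case. The sketch below applies this approximation to the eSEIRD estimate and to the SAPHIRE estimate for the last time period; using the reciprocal and the last-period rate are assumptions, since the reported factors may instead have been computed from cumulative counts as of August 31.

```python
def under_reporting_factor(ascertained_rate: float) -> float:
    """Total infections per ascertained case, approximated as 1 / rate."""
    if not 0.0 < ascertained_rate <= 1.0:
        raise ValueError("ascertained_rate must lie in (0, 1]")
    return 1.0 / ascertained_rate


eseird_rate = 0.0217    # eSEIRD posterior mean
saphire_rate = 0.0184   # SAPHIRE posterior mean, June 1-July 31

print(round(under_reporting_factor(eseird_rate)),
      round(under_reporting_factor(saphire_rate)))
```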
Under the eSEIRD model, the overall IFR was estimated as 0.04% (95% CI: [0.02%, 0.06%]), while the observed overall case fatality ratio was 4.40% (95% CI: [<0.01%, 11.81%]) (Figure 4). Furthermore, the eSEIRD model provided Bayesian estimates of the IFR and deaths among hospitalized and non-hospitalized cases. The estimated IFR for hospitalized cases was 12.06% (95% CI: [11.76%, 12.35%]), much higher than that for non-hospitalized cases (less than 0.01%), and these estimates were robust to the choice of initial ascertained rate r0. The under-reporting factor for deaths was estimated to be very close to 1, suggesting that most deaths occurred in hospitals.
4 Discussion
This modeling study investigates the spread of COVID-19 in South Africa, ‘the hardest hit country on the African continent’ [19], simultaneously accounting for unascertained cases and population movement in different time periods, and evaluates the effect of the intervention strategy employed. Moreover, our study provides powerful methodological tools to estimate the IFR and predict deaths due to COVID-19 by making use of the reported deaths. The SAPHIRE model characterizes the transmission dynamics of COVID-19 in South Africa as follows: the disease spread rapidly before lockdown, with a large effective reproduction number comparable to that in the early stage in Wuhan without interventions [13]. The lockdown and mandatory face-mask wearing in public places appeared to contain the spread of COVID-19 effectively, as Re decreased dramatically and then increased only slightly after lockdown stringency was relaxed. However, Re was consistently above 1 throughout the whole period analyzed, which implies that the interventions failed to suppress transmission fully; this is further substantiated by the basic reproduction number estimates in the eSIR and eSEIRD models. To stop the pandemic or prevent a resurgence, stricter intervention policies, such as lockdown and mandatory face-mask wearing, are suggested based on these results, while taking into account their potential economic costs [5,37].
The estimated ascertained rate in South Africa is very low compared with that reported for many other countries [13,15,35], as is also implied by the low testing rate and high test positivity rate in South Africa [17]. As of September 21, a total of 4.0 million tests had been conducted, suggesting that about 7% of the population had been tested [17]. Furthermore, the estimated ascertained rate is consistent with that in several other global epicenters of the COVID-19 pandemic, such as France, the United States, Italy and Spain in March [34]. The large number of unascertained cases may contribute significantly to the rapid spread of COVID-19 [27,38,39]. Therefore, even though the spread of COVID-19 shows an encouraging pattern of decline, as indicated by the decay in daily ascertained cases starting at the end of July, there is, with high probability, still a large number of active infectious cases, as suggested by the low ascertained rate. Considering the unascertained infections, our findings suggest that more than 40% of the total population of South Africa had been infected by July 31, and that more than 60% would be infected by the end of 2020. Our long-term forecasts for November 1 are much lower than, and closer to the observed numbers than, the long-term projection in the NICD report in May [16], which also used a stochastic compartmental transmission model with a generalized SEIR structure accounting for disease severity and the treatment pathway, fitted to early-stage data up to April 30 [16]. For instance, as of November 1, the NICD report projected an estimated 3.4-3.7 million laboratory-confirmed cases, whereas the eSEIRD model prediction was 793 thousand, much closer to the observed count of 727 thousand confirmed cases. Nevertheless, more surveillance testing and effective testing strategies under limited test availability, such as tracing the contacts of confirmed cases, will be helpful in curtailing the pandemic in South Africa [6].
Although COVID-19 is highly transmissible and poorly ascertained, its IFR is estimated as 0.04% in South Africa, comparable to estimates from other locations with similarly low mortality rates based on serological data [40]. The low IFR may be due to the relatively young South African population, which reduces the fatal impact on the general population to some extent [41]. Our estimate of the IFR for hospitalized cases is much higher than that for non-hospitalized cases, suggesting that the most severe cases may have been admitted to hospitals despite the relative lack of testing arrangements.
Comparison of the models
The eSIR and SAPHIRE models have been successfully applied to data from India and Wuhan, China, respectively [22,31]. While the SAPHIRE model retains robust prediction performance for COVID-19 cases, the eSIR model has relatively poor capacity to capture changes in the epidemic trend in a timely manner because it neglects some important clinical characteristics. The eSEIRD model has 15-day prediction performance comparable to that of SAPHIRE, though it is relatively sensitive to the initial ascertained rate; this sensitivity is arguably more reasonable, as the trajectory of the pandemic would change with the number of initial infectious cases. Moreover, accounting for unascertained cases is useful for measuring the IFR of COVID-19 accurately when evaluating the impact of the pandemic.
Strengths and Limitations
Our research investigated and supported some important epidemiological and clinical characteristics of COVID-19, and estimated and projected the trend of its spread in South Africa, accounting for critical information such as population movement and prior distributions for the ascertained rate and IFR. Notably, we provide useful statistical tools for predicting infections and deaths and for estimating substantive parameters using information on both reported cases and reported deaths at the same time.
However, there are some important limitations. First, the assumptions in the models, especially the fixed values of hyper-parameters, were drawn from previous reports from other countries because such information is lacking for South Africa. Although the parameter estimates and infection predictions seem to be robust to these assumptions to some extent, inference and prediction would be much more convincing if based on accurate information from South Africa. Second, the ascertained rate was assumed to follow the same distribution across the whole time period in the eSEIRD model, although it might vary over time with accumulating knowledge and the deployment of clinical resources for COVID-19, and across space given the variation within South Africa in population density and movement, as well as in the location of COVID-19 hotspots and hospital resources. Further, population density is highly heterogeneous across regions of South Africa, with higher concentration near high-density economic hub cities such as Cape Town and Durban. COVID-19 cases are also unevenly distributed. For instance, Gauteng Province is a very small, densely populated province with roughly 30% of the nation's total cases, and 49% of confirmed cases cluster in KwaZulu-Natal, Eastern Cape and Western Cape Provinces. Without considering these heterogeneities and potential confounding factors in individual regions, conclusions based on the national data might be biased. The burden of HIV and tuberculosis comorbidity, particularly among the less privileged socio-economic population, also adds to the complexity of analyzing the COVID-19 data from South Africa [42]. In addition, in this paper we implicitly assumed that recovered cases would not be infected again, but extant research on COVID-19 is still inconclusive on this point [43]. If this assumption is not valid and the interventions are totally lifted, a resurgence might follow.
Thus, serological surveys of COVID-19 among the general population in South Africa may be needed to confirm national as well as provincial seroprevalence, and thereby provide stronger evidence on the evolving benefits of non-pharmaceutical intervention decisions and their uptake, as well as guidance for managing provincial-level disparities.
Data Availability
The data of this study are openly available in the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University at https://github.com/CSSEGISandData/COVID-19.
Funding
This work was supported by grants from the National Science Foundation [grant numbers DMS-1712933 (to B.M.) and DMS-2015460 (to J.D.)] and from the National Institutes of Health [grant number 1 R01 HG008773-01 (to B.M.)].
Acknowledgements
The first and second author (X.G. and B.M.) would also like to thank the Center for Precision Health Data Sciences at the University of Michigan School of Public Health, The University of Michigan Rogel Cancer Center and the Michigan Institute of Data Science for internal funding that supported this research.
Footnotes
1 World Health Organization, “Estimating Mortality from COVID-19: Scientific Brief, 4 August 2020” (World Health Organization, 2020), 19.