Abstract
In January 2020 China reported to the World Health Organization an outbreak of pneumonia of undetermined origin in the city of Wuhan, Hubei. On January 30, 2020, the World Health Organization declared the COVID-19 outbreak a Public Health Emergency of International Concern (PHEIC).
Objectives The aim of this study is to assess the impact of a COVID-19 epidemic in the metropolitan region of São Paulo, Brazil.
Methods We used a generalized SEIR (Susceptible, Exposed, Infectious, Recovered) model with additional Hospitalized variables (SEIHR model) and an age-stratified structure to analyze the expected time evolution during the onset of the epidemic in the metropolitan area of São Paulo. The model allows us to determine the evolution of the number of cases, the number of patients admitted to hospitals, and the number of deaths caused by COVID-19. To investigate the sensitivity of our results to parameter estimation errors, we performed a Monte Carlo analysis with 100 000 simulations, sampling parameter values from a uniform distribution over the corresponding confidence interval.
Results We estimate 1 368 (IQR: 880, 2 407) cases, 301 (22%) in older people (≥60 years), 81 (50, 143) hospitalizations, and 14 (9, 26) deaths in the first 30 days, and 38 583 (IQR: 16 698, 113 163) cases, 8 427 (21.8%) in older people (≥60 years), 2 181 (914, 6 392) hospitalizations, and 397 (166, 1 205) deaths in the first 60 days.
Limitations We assumed a constant transmission probability Pc across age groups, that every severe and critical case will be hospitalized, and that the detection capacity of the primary healthcare services does not change during the outbreak.
Conclusion Assuming that the parameters reported in the literature apply to the city of São Paulo, our study shows that the impact of a COVID-19 outbreak is expected to be substantial, requiring special planning from the authorities. This is the first study for a major metropolitan center in the Southern Hemisphere, and we believe it can provide policy makers with a prognosis of the burden of the pandemic not only in Brazil but also in other tropical zones, allowing them to estimate total cases, hospitalizations and deaths in support of the management of the public health emergency caused by COVID-19.
Introduction
In January 2020 China reported to the World Health Organization an outbreak of pneumonia of undetermined origin in the city of Wuhan, Hubei. Initially 44 cases were reported, with exposure to the Wuhan seafood market in common. An increasing number of unrelated secondary cases have since been detected across China, leading to the dissemination of cases into several countries [1]. The etiologic agent was identified as a new coronavirus of the betacoronavirus genus, which has since been named SARS-CoV-2, and the resulting disease COVID-19 [2]. On January 30, 2020, the World Health Organization declared the COVID-19 outbreak a Public Health Emergency of International Concern (PHEIC) [3, 4], and on March 11 declared it a pandemic [5].
Prior to 2019, two highly pathogenic coronaviruses had been described. The first, named SARS-CoV and described in 2003, was responsible for an epidemic of severe acute respiratory syndrome (SARS) that started in China and produced secondary cases in 26 other countries, accounting for a total of 8,096 cases and 774 deaths [case-fatality rate (CFR): 9.6%] [6, 7]. The second, named MERS-CoV and identified in 2012, is a betacoronavirus responsible for the Middle East Respiratory Syndrome (MERS) [8]. Cases of MERS-CoV have been reported sporadically ever since, resulting from zoonotic transmission, with a few outbreaks associated with human-to-human transmission, for a total of 2,449 cases and 845 deaths (CFR: 34.5%), the majority (84%) reported in Saudi Arabia [9].
SARS-CoV-2 differs significantly from MERS-CoV and SARS-CoV. In just over a month of the epidemic, more cases of COVID-19 were confirmed than in the entire history of SARS-CoV and MERS-CoV. Previous outbreaks of SARS-CoV and MERS-CoV have been linked to epidemic amplification phenomena, in which a few cases, the so-called super-spreaders, were responsible for a disproportionately high number of secondary cases, with a significant number of cases resulting from nosocomial transmission. This characteristic allowed outbreaks to occur even in scenarios with an average basic reproduction number R0 of less than one [4, 10–14]. Unlike for SARS-CoV and MERS-CoV, over-dispersion events do not seem to be of major relevance for the COVID-19 epidemic, suggesting a more homogeneous transmissibility in the population [15, 16].
Homogeneous transmissibility and the potential for transmission from asymptomatic sources [17] bring the behavior of COVID-19 closer to that of other respiratory viruses, such as measles or influenza [13]. Influenza viruses, despite their differences from viruses of the coronavirus family, have similar modes of transmission and associated clinical syndromes. Occasionally, new influenza viruses originating from genetic recombination in animals infect humans and are subsequently transmitted in the population; such viruses have been responsible for pandemics in the past, with hundreds of thousands of cases and thousands of deaths worldwide [18, 19].
Brazil has one of the largest public health care systems in the world [20], and understanding how a possible COVID-19 epidemic in the country could affect this system is central to preparing a proper response. In the past, epidemiological models have been used to predict the occurrence of measles cases in order to support decision making in public health emergencies. This paper aims to present tools capable of projecting the impact of a COVID-19 epidemic in the major metropolitan region of the country. This is of particular importance not only because of the size of its population, but also because it is the main hub of arrival in and departure from the country.
We analyze the expected time evolution during the onset of the epidemic in the metropolitan area of São Paulo, with a total estimated population of 21.5 million individuals, the fourth largest in the world. We use a generalized age-stratified SEIR model with the addition of hospitalized population variables (SEIHR model) to predict the occurrence of cases, the expected number of patients admitted to hospitals, and the number of deaths caused by COVID-19. Our approach can be adapted straightforwardly to other cities and countries, which is of great relevance in low- and middle-income countries, where health infrastructure and preparedness to respond to an emergency are more limited.
Materials and methods
It has been shown that age-specific contact rates describe the dynamics of measles transmission more accurately when used in an age-stratified SEIR model [21], and make it possible to capture specifics of social behavior as encoded in the contact matrix. Owing to the different impacts of the disease across the population according to age, we consider the following age groups: 0–9, 10–39, 40–49, 50–59, 60–69 and ≥ 70 years. The population in each group is obtained from the 2010 Brazilian census, corrected by the estimated population of São Paulo in 2019 (see supporting information), except for ages from 0 to 1 year, for which the actual population from birth data was used [22, 23]. The variables in the model for each age class are proportions with respect to the total population at time zero: Susceptible (Si), Exposed (Ei) (in the incubation period and not infectious), Infectious (Ii), Hospitalized (Hi) due to COVID-19, and Recovered (Ri) individuals, i = 1, …, M (SEIHR model), with M the number of age groups. At time zero they satisfy n1 + … + nM = 1, with ni = Si + Ei + Ii + Hi + Ri the fraction of the population in age group i. The probability of hospitalization of an infected individual of age group i is estimated as ζi = 0.18 µi/µCOV, with 0.18 being the proportion of severe and critical cases, µCOV the overall estimated lethality of the disease, and µi the proportion of fatal cases among the cases in age group i [24], which yields ζ1 = 0, ζ2 = 0.0157, ζ3 = 0.0313, ζ4 = 0.102, ζ5 = 0.282 and ζ6 = 0.892 for µCOV = 2.3%. We assume that fatalities only occur among hospitalized individuals. The parameter values required are taken from currently published data and are shown with the corresponding sources in Table 1. The aging rate from age group i to age group i + 1 is denoted by νi and is given by the inverse of the age span of the group (in the corresponding time unit). We put ν0 = νM = 0 in the model equations below.
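As a worked check of the hospitalization probabilities quoted above, the following Python sketch computes ζi under the assumption that it equals 0.18 times the ratio of the lethality in age group i to the overall lethality µCOV = 2.3%. The age-specific lethality values in `lethality_by_group` are an assumption: they are not listed in this section and were chosen to be consistent with the ζi values given in the text. The aging rates assume a time unit of one day. All function and variable names are hypothetical, and this is not the authors' C implementation.

```python
# Sketch only: assumed form zeta_i = 0.18 * mu_i / mu_cov.
SEVERE_FRACTION = 0.18  # proportion of severe and critical cases (from the text)
MU_COV = 0.023          # overall lethality, 2.3% (from the text)

AGE_GROUPS = ["0-9", "10-39", "40-49", "50-59", "60-69", ">=70"]

# ASSUMPTION: lethality per age group, chosen to match the zeta_i in the text.
lethality_by_group = [0.0, 0.002, 0.004, 0.013, 0.036, 0.114]


def hospitalization_probabilities(lethality, mu_cov=MU_COV, severe=SEVERE_FRACTION):
    """Return zeta_i = severe * mu_i / mu_cov for each age group."""
    return [severe * mu / mu_cov for mu in lethality]


def aging_rates(age_spans_years, days_per_year=365.0):
    """Return nu_i = 1 / (age span), per day; the last group is open-ended, so nu_M = 0."""
    rates = [1.0 / (span * days_per_year) for span in age_spans_years]
    return rates + [0.0]


zeta = hospitalization_probabilities(lethality_by_group)
nu = aging_rates([10, 30, 10, 10, 10])  # spans of the five bounded groups, in years

for group, z, rate in zip(AGE_GROUPS, zeta, nu):
    print(f"{group:>6}: zeta = {z:.4f}, nu = {rate:.2e} per day")
```

Rounded to three significant figures, the computed values agree with the ζi reported in the text, which supports the assumed form of the estimate.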
We consider that hospitalized individuals are isolated and do not contribute to the force of infection for the i-th age group, which is defined in terms of the transmission matrix βi,j, estimated as follows. We first consider the contact matrix Ci,j, given by the average number of physical contacts per unit time of an individual of age group i with any individual of group j. Since the currently available information does not allow us to determine a probability of contagion for each specific group, we take the transmission probability per contact Pc to be the same for all infected individuals. We thus have βi,j = PcCi,j. Some studies have determined the contact matrix for different regions of the world, but none for any Brazilian city. We therefore used the study in Ref [29], where the contact matrix was determined from field studies in eight different European countries. Our working hypothesis was that these results can reasonably be transposed to the metropolitan area of São Paulo. Since the contact matrices for these countries do not vary significantly, we take their average for Ci,j; the resulting contact matrix is shown as a heat map in Fig 1. The transmission probability is then obtained by adjusting it so that the model reproduces the value of the basic reproduction number R0.
We also suppose that only severe and critical cases are hospitalized.
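The construction βi,j = PcCi,j and the adjustment of Pc to a given R0 can be illustrated with a short Python sketch. It rests on assumptions that are not taken from the text: a frequency-dependent force of infection, λi = Σj βi,j Ij/nj, a single mean infectious period T, and no aging or hospitalization during the infectious period, in which case the next-generation matrix gives R0 = Pc T ρ(C), with ρ(C) the dominant eigenvalue of the contact matrix. The matrix and the numerical values are hypothetical, and the relation used by the authors may differ.

```python
def spectral_radius(matrix, iterations=500):
    """Dominant eigenvalue of a matrix with positive entries, by power iteration."""
    size = len(matrix)
    vector = [1.0] * size
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(size)) for i in range(size)]
        eigenvalue = max(abs(x) for x in product)
        vector = [x / eigenvalue for x in product]  # keep the max-norm equal to 1
    return eigenvalue


def transmission_probability(r0, contact, infectious_period):
    """ASSUMED relation R0 = Pc * T * rho(C), solved for Pc."""
    return r0 / (infectious_period * spectral_radius(contact))


def transmission_matrix(pc, contact):
    """beta_ij = Pc * C_ij, with the same Pc for all age groups."""
    return [[pc * c for c in row] for row in contact]


# Hypothetical two-group contact matrix (contacts per day) and parameters.
contact = [[1.0, 2.0], [3.0, 2.0]]  # dominant eigenvalue is 4
pc = transmission_probability(r0=2.0, contact=contact, infectious_period=5.0)
beta = transmission_matrix(pc, contact)
print("Pc =", pc)
print("beta =", beta)
```

Power iteration is used only to keep the sketch free of external dependencies; any eigenvalue routine would serve the same purpose.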
The model schematic is given in Fig 2, and the corresponding system of differential equations is given in Eq (3), where all variables are taken at time t except where explicitly indicated, and δij is the Kronecker delta (1 if i = j and 0 otherwise). This system is well defined, in the sense that all variables always remain positive and below 1 if there is no population growth. This can be verified straightforwardly by noticing that the vector field on the boundary of the biologically meaningful region points inward.
The numerical solution of Eq (3) was implemented in C, with additional analyses and scripts written in the symbolic computation system MAPLE; all are available on request.
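To make the structure of such a model concrete, the following Python sketch integrates a simplified age-stratified SEIHR system with a fourth-order Runge-Kutta scheme. It is not Eq (3): aging, births and a separate death compartment are omitted, discharges from hospital are lumped into R, and the force of infection is assumed to be λi = Σj βi,j Ij/nj, with hospitalized individuals excluded. The rates `sigma`, `gamma` and `eta` and all numerical values are hypothetical.

```python
def seihr_rhs(state, beta, n, sigma, gamma, zeta, eta):
    """Time derivatives of (S, E, I, H, R); each entry is a list over age groups."""
    S, E, I, H, R = state
    m = len(n)
    # Only infectious, non-hospitalized individuals contribute to the force of infection.
    lam = [sum(beta[i][j] * I[j] / n[j] for j in range(m)) for i in range(m)]
    dS = [-lam[i] * S[i] for i in range(m)]
    dE = [lam[i] * S[i] - sigma * E[i] for i in range(m)]
    dI = [sigma * E[i] - gamma * I[i] for i in range(m)]
    dH = [zeta[i] * gamma * I[i] - eta * H[i] for i in range(m)]
    dR = [(1.0 - zeta[i]) * gamma * I[i] + eta * H[i] for i in range(m)]
    return (dS, dE, dI, dH, dR)


def shifted(state, deriv, h):
    """Return state + h * deriv, compartment by compartment."""
    return tuple([x + h * d for x, d in zip(xs, ds)] for xs, ds in zip(state, deriv))


def rk4_step(state, dt, params):
    """One classical Runge-Kutta step of size dt."""
    k1 = seihr_rhs(state, *params)
    k2 = seihr_rhs(shifted(state, k1, dt / 2), *params)
    k3 = seihr_rhs(shifted(state, k2, dt / 2), *params)
    k4 = seihr_rhs(shifted(state, k3, dt), *params)
    mean = tuple([(a + 2 * b + 2 * c + d) / 6 for a, b, c, d in zip(*ks)]
                 for ks in zip(k1, k2, k3, k4))
    return shifted(state, mean, dt)


def simulate(state, days, dt, params):
    """Integrate the system for the given number of days with fixed step dt."""
    for _ in range(round(days / dt)):
        state = rk4_step(state, dt, params)
    return state


# Hypothetical two-group example; fractions are relative to the total population.
n = [0.6, 0.4]
beta = [[0.3, 0.1], [0.1, 0.2]]
params = (beta, n, 1 / 5, 1 / 3, [0.02, 0.3], 1 / 10)  # beta, n, sigma, gamma, zeta, eta
initial = ([0.6 - 1e-6, 0.4], [0.0, 0.0], [1e-6, 0.0], [0.0, 0.0], [0.0, 0.0])
final = simulate(initial, days=60, dt=0.1, params=params)
print("total population fraction:", sum(sum(c) for c in final))
print("hospitalized fraction per group:", final[3])
```

Because every outflow of one compartment is an inflow of another, the total population fraction is conserved in this sketch, which mirrors the well-posedness property stated for the full model in the absence of population growth.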
Results and discussion
With the onset of an outbreak in a geographically delimited region, a behavioral change is expected in the population within a relatively short time span, and government and health officials are expected to intervene with drastic measures in order to reduce contacts, and hence disease propagation. Mathematically, this amounts to judiciously reducing the values of some components of the contact matrix. We consequently restrict ourselves to the first 60 days of the possible outbreak, beyond which simulation results would no longer correspond to a realistic setting. Nevertheless, our approach lends itself easily to modeling specific interventions, for instance school closure, using the analogue of the school-term forcing of the contact matrix commonly used to study periodic oscillations in measles [21].
The time evolution of the total number of cases, hospitalized individuals, and total fatalities for each age group, obtained from the average or median values of the parameters in Table 1, is shown in Figs 3, 4 and 5, respectively. To investigate the sensitivity of our results to the estimation errors in the different parameters used, we performed a Monte Carlo analysis with 100 000 simulations, sampling parameter values from a uniform distribution over the interval defined by the corresponding confidence interval. We considered 10 cases in the age group of 10 to 39 years at time zero. In all that follows, these initial cases are subtracted from the reported number of cases. Results for the medians and inter-quartile intervals of the total number of cases, fatalities and hospitalized individuals are shown in Table 2, for 30 and 60 days of time evolution. The transmission probability Pc obtained from the Monte Carlo study is well fitted by a log-normal distribution with a median of 0.148 and an inter-quartile interval of (0.106, 0.247) (see supporting information).
This is a first model of COVID-19 dynamics using an age-stratified structure, similar to approaches used for epidemics of other respiratory diseases of historical and clinical relevance in a number of conditions [30, 31]. This type of approach is relevant for the planning of age-dependent intervention policies, e.g. school closures or restrictions on public gatherings. It is also the first study of a possible COVID-19 epidemic in a large metropolitan region in the Southern Hemisphere.
The parameters used are described in the literature, except for the hospitalization probability according to age, which was estimated. The contact matrix was adapted from a study of European countries, which are expected to display a contact structure similar to that of the major city in the Southern Hemisphere, which is comparable in population size to the Wuhan region. These two cities have different climates, but no empirical evidence is available to assess the effect of climate on the propagation of COVID-19, although clear evidence exists for the H1N1 and H3N2 viruses [32]. Further research is needed to clarify this point.
Previous outbreaks of the highly pathogenic coronaviruses SARS-CoV and MERS-CoV were related to epidemic amplification, with a small number of super-spreader cases causing an elevated number of secondary cases and a high impact in hospital settings. This explains outbreaks with a basic reproduction number R0 smaller than one [4, 10–14]. The value of R0 depends not only on the specifics of the disease but also on a number of environmental factors, and is affected by changes in social behavior in the population and by the isolation of infected individuals. This was clearly observed during the recent evolution of the outbreak in China since its onset in December 2019, with a gradual decline of R0 [33]. This is the main reason why we restricted the time span of our prognosis to 60 days: we estimate that behavioral change will occur about a month after the onset of the outbreak, and interventions by public authorities are also expected within the first 60 days.
The present study simulates the impact of COVID-19 on the local health system by predicting the number of hospitalized individuals, and consequently provides a tool for policy makers to plan the needs of healthcare services for people living in an area that is an international hub for the spread of COVID-19, mainly to other countries in South America. The expected number of hospitalized individuals in the first 30 days of the outbreak should be easily absorbed by the existing infrastructure in the metropolitan area of São Paulo, but it increases rapidly as the outbreak unfolds, and one expects a rapid saturation of health facilities, which varies according to age. Our approach is therefore a relevant tool for informing the authorities responsible for preparing for a possible outbreak of how much time they have at their disposal to set up complementary health infrastructure. A more systematic and detailed analysis in this direction is the subject of ongoing research.
The determination of the value of R0 is affected by the assumption that both the time from symptom onset to confirmation and the under-reporting proportion are roughly constant during the outbreak, resulting in a possible overestimation during the initial stage of the epidemic. The rise in the number of confirmed cases may in great part be due to a better sensitivity of health system surveillance and a decrease in the time lag for laboratory confirmation.
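The idea of modeling an intervention by reducing selected components of the contact matrix can be sketched in a few lines of Python. The matrix, the choice of age groups affected and the reduction factor are all hypothetical; in particular, representing a school closure as a uniform scaling of contacts within the youngest group is an assumption made here for illustration only.

```python
def reduce_contacts(contact, groups, factor):
    """Scale contacts among the listed age groups by `factor` (between 0 and 1).

    Entries C_ij with both i and j in `groups` are multiplied by `factor`;
    all other entries are left unchanged. A new matrix is returned.
    """
    selected = set(groups)
    return [
        [c * factor if (i in selected and j in selected) else c
         for j, c in enumerate(row)]
        for i, row in enumerate(contact)
    ]


# Hypothetical three-group contact matrix (contacts per day).
contact = [
    [4.0, 2.0, 1.0],
    [2.0, 3.0, 1.0],
    [1.0, 1.0, 2.0],
]

# ASSUMPTION: a school closure halves contacts within the youngest group (index 0).
closed = reduce_contacts(contact, groups=[0], factor=0.5)
for row in closed:
    print(row)
```

A time-dependent intervention could be represented by switching between the original and the reduced matrix at a chosen day, in the spirit of the school-term forcing mentioned above.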
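The Monte Carlo procedure described above can be sketched in Python as follows: each parameter is drawn from a uniform distribution over its interval, the model is evaluated, and the median and inter-quartile interval of the outputs are reported. The function `toy_cases` is a stand-in exponential-growth model, not the SEIHR model, and the parameter names and intervals are hypothetical rather than the values of Table 1.

```python
import math
import random
import statistics


def monte_carlo(model, intervals, n_samples, seed=0):
    """Sample parameters uniformly in `intervals`; return (median, (q1, q3)) of the outputs."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in intervals.items()}
        outputs.append(model(**params))
    q1, median, q3 = statistics.quantiles(outputs, n=4)
    return median, (q1, q3)


def toy_cases(r0, infectious_period, days=10.0, initial=10.0):
    """Stand-in model: exponential growth at rate (R0 - 1) / infectious period."""
    growth_rate = (r0 - 1.0) / infectious_period
    return initial * math.exp(growth_rate * days)


# ASSUMPTION: illustrative intervals, not the confidence intervals used in the study.
intervals = {"r0": (2.0, 3.5), "infectious_period": (2.0, 4.0)}
median, iqr = monte_carlo(toy_cases, intervals, n_samples=10_000)
print(f"median = {median:.0f}, IQR = ({iqr[0]:.0f}, {iqr[1]:.0f})")
```

Reporting the median and inter-quartile interval rather than the mean and standard deviation is a natural choice here, since outputs of an exponentially growing process are strongly skewed.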
Limitations
In this section we summarize the limitations of our work. We assumed that different age groups have the same transmission probability per physical contact, since no information is available that would allow us to relax this assumption. We also considered that every severe or critical case will need hospitalization and supposed, again due to a lack of more detailed data, that the probability of hospitalization of an infected individual from a given age group is proportional to the reported death rate in the same group. Another relevant limitation was to consider that all primary health services have the same capacity to identify the severity of clinical conditions, in a region with more than 20 million people living with a Gini index ranging from 0.40 to 0.69 [25], which indicates, among other factors, quite different levels of access to tertiary healthcare services. Furthermore, one must consider the possibility that the available health infrastructure may collapse prematurely. We also assumed that, during the 60 days following the epidemic onset, no significant behavioral changes would occur, nor strong interventions by policy makers. More recently, the percentage of asymptomatic cases was estimated to be as high as 34.6% [17], while all epidemic parameters previously obtained assumed that all cases are symptomatic. On the other hand, the number of asymptomatic cases is expected to have a significant impact on the disease dynamics at later stages, when a substantial proportion of the population has been infected. It nevertheless has important consequences for the efficiency of isolation procedures. The assumption that hospitalized individuals are no longer able to infect is an oversimplification, since such transmission may contribute significantly to the number of reported cases among health professionals. The period during which an infected individual transmits the virus, as given in Table 1, is probably underestimated and was based on the only available study that explicitly cites it [28].
Conclusion
Despite having a low case fatality rate, COVID-19 has high transmissibility and consequently produces a large number of cases when introduced into a naive population, along with a large number of hospitalizations and deaths. A COVID-19 epidemic in a major urban area like São Paulo would impose a significant burden on the health care system. Measures to limit the spread of the disease will be necessary in order to slow the epidemic growth and avoid exhausting the available hospital beds and intensive care units. Mathematical models can contribute to predicting the expected burden of disease. The approach presented here allows a detailed assessment of the impact of the onset of the epidemic in the major metropolitan area in the Southern Hemisphere, despite major limitations in our understanding of COVID-19 dynamics. For instance, the role of asymptomatic individuals in the progression of the epidemic is still unknown, and no data are available regarding the differences in transmission rates between age groups or the impact of the weather on the reproduction number.
Data Availability
The data will be available on request.
Author contributions
WMR, WNA and JHRC discussed and designed the initial study. FSGS, VBG and TAHR performed the data gathering and analysis. TMRF wrote and tested all the computer code, ran all simulations, and performed the main analysis of the corresponding results. Some additional statistical analyses were performed by FSGS and VBG. All authors contributed equally to discussing the final design of the study and participated in the subsequent analysis, discussion and assessment of the final results.
Acknowledgments
TMRF was partially financed by CNPq (Brazil) under grant no. 305842/2017-0.