Abstract
Obtaining reasonable estimates of transmission rates from observed data is a challenge when using mathematical models to study the dynamics of “infectious” diseases such as Ebola. Most models assume that the transmission rate of a contagion does not vary over time. However, these rates do vary during an epidemic because of environmental conditions, social behaviors, and the public-health interventions deployed to control the disease. Therefore, obtaining time-dependent rates can aid in understanding the progression of a disease through a population. We derive an analytical expression, using a standard SIR-type mathematical model, to compute time-dependent transmission rate estimates for an epidemic from whichever type of data is available, incidence or prevalence. We illustrate the applicability of our method by applying it to data on various public health problems, including infectious diseases (Ebola, SARS, and Leishmaniasis) and social issues (obesity and alcohol drinking), to compute transmission rates over time. We show that transmission rate estimates can vary widely over time, depending on the type of available data and on other epidemiological parameters. Time-dependent estimation of transmission rates captures the dynamics of the problem and can be used to understand disease progression through a population more accurately. In contrast, constant estimates may give unacceptable results that could have major public health consequences.
1. Introduction
An epidemic is a function of environmental factors and a contact structure that vary over time, which in turn leads to a varying transmission potential of an “infection”. We also use the word “infection” to describe the social influence exerted by a typical influential individual with a particular social problem, which results in a naive (to the social problem) individual becoming involved in the problem. For example, an alcoholic might influence an abstainer to start drinking. Many authors have studied outbreaks of social problems and infectious diseases using compartmental transmission models. The qualitative aspects of homogeneous mixing models with a constant transmission potential are well understood for various applications. These models are relatively easy to analyze and can answer questions at the population level with good precision. Homogeneous mixing compartmental models have a long history; however, quantifying the temporal transmission potential of an infectious agent, an input variable for this type of model, has been a challenge.
In 1906, William Hamer published a paper containing an epidemic model for the transmission of measles. His observations included that the incidence of new cases in a time interval is proportional to the product, SI, of the density of susceptibles (S) and the density of infectives (I) in the population. This formulation of incidence can be explained by considering a few epidemiological quantities. Consider a single susceptible individual in a homogeneously mixing population of size N. This individual contacts other members of the population at the rate c per unit time, and a proportion I/N of these contacts are with individuals who are “infectious”. If the probability of transmission of infection given contact is ρ, then the rate at which the infection is transmitted to a susceptible is ρcI/N per unit time, and the rate at which the susceptible population becomes infected is ρcSI/N.
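The incidence argument above can be made concrete with a small numerical sketch. The function name `infection_rates` and all input values below are illustrative assumptions and do not come from any data set used in this paper.

```python
def infection_rates(rho, c, S, I, N):
    """Return (rate for one susceptible, rate for the whole susceptible class).

    rho: probability of transmission given a contact
    c:   contacts per individual per unit time
    S, I, N: numbers of susceptibles, infectives, and total population
    """
    per_susceptible = rho * c * I / N    # rho * c * I / N
    population = per_susceptible * S     # rho * c * S * I / N
    return per_susceptible, population


# Hypothetical values chosen only for illustration.
per_capita, total = infection_rates(rho=0.05, c=10.0, S=900.0, I=100.0, N=1000.0)
print(per_capita, total)
```

With these made-up inputs, the product ρc plays the role of the transmission coefficient β used in the rest of the paper.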
The contact rate is often a function of population density, reflecting the fact that contacts take time and saturation occurs. If c is assumed to be approximately proportional to N or equal to a constant, incidence can be represented by terms like βSI (referred to as mass action incidence) or βSI/N (referred to as standard incidence), respectively. The parameter β, which includes the contact rate c, is called the “transmission coefficient” (or “effective contact rate” or “transmission potential”) and has units of time⁻¹. At low population densities, mass action is a reasonable approximation of a much more complex contact structure; in general, however, standard incidence is more appropriate for modeling the transmission of human diseases or the influences behind social problems. The term βI/N is sometimes referred to as the force of infection, i.e., the per capita rate at which susceptible members of the host population become infected. The transmission rate, on the other hand, represents the number of new infections per unit of time generated by an infected individual. The transmission rate is calculated by dividing the incidence for a given time period by the disease prevalence for the same time interval.
Most infectious disease data are collected in the form of incidence and/or prevalence. The prevalence of a “disease” in a population is defined as the total number of cases of the disease in the population at a given time, whereas the prevalence proportion is computed by dividing the total number of cases in the population by the number of individuals in the population. It is used as an estimate of how common a condition is within a population over a certain period of time. Incidence is a measure of the risk of developing some new condition within a specified period of time. The incidence proportion (also known as cumulative incidence) is the number of new cases within a specified time period divided by the size of the population initially at risk. When the denominator is the sum of the person-time of the at-risk population, the measure is known as the incidence density rate or person-time incidence rate. Using person-time rather than just time handles situations where the amount of observation time differs between people, or where the population at risk varies with time. In short, prevalence measures all individuals affected by the disease within a particular period of time, whereas incidence measures the number of new individuals who contract the disease during that period. The prevalence and incidence proportions at time t are thus given by I(t)/N(t) and β ∗ (S(t)/N(t)) ∗ (I(t)/N(t)), respectively.
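As a minimal sketch of the two closing expressions, the functions below compute the prevalence proportion and the incidence proportion, and then invert the latter to recover β. The function names and the population snapshot are hypothetical and serve only to illustrate the arithmetic.

```python
def prevalence_proportion(infected, population):
    """I(t) / N(t)."""
    return infected / population


def incidence_proportion(beta, susceptible, infected, population):
    """beta * (S(t) / N(t)) * (I(t) / N(t))."""
    return beta * (susceptible / population) * (infected / population)


def transmission_coefficient(incidence_prop, susceptible, infected, population):
    """Invert the incidence proportion to recover beta."""
    return incidence_prop / ((susceptible / population) * (infected / population))


# Hypothetical snapshot of a population at one time point.
N, S, I, beta = 10_000, 8_000, 500, 0.3
prev = prevalence_proportion(I, N)
inc = incidence_proportion(beta, S, I, N)
beta_back = transmission_coefficient(inc, S, I, N)
print(prev, inc, beta_back)
```

The inversion step is the simplest version of the idea developed in Section 2: when incidence, S, and I are known at a time point, β at that time point follows directly.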
In compartmental mathematical models, various assumptions are made based on the characteristics of the disease being modeled, which lets modelers focus on the most important aspects of the epidemic. For example, for an epidemic that occurs on a timescale much shorter than that of population replenishment (that is, the epidemic progresses much faster than births and deaths in the population), a constant population size can be assumed. Additional common features of these models include temporary or permanent recovery of infected individuals and a birth rate into the infective class. Determining whether an infectious disease or a social problem will become established or cause a major outbreak in a population requires either extensive experience or a mathematical model of the disease dynamics together with estimates of the model parameters. Here, we provide a method for estimating the transmission coefficient. A suitable set of data for estimating β includes the prevalence and incidence of the outbreak in question. There are many different methods for estimating β, but most of them result in an aggregate value over time. The methods in the literature include estimation using regression of prevalence on time since the start of an epidemic [1], estimation from the equation for the basic reproductive rate when the threshold density is known [2], estimation from equilibrium prevalence [3,4], use of age-prevalence curves [5], inference from behavior or contact data [6], and iterative comparison of field prevalence data with model predictions [7].
Some researchers have modeled time-varying transmission coefficients for diseases that follow seasonal patterns, but using a predefined functional form [8]. On the other hand, a study by Finkenstadt and Grenfell [9] uses a discrete-time model that allows for a temporally varying transmission parameter with a period of one year and no assumption on the functional form. However, their estimation is computationally intensive and requires the reporting interval of the available data to be an integer fraction of the serial interval of the disease. Here, we provide an analytical formula for estimating the transmission coefficient over time. Examples of social problems (alcohol drinking and obesity) and infectious diseases (Ebola, Visceral Leishmaniasis or Kala-azar, and SARS) are used to show the relevance of the analytical work. The available data on US college alcohol drinking and the US obesity outbreak consist of prevalence trends, whereas incidence data are used for the Ebola outbreak in West Africa (Guinea, Sierra Leone, and Liberia), the Kala-azar outbreak in Bihar, and the SARS epidemic in Hong Kong.
In this paper, we compute time-dependent and time-independent transmission coefficients for Ebola virus disease along with other health care problems: college alcohol drinking, the obesity epidemic in the United States, the spread of Visceral Leishmaniasis, and the 2003 SARS outbreak in Hong Kong. The remainder of the paper is organized as follows: Section 2 provides a compartmental SIR model and two analytical expressions for the transmission coefficient, based on prevalence and incidence data; examples of computing the coefficient over time using each of the two expressions and field data are shown in Section 3; and finally, the results are discussed in Section 4. Figure 1 gives an overview of this paper.
2. Materials and Methods
2.1. Formulation for Time Dependent Estimation
Consider a “disease” outbreak in a population governed by the following system of differential equations: where R(t) = 1 − S(t) − I(t) and the parameters are defined in Table 1 and Table 2.
Following the steps carried out in Hadeler [10] and using Equations (1) and (2), we derive two explicit expressions for β(t): one based on prevalence data and the other on the incidence of the disease. The main derivation steps are given below.
2.1.1. β(t) as a function of prevalence
Suppose prevalence data are available. Derivation of β(t) as a function of prevalence is carried out as follows. Adding Equations (1) and (2) we get
Setting c(t) = b(t) + γ(t) and d(t) = γ(t) + µ(t) in Equation (3) and solving it we obtain where .
Isolating β(t) from Equation (2) we obtain β(t) as function of prevalence (I) where S(t) is given by Equation (4).
Note that, besides the prevalence (I), we also need its derivative I′ to compute β(t) using Equation (5). However, I′ can be approximated from the prevalence data.
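The prevalence-based idea can be sketched on synthetic data. The example below is not the paper's Equation (5): it assumes the simplest closed SIR special case (no births, deaths, or relapse, constant recovery rate α), for which I′ = βSI − αI and (S + I)′ = −αI. The function names, the RK4 simulator, and all parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_sir(beta, alpha, s0, i0, dt, n_steps):
    """Integrate S' = -beta*S*I, I' = beta*S*I - alpha*I with classical RK4."""
    def rhs(y):
        s, i = y
        return np.array([-beta * s * i, beta * s * i - alpha * i])

    y = np.array([s0, i0], dtype=float)
    out = [y.copy()]
    for _ in range(n_steps):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * dt * k1)
        k3 = rhs(y + 0.5 * dt * k2)
        k4 = rhs(y + dt * k3)
        y = y + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        out.append(y.copy())
    return np.array(out)


def beta_from_prevalence(I, dt, alpha, s0):
    """Estimate beta(t_k) from prevalence proportions I[k] (closed SIR assumed)."""
    dI = (I[1:] - I[:-1]) / dt  # forward difference approximating I'
    # Cumulative trapezoidal integral of I from t_0 to t_k.
    cum_I = np.concatenate(([0.0], np.cumsum(0.5 * dt * (I[1:] + I[:-1]))))
    # (S + I)' = -alpha*I  =>  S(t) = S(0) + I(0) - I(t) - alpha * int_0^t I
    S = s0 + I[0] - I - alpha * cum_I
    # I' = beta*S*I - alpha*I  =>  beta = (I' + alpha*I) / (S*I)
    return (dI + alpha * I[:-1]) / (S[:-1] * I[:-1])


# Synthetic prevalence generated with a known constant beta (hypothetical values).
traj = simulate_sir(beta=0.5, alpha=0.2, s0=0.99, i0=0.01, dt=0.01, n_steps=3000)
beta_hat = beta_from_prevalence(traj[:, 1], dt=0.01, alpha=0.2, s0=0.99)
print(beta_hat.min(), beta_hat.max())
```

Because the data are generated with a known β, the spread of `beta_hat` around that value gives a direct feel for the error introduced by the forward difference.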
2.1.2. β(t) as a function of incidence
On the other hand, suppose incidence data are available. In order to calculate expression of β(t) as a function of incidence (w(t) = β(t)SI) we first solve Equation (2) for I with initial condition I(T) (where T ∈ [0, L] is a time at which the prevalence proportion, I(T), is available) and get where .
Using this expression of I(t) in Equation (1) and solving the resultant equation for S with initial condition S(0) we get
Thus, where S(t) and I(t) are given by Equations (7) and (6), respectively.
Note that we need the prevalence at a time point T, I(T), to compute β(t) using Equation (8). The time point T should be chosen close to the maximum of prevalence, not near the start or the end of the epidemic.
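A comparable sketch for incidence data is given below. Again, this is not the paper's Equations (6) to (8) but the closed SIR special case (no demography or relapse, constant α), in which S′ = −w and I′ = w − αI with w(t) = β(t)SI. The time-varying β, the anchor index `idx_T`, and every numerical value are assumptions made for illustration.

```python
import numpy as np


def simulate_incidence(beta_fn, alpha, s0, i0, dt, n_steps):
    """RK4 for S' = -beta(t)*S*I, I' = beta(t)*S*I - alpha*I; returns t, S, I, w."""
    def rhs(t, y):
        w = beta_fn(t) * y[0] * y[1]
        return np.array([-w, w - alpha * y[1]])

    t = dt * np.arange(n_steps + 1)
    y = np.empty((n_steps + 1, 2))
    y[0] = (s0, i0)
    for k in range(n_steps):
        k1 = rhs(t[k], y[k])
        k2 = rhs(t[k] + 0.5 * dt, y[k] + 0.5 * dt * k1)
        k3 = rhs(t[k] + 0.5 * dt, y[k] + 0.5 * dt * k2)
        k4 = rhs(t[k] + dt, y[k] + dt * k3)
        y[k + 1] = y[k] + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    w = beta_fn(t) * y[:, 0] * y[:, 1]
    return t, y[:, 0], y[:, 1], w


def cumulative_trapezoid(f, dt):
    """Composite trapezoidal integral of f from t_0 to each grid point."""
    return np.concatenate(([0.0], np.cumsum(0.5 * dt * (f[1:] + f[:-1]))))


def beta_from_incidence(w, t, dt, alpha, s0, i_T, idx_T):
    """Recover beta(t) from incidence w, S(0), and one prevalence value I(T)."""
    S = s0 - cumulative_trapezoid(w, dt)  # S' = -w
    G = cumulative_trapezoid(w * np.exp(alpha * t), dt)
    # I' = w - alpha*I solved with the integrating factor exp(alpha*t),
    # anchored at the known prevalence I(T); valid for t < T and t > T.
    I = np.exp(-alpha * t) * (i_T * np.exp(alpha * t[idx_T]) + G - G[idx_T])
    return w / (S * I)


def beta_true(t):
    """Hypothetical time-varying transmission coefficient."""
    return 0.5 * (1.0 + 0.2 * np.sin(t / 5.0))


t, S, I, w = simulate_incidence(beta_true, alpha=0.2, s0=0.99, i0=0.01,
                                dt=0.01, n_steps=3000)
idx_T = int(np.argmax(I))  # anchor at the prevalence peak
beta_hat = beta_from_incidence(w, t, 0.01, 0.2, 0.99, I[idx_T], idx_T)
print(np.max(np.abs(beta_hat - beta_true(t))))
```

Anchoring at the prevalence peak mirrors the recommendation above: the reconstructed I(t) is then obtained by integrating a short distance away from a well-determined value in both directions.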
2.2. Time-Independent Estimation: Bayesian Analysis
The Bayesian Markov chain Monte Carlo (MCMC) approach can be used to quantify the uncertainty around the transmission rates and to provide a comparison for our analytical estimates.
Let θ represent the vector of our transmission parameters and let y = (y1, y2, …, yT)T be the available data set. We can take the likelihood function in our Bayesian approach as where T is the total number of data points in the data set, σ is the appropriately chosen variance and f (θ) is the model output function for which data are used. If more than one data set is used, then the likelihood can be modified as follows:
While the Bayesian approach can provide the uncertainty around a time-independent average transmission rate, it does not show how the transmission rate varied over time, and the uncertainty itself is constant over time. Therefore, although this approach helps in understanding the uncertainty in disease progression, it does not address the challenge of capturing how transmission rates change over the course of an epidemic.
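For readers who want to experiment with the time-independent approach, a log-likelihood of the kind described above might be coded as follows. This sketch assumes independent Gaussian errors with a known variance; that functional form, the function names, and the toy numbers are assumptions, since the likelihood expression itself is not reproduced in this text.

```python
import numpy as np


def gaussian_log_likelihood(y, model_output, sigma2):
    """Log-likelihood of data y given model output f(theta).

    Assumes independent Gaussian errors with variance sigma2 (an assumption).
    """
    y = np.asarray(y, dtype=float)
    resid = y - np.asarray(model_output, dtype=float)
    T = y.size
    return -0.5 * T * np.log(2.0 * np.pi * sigma2) - 0.5 * np.sum(resid**2) / sigma2


def joint_log_likelihood(datasets, outputs, sigma2s):
    """Sum of log-likelihoods when several independent data sets are used."""
    return sum(
        gaussian_log_likelihood(y, f, s2)
        for y, f, s2 in zip(datasets, outputs, sigma2s)
    )


# Toy numbers, purely illustrative.
y_obs = [0.10, 0.12, 0.15]
f_theta = [0.11, 0.12, 0.14]
ll_single = gaussian_log_likelihood(y_obs, f_theta, sigma2=0.01)
ll_joint = joint_log_likelihood([y_obs, y_obs], [f_theta, f_theta], [0.01, 0.01])
print(ll_single, ll_joint)
```

A sampler would evaluate this function for each proposed θ after running the model to obtain f(θ); working on the log scale avoids numerical underflow when T is large.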
3. Results
We use five examples to show how to estimate β over time from the available epidemiological data. The examples illustrate how the method can be used to study social and public health issues. To compute estimates of β(t), we use a first-order discretization for derivatives and the composite trapezoidal rule for integration, as given below
These discretizations are used in the formulas given in Equations (5) and (8).
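For reference, the textbook forms of a first-order difference and of the composite trapezoidal rule on a uniform grid are written out below. The forward-difference variant and the uniform step Δt are assumptions made here for illustration; the exact variant used in the computations may differ.

```latex
% First-order (forward) difference for the derivative of prevalence
I'(t_i) \approx \frac{I(t_{i+1}) - I(t_i)}{\Delta t},
\qquad \Delta t = t_{i+1} - t_i

% Composite trapezoidal rule on the uniform grid t_0 < t_1 < \dots < t_n
\int_{t_0}^{t_n} f(s)\,ds \approx
\frac{\Delta t}{2}\Bigl[f(t_0) + 2\sum_{i=1}^{n-1} f(t_i) + f(t_n)\Bigr]
```

The forward difference has an error of order Δt and the trapezoidal rule an error of order Δt², so the derivative term typically dominates the discretization error.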
We can avoid this discretization by choosing a function, for example a polynomial, that can be fitted to the temporal prevalence and incidence data. This fitted function can then be used directly in Equations (5) and (8).
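A polynomial fit of this kind could be done with NumPy as sketched below. The helper name, the polynomial degree, and the prevalence series are hypothetical; in practice the degree would have to be chosen with care to avoid overfitting noisy data.

```python
import numpy as np


def fit_prevalence_polynomial(t, prevalence, degree):
    """Least-squares polynomial fit; returns coefficients of I(t) and of I'(t)."""
    coeffs = np.polyfit(t, prevalence, degree)
    return coeffs, np.polyder(coeffs)


# Hypothetical prevalence series lying on a quadratic trend.
t = np.linspace(0.0, 4.0, 9)
prev = 0.02 + 0.03 * t - 0.005 * t**2

coeffs, dcoeffs = fit_prevalence_polynomial(t, prev, degree=2)
I_fit = np.polyval(coeffs, t)    # smooth prevalence I(t)
dI_fit = np.polyval(dcoeffs, t)  # analytic derivative I'(t) of the fitted curve
print(I_fit, dI_fit)
```

The fitted curve supplies I(t) and I′(t) at any time point, so no finite-difference step is needed for the derivative.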
3.1. Using Incidence Data
In this section, we apply our method to available incidence data from three past epidemics: the 2014-2016 Ebola outbreak in West Africa, the 2005 outbreak of Visceral Leishmaniasis in the Indian state of Bihar, and the 2003 SARS outbreak in Hong Kong.
3.1.1. 2014-2016 Ebola Outbreak in West Africa
In this section, we estimate the transmission coefficient, β(t), for the 2014-2016 Ebola epidemic in West Africa using available incidence data. The numbers of reported cases per month were retrieved from the Centers for Disease Control and Prevention (CDC) and are shown totaled for West Africa (Figure 3a), and individually for Guinea (Figure 3c), Sierra Leone (Figure 3e), and Liberia (Figure 3g) [20]. For these estimates, the prevalence time point T is taken as May 31, 2015, as this point is close to the maximum prevalence and not towards the start of the epidemic (see Section 2.1.2). Incidence is calculated by dividing these case counts by the 2016 population of each country, as reported by the United Nations (UN) [21]. We assume a constant mean recovery period of 10 days (α(t) = α), a constant mean relapse period of 10 years (γ(t) = γ), no vertical transmission (p = 1), and a constant population (b(t) = µ(t) = µ = 0); since the CDC data provide monthly case counts, these parameters are adjusted to per-month rates. We estimate β(t) by simplifying Equation (6) as follows:
On discretizing Equation (10) we get following expressions. If t ≤ T, where and
If t > T, where where and m1 = α.
The estimates of β(t) obtained from the available incidence data are given in Table 2 (see Appendix A.2) and are shown for West Africa (Figure 3b), Guinea (Figure 3d), Sierra Leone (Figure 3f), and Liberia (Figure 3h). Comparing the results for each region, we find that Guinea has the largest temporal estimate for both the mean and the median of β(t) (see Table 3 and Figure 2). Examining the transmission rate estimates over time, we observe that the transmission rate follows the incidence pattern, reflecting the exponential rise at the beginning of the epidemic as well as the impact of disease-acquired immunity and of the non-pharmaceutical interventions implemented over the course of the epidemic (Figure 3).
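The adjustment of the Ebola parameters to per-month rates, and the conversion of case counts to incidence proportions, could be carried out as sketched below. The sketch assumes that a rate is the reciprocal of the stated mean duration and that a month has 365.25/12 days; the case counts and population are placeholders, not CDC or UN figures.

```python
DAYS_PER_MONTH = 365.25 / 12.0  # assumption: average month length
MONTHS_PER_YEAR = 12.0


def per_month_rate_from_days(mean_duration_days):
    """Rate per month for a process with the given mean duration in days."""
    return DAYS_PER_MONTH / mean_duration_days


def per_month_rate_from_years(mean_duration_years):
    """Rate per month for a process with the given mean duration in years."""
    return 1.0 / (mean_duration_years * MONTHS_PER_YEAR)


def incidence_proportions(monthly_cases, population):
    """Monthly case counts divided by a fixed population size."""
    return [cases / population for cases in monthly_cases]


alpha = per_month_rate_from_days(10.0)   # recovery, from a 10-day mean period
gamma = per_month_rate_from_years(10.0)  # relapse, from a 10-year mean period

# Placeholder numbers for illustration only.
w = incidence_proportions([120, 450, 300], population=1_000_000)
print(alpha, gamma, w)
```

Keeping the unit conversion in one place makes it easy to rerun the estimation with data reported weekly or daily.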
3.1.2. 2005 Occurrence of Visceral Leishmaniasis in Bihar, India
Visceral Leishmaniasis (VL) is a vector-borne infectious disease that is spread from person to person by the bite of a tiny insect, the sandfly. A large population suffers from VL in some tropical and subtropical countries of the world. The highest burden of VL is found in the Indian state of Bihar. We obtained underreporting-adjusted 2005 incidence data for Bihar from [7]. The data contain the number of new cases during the past month, adjusted for underreporting. Expression (13) is used to estimate β(t) via two different models. The first model is for a single outbreak, and hence demography is not considered, whereas the second model assumes births and deaths with the same per-capita rate.
If t ≤ T then where and
If t > T where and where (for i = 1, 2), m2 = µ and m3 = α + µ.
Since the annual epidemic during 2005 started showing a clear decaying trend in the month of October, we took this time to compute the prevalence of VL in Bihar. Prevalence during October 2005 was computed under the assumption that 25% of worldwide leishmaniasis prevalence is from VL cases, whereas the remainder is from other forms of Leishmaniasis. We also assumed that 20% of the global burden is in Bihar. Since some proportion of a population is naturally immune to the disease, we carried out the estimation for three different values of the initial proportion of susceptibles, namely 0.1, 0.5 and 0.8. A recovery rate of 0.21 per month and a population influx/outflux rate of 0.00138 were computed using data from Mubayi et al. (2010) [7]. The other assumptions of the model include constant recovery (i.e., α(t) = α), no vertical transmission (i.e., p = 1), permanent recovery (i.e., γ(t) = 0) and equal, constant per-capita incoming and outgoing rates (i.e., b(t) = µ(t) = µ). We model only the human population and do not take the vector population into account explicitly. Thus, β(t) could be interpreted as the vectorial capacity of the sandfly population transmitting infection between humans.
The obtained estimates of β(t) are given in Table 3 (see Appendix A.2) and Figures 4a, 4b and ??. The β estimates that we have computed here are comparable to corresponding estimates in [7] (in this reference the mean estimates are βh = 0.13 (with Median=0.11, Std=0.08, Q1=0.07, Q3=0.17) and βv = 0.12 (with Median=0.11, Std=0.08, Q1=0.07, Q3=0.16) where around 75% of the population was susceptible).
3.1.3. 2003 SARS Outbreak in Hong Kong
Severe acute respiratory syndrome (SARS) is a viral respiratory illness caused by a coronavirus. The SARS epidemic in Hong Kong is shown in Figure 7a. We estimated the transmission coefficient using a single-outbreak model with parameter values given in Table ??. The formula used for estimating β(t) is
On discretizing Equation (15) we get following expressions. If t ≤ T, where and
If t > T, where where and m4 = α.
The temporal estimates of β(t) are shown in Table 4 (see Appendix A.2) and Figure 7b.
3.2. Using Prevalence Data
We use US national college alcohol drinking and obesity data as examples in this section. In Appendix A, we also present a hypothetical example with synthetic prevalence data and a known time-varying transmission rate to illustrate the ability of our analytical expression to accurately capture the time-dependent transmission rate.
3.2.1. College Alcohol Drinking
The available alcohol drinking data represent prevalence (the proportion of cases at a certain time) and not incidence (new cases over a time period). This is because the data are based on a survey in which the drinking pattern estimates are obtained by asking individuals about their drinking behavior during the past year. Hence, the data can be interpreted as the number of individuals in a certain drinking category at a particular time. Therefore, we use the formula given in Equation (5) to estimate β(t). We assume that drinking is a result of social influences exerted by drinkers (I) on susceptibles (S) or social drinkers. Individuals recover from drinking at a constant rate α (i.e., α(t) = α). The recovery is assumed to be permanent (i.e., γ(t) = 0). The incoming and departure rates are the same (i.e., µ(t) = b(t) = µ) and p = 1. These assumptions are reasonable in the context of the type of data (college population) used here.
The alcohol drinking data, obtained from Engs et al. (1997, 1999) [11,12], are given in Table 5 and represent the trend observed in national college drinking surveys. The recovery rate, α, is taken to be 0.17 [4]. We estimate β(t) using a simplified form of Equation (5) under the above assumptions, as follows where
If µ = 0, this equation can be reduced, where f2 is −αI(x).
We found that the mean estimate of β is 1.04 (std = 0.3; see Table 5 in Appendix A.2 and Figure 8) during 1982-94 for national college drinkers. These estimates of β(t) are comparable to the estimates obtained in [4]: they are all contained in the 95% CIs of the estimates in [4], which are β0 = 1.69 (95% CI [0.63, 2.75]) and β2 = 0.75 (95% CI [0.29, 1.21]).
Engs et al. (1994, 1997) suggest that 65% of freshmen are drinkers at the start of the Fall semester. Hence, we assumed that a proportion 0.65 of incoming students are drinkers, i.e., p = 0.35. We assumed a negligible change in the size of the college population and took the rate of enrollment to be equal to the combined rate of graduation and dropout (i.e., b(t) = µ(t) = µ).
3.2.2. Obesity Epidemic in US
We use the model to examine whether weight gain in one person is associated with weight gain in his or her family members and friends. An obese person is an individual whose body-mass index (the weight in kilograms divided by the square of the height in meters) is greater than or equal to 30. It has been found that the number of obese persons in a community has been increasing and that a person’s chances of becoming obese increase dramatically if he or she has a parent, sibling, friend or spouse who became obese in a given interval [23]. The most reasonable explanations for the obesity epidemic include changes in the luxuries and food consumption that are promoted in society, and the epidemic has not spared any socioeconomic class. Obesity is a result of an individual’s choices and behavior, which are influenced by the appearance and behavior of others in the community. This suggests that, just like the spread of drug use or infectious diseases, weight gain in one person might influence weight gain in another person. That is, it is not simply that obese or non-obese people seek out similar people to spend time with. This influence could be direct or indirect, can vary continuously over time, and may depend on the demographic and social factors of the community as well.
We used annual CDC data from references [17] and [13] to estimate parameters for our obesity epidemic model. The data obtained from [24] include the age-adjusted prevalence of obesity in the US using the projected 2000 U.S. population.
The model assumes a constant population and hence b(t) = µ(t) = µ. It is assumed that 6% of children are born obese [13]. The value of the recovery rate is assumed to be equal to the average of the rate at which overweight individuals go on a diet (4.068 × 10−3 per week [17]) and the rate at which an obese individual stops or reduces consumption of bakery products, fried meals and soft drinks (4.4379 × 10−3 per week [17]). We assume obesity reduces life span by 6 to 7 years. Hence, if the average life span in the US is 78.4 years, then the average life span of the population at risk of obesity is (78.4 − 6.5) years. The estimated β from [17] ranges from 0.02 to 0.04. These estimates are much lower than our estimated values in Table 6 (see Appendix A.2), which range over (0.36, 3.02). This is because the region of our study differs from the region modeled in [17]. Our results suggest that estimates of the transmission coefficient increase with an increase in µ and with a decrease in the initial size of the susceptible population, S(0). where and
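The parameter arithmetic described in this subsection can be checked with a few lines. The sketch below averages the two weekly rates from [17] and takes µ as the reciprocal of the at-risk life span; the 52-weeks-per-year conversion and the reciprocal interpretation of µ are assumptions made here, not statements from the source.

```python
WEEKS_PER_YEAR = 52.0  # assumption used only for the unit conversion

diet_rate_per_week = 4.068e-3    # overweight individuals going on a diet [17]
food_rate_per_week = 4.4379e-3   # obese individuals reducing consumption [17]

# Recovery rate: average of the two weekly rates.
alpha_per_week = 0.5 * (diet_rate_per_week + food_rate_per_week)
alpha_per_year = alpha_per_week * WEEKS_PER_YEAR

# Life span of the at-risk population, using the midpoint (6.5) of "6 to 7 years".
at_risk_life_span_years = 78.4 - 6.5
mu_per_year = 1.0 / at_risk_life_span_years  # assumed: mu = 1 / life span

print(alpha_per_week, alpha_per_year, mu_per_year)
```

Expressing both α and µ per year keeps them on the same time scale as the annual prevalence data.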
4. Discussion
Compartmental models have provided valuable insights into the epidemiology of many infectious diseases. The transmission coefficient, a product of the contact rate and the probability of transmission given a contact, is a parameter of the compartmental model that naturally varies over time. This coefficient has the greatest effect on predictions of the dynamics of a disease or social problem and is difficult to estimate. Because of the lack of detailed data and the complexities involved in estimating this parameter numerically, most studies estimate it as a time-independent parameter, averaging it over the course of the epidemic. In this study, we present a method to estimate the time-dependent transmission rate using two types of data commonly reported during infectious disease outbreaks: the time series of the number of infectives (prevalence) and the number of new cases generated during a period of time (incidence). By deriving an analytical method that uses a standard deterministic model and these data sets to estimate β(t) directly, this approach avoids the computational challenges often involved with more complex models. By applying our approach to a number of infectious diseases, we illustrate the applicability of our methods in various contexts. Moreover, similar approaches can be applied with any appropriate mathematical model to derive a time-dependent transmission rate for diseases whose dynamics may need to incorporate other factors such as the environment (e.g., cholera) or vector dynamics (e.g., dengue).
The utility of the approach presented in this manuscript is demonstrated using several public health problems, including Ebola, Visceral Leishmaniasis, US college alcohol drinking and the obesity outbreak in the US. In particular, we computed temporal estimates of the transmission rate for Ebola during the 2014–2016 outbreak in West Africa (aggregated) as well as for the individual countries of Liberia, Guinea and Sierra Leone. Our results, though limited by the accuracy of the data, demonstrated wide variability in transmission risk across the three countries. Moreover, we found that our temporal estimates of transmission risk followed the pattern of incidence closely, reflecting the substantial contribution of transmission risk to the nature of disease progression.
During public health emergencies caused by infectious disease outbreaks, such as the Ebola outbreak in West Africa or the ongoing COVID-19 pandemic, effective reproductive numbers are often estimated from incidence data to understand the progression of the disease and to inform strategies to curb transmission. While estimates of effective reproductive numbers are useful, combining them with estimates of time-varying transmission risk obtained through our approach can be more informative for public health decision making. Transmission risk at a particular time is a product of contacts and the probability of transmission. Thus, it can be used to make short-term predictions about new infections, and it can indicate how large a reduction in contact patterns or in the risk of transmission (through masks, vaccination, or hygiene) is needed to reduce the transmission parameter sufficiently to reverse the trend of an epidemic.
In the current study, we used a simple deterministic model along with simple numerical integration techniques to show how commonly reported data (incidence and prevalence) can be used to inform temporal transmission risk, and thus to manage public health challenges more effectively. Practical application of our approach would improve with the use of more complex models, where appropriate, as well as more sophisticated integration techniques. Moreover, the analytical derivation can be used to understand, in a straightforward way, the impact of changes in any other input parameter (such as shorter or longer quarantine periods) on transmission risk. Similarly, future research could expand the presented framework to understand how incomplete data may alter the quality of parameter estimation. The value of the analysis reported here is therefore as a starting point for future research that builds on the current approach to develop computational models that can inform policies swiftly during public health emergencies.
We believe that our methods can provide a good approximation of time-dependent transmission coefficients and that the quality of the approximation should improve with the use of more sophisticated numerical integration techniques.
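As a rough illustration of the last point, the sketch below computes the fractional reduction in β needed for prevalence to start declining. It assumes a closed SIR model on proportions, in which I′ = (βS − α)I, so prevalence falls once βS < α. The function name and the input values are hypothetical, and a model with demography or relapse would have a different threshold.

```python
def required_reduction(beta, susceptible_fraction, alpha):
    """Smallest fractional reduction r in beta such that (1 - r)*beta*S <= alpha.

    Assumes I' = (beta*S - alpha)*I (closed SIR on proportions).
    Returns 0.0 if prevalence is already non-increasing.
    """
    effective_growth = beta * susceptible_fraction
    if effective_growth <= alpha:
        return 0.0
    return 1.0 - alpha / effective_growth


# Hypothetical current estimates at some time t.
r = required_reduction(beta=0.6, susceptible_fraction=0.8, alpha=0.3)
print(r)
```

Because β is the product of the contact rate and the probability of transmission per contact, the same fractional reduction could in principle come from either factor or from a combination of both.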
Data Availability
The data were obtained from the public domain.
Author Contributions
The contributions of the authors are as follows: conceptualization, Anuj Mubayi, Aditi Ghosh and Abhishek Pandey; methodology, Anuj Mubayi, Abhishek Pandey, Aditi Ghosh, Christine Brasic; software, Abhishek Pandey, Christine Brasic, Anamika Mubayi, Parijat Ghosh; validation, Anuj Mubayi, Abhishek Pandey, Christine Brasic and Aditi Ghosh; formal analysis, Anuj Mubayi, Abhishek Pandey, Christine Brasic and Aditi Ghosh; investigation, Anuj Mubayi, Abhishek Pandey, Christine Brasic, Aditi Ghosh, Anamika Mubayi, Parijat Ghosh; resources, Christine Brasic, Parijat Ghosh, Anamika Mubayi; data curation, Anuj Mubayi, Christine Brasic, Parijat Ghosh, Anamika Mubayi; writing (original draft preparation), Anuj Mubayi, Abhishek Pandey, Christine Brasic and Aditi Ghosh; writing (review and editing), Anuj Mubayi, Abhishek Pandey, Christine Brasic, Aditi Ghosh, Anamika Mubayi, Parijat Ghosh; visualization, Anuj Mubayi; supervision, Anuj Mubayi; project administration, Anuj Mubayi; funding acquisition, none.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data were collected from the public domain and are included in the tables in Appendix A.2.
Conflicts of Interest
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
Appendix A.
Appendix A.1. Estimation of Time Dependent Transmission Coefficient using Synthetic Prevalence data
We demonstrate our method of estimating the time-dependent transmission coefficient from prevalence data using synthetic prevalence data generated with two particular choices of transmission coefficient (constant and seasonal with respect to time) and the model given by Equations (1) and (2), with the rest of the parameters given in Table A1.
We used two particular choices of transmission coefficient,
β(t) = 200,
β(t) = 200(1 − ϵ cos 2πt), with ϵ = 0.1
to generate daily prevalence data for five years and estimated monthly transmission coefficient using (5). The monthly estimates for time dependent transmission coefficient were reasonably accurate and close to the true values of the transmission coefficients used to generate prevalence data in both the cases when transmission coefficient was constant and when it was periodic (Figure 1).
Appendix A.2. Tables
Footnotes
anujmubayi{at}yahoo.com
abhishek.pandey{at}yale.edu
brasiccs23{at}uww.edu (C.B.); ghosha{at}uww.edu (A.G.)
anamikamubayi{at}yahoo.co.in
phool.ghosh{at}gmail.com
Abbreviations
The following abbreviations are used in this manuscript:
- CDC: Centers for Disease Control and Prevention
- MCMC: Markov chain Monte Carlo
- SIR: Susceptible–Infectious–Recovered
- SARS: Severe Acute Respiratory Syndrome
- USA: United States of America
- US: United States
- VL: Visceral Leishmaniasis