Abstract
We describe a novel approach to recovering the underlying parameters of the SIR dynamic epidemic model from observed data on case incidence or deaths. We formulate a discrete-time approximation to the original continuous-time model and search for the parameter vector that minimizes the standard least squares criterion function. We show that the gradient vector and matrix of second-order derivatives of the criterion function with respect to the parameters adhere to their own systems of difference equations and thus can be exactly calculated iteratively. Applying our new approach, we estimate four-parameter SIR models on two datasets: (1) daily reported cases of COVID-19 during the SARS-CoV-2 Omicron/BA.1 surge of December 2021 - March 2022 in New York City; and (2) weekly deaths from a plague outbreak on the Isle of Bombay during December 1905 - July 1906, originally studied by Kermack and McKendrick in their now-classic 1927 paper. The estimated parameters from the COVID-19 data suggest a duration of persistent infectivity beyond that reported in small-scale clinical studies of mostly symptomatic subjects. The estimated parameters from the plague data suggest that the Bombay outbreak was in fact driven by pneumonic rather than bubonic plague.
Introduction
Nowadays our sophisticated graphic software can draw attractive plots showing how many people have fallen victim to a highly contagious disease over the course of days, weeks or months. But our graphs alone don’t teach us how to reliably determine the underlying risk of transmission from an infected to a susceptible person, or the amount of time that an infected individual remains contagious to others, or what proportion of the population was already infected at the critical point in time when the epidemic wave took off, or how many people remain at risk of infection.
We’ve just described what mathematicians call the inversion problem [1-4]: how to work backwards from limited data on incident cases or deaths to recover the key parameters underlying our dynamic epidemic models. The problem was born nearly a century ago when Kermack and McKendrick (KM) fit a curve derived from their now-classic model to datapoints of weekly deaths from a plague outbreak on the Isle of Bombay [5]. Since then, scores of investigators have searched for a robust, workable method of estimating the parameters of what has famously come to be known as the SIR (Susceptible-Infected-Removed) model, and the race to find a solution has accelerated with the arrival of the COVID-19 epidemic.
What has made the inversion problem so difficult is that, with some possible exceptions [6-10], the SIR model of coupled differential equations does not admit a closed-form mathematical solution that can be readily used to test the model’s predictions against the observed data. That major stumbling block has left us with a motley collection of second-best alternatives.
One idea has been to back out the parameters from the certain salient characteristics of the observed epidemic curve, such as the initial exponential rate of increase of cases [11, 12], the time to reach the peak incidence [13], the rate of decline after the peak [14], and the proportion of the population that is ultimately infected [15-17]. This approach may give us point estimates of the key parameters, but it does not provide any uncertainty ranges around the estimates.
Another idea is to pare down the set of parameters to be identified by making judicious use of prior information on some parameters [18-21], in some cases derived from previous waves of a multi-wave epidemic [22]. Perhaps the most traveled road to a solution has been the use of various parameter search algorithms [18, 23-29] which, when it comes down to it, offer only a marginal improvement over brute force search [30]. Bayesian estimation may be better able to integrate prior information into our search procedures [31-36], but its computational burden is usually even greater. Last but not least, we can resort to trial and error combined with visual inspection [37].
The present study, we suggest, offers an easily workable solution to the inversion problem. Rather than seeking a closed-form, analytical solution to KM’s system of differential equations, we pursue an alternate strategy. First, following the lead of other investigators [35, 38-40], we develop a discrete-time version of their classic, continuous-time SIR model. This step allows us to write their dynamic system in terms of difference equations rather than differential equations. Second, similarly following in others’ footsteps [26, 41-44], we define a least squares objective function to test our SIR model’s predictions against the observed data.
Third, in what appears to be an innovation, we show that both the gradient vector and Hessian matrix of second-order derivatives of our objective function with respect to the parameters follow their own systems of difference equations. As a result, both the gradient and Hessian can be rapidly and exactly computed by straightforward iteration, an approach that is computationally far superior to numerical approximation [45].
Fourth, once we have calculated the gradient and Hessian, we can use the well-known Newton-Raphson algorithm [46] or Gauss-Newton [47] algorithms to find the global optimum. Fifth, relying on the minimum least squares criterion, we can then calculate the variance-covariance matrix of the parameters and thus determine their confidence intervals [48]. Sixth, our approach permits us to readily determine what parameters are in fact identified when we have only time-series data on new cases or deaths.
We apply our strategy to the estimation of a four-parameter SIR model to two types of datasets. First, we study COVID-19 incidence over a 99-day interval from the December 4, 2021, through March 12, 2022, during the Omicron/BA.1 wave in New York City. Second, we study 31 weeks of data on deaths from the plague from December 17, 1905, through July 15, 1906, during an outbreak in the Isle of Bombay, originally studied by KM [5].
Statistical Methods
Discrete-Time SIR Model
Following the lead of other investigators [4, 35, 38-40], we adopt a discrete-time approach. In our SIR epidemic model, the time axis is marked off in equally spaced intervals t = 0, 1, …, T, where the duration of each interval is sufficiently small as to adequately approximate the classical, continuous-time version [5, 49-52]. At any time t, individuals within this closed population can be in one of three mutually exclusive states: susceptible (S), infected (I), or removed (R). The latter state, which includes both recovered living individuals and decedents, is assumed to be absorbing. To minimize possible complications arising from the non-identifiability of multiple parameters [53-56], we assume a fixed, demographically closed population of size N.
Let St, It, and Rt denote the respective numbers of individuals in each of the three states at time t. The dynamic path of the epidemic is governed by the following deterministic, coupled difference equations:
Equations (1a) through (1d) represent the well-known forward Euler approximation to the underlying continuous-time SIR model of coupled differential equations [57]. Apart from the population size N, this dynamic system has two parameters: β and γ. In equations (1a) and (1b), β >0 is an infection transmission parameter, while in equations (1b) and (1c), the parameter γ >0 gauges the proportion of infected individuals who transition to the removed state at each time period. The multiplicative term βSt−1It−1/N in equations (1a) and (1b) reflects the law of mass action [58], whereby susceptible individuals become infected in proportion to their frequency of contact with currently infected individuals. All individuals within the population are assumed to mix homogeneously, with no subgroup of individuals mixing preferentially with any other subgroup.
The final equation (1d) reflects the constant size N of the population and is consistent with equations (1a) through (1c). Strictly speaking, one should adjust the size N of the mixing population in equations (1a) and (1b) to take account of removals by death. Unless the overall death rate is substantial, this adjustment is usually ignored in model implementations.
To further simplify matters, we assume that initially all individuals are either susceptible or infected. Specifically, at t = 0, we have:
In equations (2a) and (2b), the additional parameter i0 represents the proportion of the entire population of size N that is initially infected. We address the more general case where R0 >0 below.
In the dynamic system (1), the mean duration of infection is time-invariant and equal to 1/γ. Given the initial conditions (2), the system will result in an epidemic wave so long as (1 − i0) β/γ >1, a well-known result known as the epidemic threshold theorem [24, 51, 59, 60]. Assuming that 1 − i0 ≈ 1, most authors write this epidemic threshold condition as ℛ0 >1, where ℛ0 = β/γ is the basic reproduction number [24, 52, 61, 62].
Parameter Estimation Problem
We do not have direct observations on the underlying state variables St, It, and Rt. If we had such data, our inversion problem would border on trivial [39, 63, 64]. Instead, we observe only the reported counts of new infections at various intervals. For now, we assume that new infection counts yt are observed at each discrete time t. Below we address the more general case where such counts are observed less frequently. The counts yt represent observations on the output variables Xt, t = 1, …, T, which from (1) correspond to:
It will be helpful to define the parameter α = 1 − γ as the proportion of infected individuals who remain infected from one time period to the next, so that the probability an infected individual remains infectious after τ time periods is ατ. Given the definition of the output variable Xt in (3), our dynamic system (1) can be redefined as:
So long as the state variables adhere to the condition that that St + It + Rt = N, an explicit difference equation for Rt is unnecessary.
We can now characterize our parameter estimation problem. Given observations yt on the output variables Xt, we want to estimate the unknown parameters β, α, i0, and N. Our stumbling block is that we cannot express Xt as a closed-form function of these parameters. We know only that the output variables Xt adhere to the dynamic system defined by (2), (3) and (4), which in turn depends on the parameters β, α, i0, and N.
Solution Strategy: Nonlinear Least Squares
To proceed, we need some additional notation. Let y = (y1, …, yT)′ and X = (X1, …, XT)′, respectively, denote column vectors of the observed incidence data and the corresponding output variables at each time t, where we use boldface symbols denote vectors or matrices. Let θ = (β, α, i0)′ denote the column vector of the unknown parameters excluding the population size N. For now, we condition on N, but as we discuss below, this additional parameter can be separately identified once θ has been estimated. Let X(θ) represent the functional dependence of the output variables on the parameters, suppressing for now that the fact that X also depends on N. Finally, let the operator D denote the gradient of partial derivatives with respect to the parameters. For example, the column vector represents the gradient of partial derivatives of the output variable Xt at time t.
We want to optimize some objective function V(y, X(θ)) with respect to θ. While X(θ) has no closed-form expression, the time-specific gradient vectors DXt of partial derivatives of Xt with respect to the elements of θ follow a system of difference equations and can thus be readily calculated iteratively. The same goes for the Hessian matrices D2Xt of second-order partial derivates with respect to the elements of θ. Once we have computed the gradient vectors DXt and Hessian matrices D2Xt, we can use the Newton-Raphson algorithm [46] or, under more restrictive conditions, the Gauss-Newton [47] algorithm to optimize V.
To see how this solution strategy works, we specifically consider the nonlinear least squares optimization criterion V(y, X(θ)) = (y − X(θ))′(y − X(θ)), which can be written in summation notation as:
This criterion has been widely used in attempts to fit the SIR model to incidence data [26, 41-44]. It is well known that minimizing V is equivalent to computing the maximum likelihood estimate of θ under the assumption that the observations yt are independently normally distributed N(Xt(θ), σ2) with respective means Xt(θ) and homoscedastic variance σ2. In Appendix A, we adopt a more general maximum likelihood framework and specifically consider an alternative Poisson distribution.
Calculating the Gradient of the Least Squares Optimization Criterion V
The gradient of the least squares criterion V in equation (5) with respect to the parameter vector θ is:
From (6), we learn that the gradient DV depends on the gradients DXt. Our next step is to derive specific expressions for the latter gradient terms. Taking the derivative of (3) with respect to θ, we get: where the notation DV represents the unit column vector . Taking the derivative of (1a), we get:
From (1b) and (3), we can write .
Taking the derivative of this expression gives: where Dα similarly represents the unit vector (0,1,0)′.
We next need to compute the initial values of the gradients DS0 and DI0. Since I0 = i0N and S0 = (1 − i0)N, we can write the gradients DS0 = −NDi0 and DI0 = NDi0, where the gradient Di0 simplifies to (0,0,1)′. Given these initial gradient values, we can use (8) and (9) to iteratively compute the vectors DSt and DIt for all t = 1, …, T. Once these gradients have been computed, we can then apply (7) to iteratively compute the corresponding gradients DXt. Those quantities in turn yield the gradient DV of our objective function through equation (6).
Calculating the Hessian Matrix
Let DX denote the T × 3 matrix whose t-th row is the vector DXt′. Taking the derivative of the gradient in (6), we obtain:
From (10), we similarly learn that the Hessian D2V depends on the gradients DXt, which we have already derived, as well the Hessian matrices D2Xt.
To develop the corresponding iterative formulas for the Hessian matrices D2Xt, we need some additional notation. For two column vectors A and B of dimension L × 1, we define the outer product A · B as the L × L matrix AB′ formed by the cross-products of the elements of A and B. (Note that B · A is the transpose of A · B.) We further define A⨀B = (A · B) + (B · A) as the symmetric matrix formed by the sum of the outer product and its transpose. In the three-dimensional case, A = (a1, a2, a3)′, and B = (b1, b2, b3)′, we have:
Now taking the derivative of DXt as displayed in (7), we get:
The Hessian matrices D2Xt depend in turn on the Hessian matrices D2St and D2It. The corresponding expressions for these Hessian matrices are:
To complete our calculations, we note that the initial values D2S0 and D2I0 are simply null matrices.
Taken together, equations (10), (12), (13) and (14), along with the initial null values of D2S0 and D2I0, permit us to iteratively compute the Hessian D2V, just as we outlined for the gradient DV above.
Multiple Local Optima; Highly Correlated Parameter Estimates
In the Newton-Raphson [46] approach to optimization, which we employed here, the current value of the parameter vector θ(k) is repeatedly mapped into an updated value θ(k+1) according to the well-known rule:
In equation (15), the gradient DV, as defined in (6), and the Hessian D2V, as defined in (10), are both computed at the current value of the parameter vector θ(k). The step size 0 < q ≤ 1 is under control of the programmer. This approach works flawlessly when the objective function V is globally convex, for example, when each output variable Xt is a linear function of θ. But that’s hardly the situation in the SIR inversion context.
To begin with, there is good reason to suspect that V is not globally convex and may have multiple local optima. While the classical SIR model predicts a single-peaked wave of incident cases Xt so long as ℛ0 >1, the observed data yt often display multiple peaks over time. (A good example is the two-peak plot of the 2001 Dengue fever outbreak in Havana [60].) In that case, the search algorithm embodied in (15) may easily end up at a local rather than a global optimum of V as it gets trapped in a region of the parameter space that fits one of the multiple peaks.
What’s more, there will be regions of the parameter space where the Hessian D2V is not positive definite because the second term in equation (11) (that is, will not necessary be a positive definite matrix. In that case, the updated parameter rule (15) may not result in a decrease in the objective function V as the algorithm veers away from the optimum.
These difficulties may be partly addressed by the Gauss-Newton algorithm [47], which in the current context entails the following parameter updating rule:
The idea here is that matrix 2 DX′DX, which corresponds to the first term in our expression for the Hessian D2V in (10), is always positive definite, so that the rule (16) will consistently result in a decrease in V. However, the matrix DX′DX may be ill-conditioned, with eigenvalues spanning multiple orders of magnitude, so that the steps θ(k+1) − θ(k) jump around uncontrollably and fail to converge to zero. While various modifications such the Levenberg-Marquardt algorithm have been proposed to recondition the matrix DX′DX [65], in the final analysis our search procedure may end up being exquisitely sensitive to the choice of the initial parameter vector θ(0).
Even when our objective function V has an interior global minimum, where it is at least locally convex in the neighborhood surrounding the optimum, we may still encounter an inverted ridge or ravine where the parameters are highly correlated. There may indeed be a unique value θ* at the very bottom of the ravine that achieves the global minimum V*. But along the floor of the ravine, there is a one-dimensional curve containing θ* where V is almost equal to V*. Strictly speaking, the parameters are structurally identified, but they are not practically identified [66].
To confront these problems of multiple equilibria and correlated parameters in the applications below, we took care to construct and inspect maps of the least-squares criterion V and the gradient DV as functions of the parameters. We also examined the path of successive updates θ(k) in parameter space generated by our Newton-Raphson search algorithm.
Confidence Intervals; Reparametrizing
Our reliance on the nonlinear least squares criterion V in equation (5) permits us to estimate confidence intervals for the estimated parameters [48]. Conditional upon N, the variance-covariance matrix of the parameters θ = (β, α, i0) is C = s2(DX′DX)−1, where s2 = V/T, and where DX and V are evaluated at the optimum. The standard errors are the square roots of the diagonal elements of C. The symmetric 95 percent confidence intervals, based upon the assumption of an asymptotic normal distribution, can be evaluated as ±1.96 standard errors about the estimates. We can then use the Delta method [67] to compute the standard errors and confidence intervals around nonlinear functions of the parameters, such as the basic reproduction number ℛ0 = β/γ = β/(1 − α) and the mean duration of infection 1/γ = 1/(1 − α).
In some applications where the data yt exhibit substantial variability and the resulting value of s2 is large, the estimated symmetric confidence interval surrounding α may exceed the allowable range of 0 ≤ α < 1. In such cases, we can reparametrize, instead specifying θ = (β, ω, i0) with α(ω) = 1/(1 + e−ω), where the substitute parameter ω is unbounded. Our expression for DIt in equation (9) will remain valid, but with the modification that . Our expression for D2I in equation (14) will become:
The additional term on the right-hand side of equation (17), which does not appear in (14), contains the matrix , where .
Recovering the Population Size Parameter N
While X(θ) has no closed-form expression in terms of θ, it turns out that each element Xt is a linear function of the population size parameter N. That is, Xt can be written in the form φt(θ)N, where φt(θ) is a function of the remaining parameters θ.
To prove this result, we only need to show that the state variables St and It are themselves proportional to N. Thus, if for all times t = 1, …, T, we can show that St = ft(θ)N and It = gt(θ)N for some functions ft(θ) and gt(θ), then from equation (3), we would have Xt = βSt−1It−1/N = βft−1(θ)gt−1(θ)N. Thus, φt(θ) = βft−1(θ)gt−1(θ).
We can prove that St and It are proportional to N by mathematical induction. From equations (2a) and (2b), respectively, we known that S0 = (1 − i0)N and I0 = i0N. So, the proposition is true for t = 0. Now suppose that St = ft(θ)N and It = gt(θ)N, at time t. We claim that St+1 = ft+1(θ)N and It+1 = gt+1(θ)N necessarily hold for some functions ft+1(θ) and gt+1 (θ). From (1a), we can write . We have , and so . Similarly, , and so gt+1 (θ) = gt(θ)α + βft(θ). ▪
The fact that each output variable Xt is proportional to N permits us to employ an iterative procedure resembling the expectation-maximization (or EM) algorithm [68] to recover an estimate of that parameter. Let’s say that we have a parameter estimate N(n) at iteration n of the algorithm. We employ the previously described procedure to estimate θ(n) conditional upon N(n). That corresponds to the maximization (or M) step. (From the coding standpoint, when apply the Newton-Raphson parameter search rule described in (15) to estimate θ(n), we’re looping on k within a loop on n.) Given both N(n) and θ(n), we have output-variable estimates for all t. We now compute:
That is, we compute an adjustment factor κ(n+1) by regressing y on X(n). For the expectation (or E) step, we update the population size as N(n+1) = κ(n+1)N(n). This iterative procedure will work for any optimization criterion V and not just for the least squares criterion described above.
Non-Identifiability of the Initially Resistant Population
We have thus far assumed in equation (2c) that no one in the population was initially resistant to infection, that is, R0 = 0. Now let’s assume more generally that a non-negative fraction 1 >r0 ≥ 0 of the population is already resistant at t = 0. We thus replace the restricted initial conditions (2) with the following more general initial conditions:
So long as the R0 initially resistant individuals mix homogeneously with those in the susceptible and infected states, equation (3) governing the output variables Xt as well as the equations of motion (4) for the state variables St and It remain unchanged.
Apart from our original parameter vector θ = (β, α, i0) and the population size N, it appears that we now have an additional parameter r0 to be estimated. It turns out, however, that r0 cannot be identified in our discrete SIR model from the observed data y alone [54].
To see why, let’s suppose that we estimate our more general model conditional upon some fixed value of r0. We denote the resulting minimum least squares estimates as θ*(r0) and N*(r0) to show their dependence on the value of r0 chosen. We denote the corresponding individual components of θ*(r0) as β*(r0), α*(r0), and . Our original parameter estimates conditional upon r0 = 0 thus correspond to β*(0), α*(0), , and N*(0). Then the following conditions hold for all 1 >r0 ≥ 0:
What’s more, the paths of the output variable Xt and the state variables St and It at the optimum values of the parameters, but not the state variable Rt, will be independent of r0. Accordingly, the optimum value of V will likewise be independent of r0.
We again rely upon mathematical induction to prove that the state variables St and It at the optimum are independent of r0. First consider t = 0. For any value of r0, we have from (19a) that . Applying (20c) and (20d), we get , which does not depend on r0. Similarly for any value of r0, we have from (19b) that . Again applying (20c) and (20d), we get , which is likewise independent of r0. Now assume that St and It are independent of r0. We show that St+1 and It+1 must also be independent of r0. From (1a), we have St+1 = St(1 − β*(r0)It/N*(r0)). Applying (19a) and (19d), this expression resolves to St+1 = St(1 − β*(0)It/N*(0)). From (1b) and (3), we have It+1 = It(α + β*(r0)St/N*(r0)). Again applying (20a) and (20d), this expression similarly resolves to It+1 = It(α + β*(0)St/N*(0)). ▪
Interpreting the Estimated Population Size N: Underreporting and Incomplete Mixing
While the additional parameter r0 is not identifiable from the data y alone, we might still be able to identify it from other data. Suppose, for example, that we had sharp prior information on the basic reproduction number ℛ0 = β/γ = β/(1 − α). Let’s denote this prior estimate . We first compute the basic reproductive number implied by our parameter estimates conditional upon r0 = 0. So long as , we can then use our prior information to conclude that .
Why couldn’t we similarly take advantage of prior information on the population size to ascertain the initial proportion r0 of resistant individuals? Suppose we knew from census data that the population contained individuals. We would first estimate N*(0) conditional upon r0 = 0, and then, so long as , we would have the estimate . Unfortunately, reliance solely on census data for prior estimates of population size N is complicated by two other phenomena: underreporting and incomplete mixing.
In many contexts, particularly in recent applications of the SIR and related compartmental models to COVID-19 incidence, it has been widely recognized that a significant number of incident cases may have gone unreported [69]. In the absence of reliable information on the temporal pattern of such underreporting, the most parsimonious approach to this phenomenon has been to assume a case identification ratio equal to a constant p < 1 [70-72]. In that case, our model would need to be modified to accommodate the reality that our reported incidence data yt are in fact estimates of pXt rather than Xt. We’ve already learned, however, that Xt = φt(θ)N, where φt(θ) is a function of the remaining parameters θ, and this is the case even in our more general model admitting r0 >0. Accordingly, our reported incidence data yt are really estimates of pXt = φt(θ)(pN), and the estimated population size derived from our model is really an estimate of pN.
We thus have a knotty problem of confounding. We can estimate population size N*(0) assuming that there are no initially resistant individuals and no underreporting. If we had census data , we could account for both phenomena, writing . Without more information, we cannot separately identify p and r0.
We would ordinarily interpret the parameter N to gauge the size of the population at risk for contagion. This population would consist of all individuals who homogeneously mix with each other in accordance with the law of mass action embodied in equations (1) and (3). Many investigators, however, have properly recognized that the underlying assumption of homogeneous mixing may not apply to the entire population [19, 73-78].
Let’s say we’re analyzing an outbreak on a college campus with known student enrollment . When we run our model with r0 = 0, we obtain an estimate N*(0) that is, say, only about 10 percent of . This finding does not necessarily imply that 90 percent of the student body was resistant or that only one in ten cases was reported. Instead, it may mean that only small fraction of the student body was directly involved in the mixing process that generated the outbreak. A good example would be the COVID-19 outbreak at the campus of the University of Wisconsin-Madison in September 2020, where total student enrollment was , but where the large fraction of cases was concentrated in two on-campus student residence halls with a combined population of about 2,932 [79].
Any comparison between the estimated population size N*(0) and the census statistic thus entails three confounded interpretations. Some cases have gone unreported. Some individuals may be initially resistant. And other individuals may be susceptible but remain outside the core population through which the infectious agent has spread.
Adapting the Model to Data on Removals
Our analysis has thus far assumed that each data point yt is an observation on the corresponding output variable Xt capturing the number of individuals who transitioned from the susceptible to the infected state at time t. Alternatively, we can construct an analogous model where the data point yt is instead an observation on a distinct output variable Zt capturing the number of individuals who transitioned from the infected to the removed state at time t. This approach would be important if we had data on the number of deaths during an epidemic of a disease with a near 100 percent fatality rate, as KM observed in the case of 1905-1906 plague outbreak on the Isle of Bombay [5].
In this alternative model, the output variable Zt is:
Continuing with the least squares framework, the objective function is now:
To minimize this objective function employing the same strategy above, we will need to compute the gradient vector DZt and the Hessian matrix D2Zt. From (25), we immediately have:
The corresponding Hessian matrix is:
To compute these two expressions, we will need the gradient DVt and Hessian D2Vt, but these respective quantities were already computed in equations (9) and (14) above. To complete our estimation procedure, we note that the output variables are similarly linear functions of the population size parameter N, so that the EM-type algorithm described above likewise applies.
Working with Aggregate Data
In some applications, we may have only aggregate data reported over a coarse time scale, rather than the finely divided time axis assumed so far. For concreteness, let’s assume that the underlying SIR epidemic model is valid when the time axis t = 0, 1, …, T is marked off in days, but we only have data ym on the cumulative number of deaths during each 7-day week, indexed by m = 1, …, M. Days t are mapped into weeks m by the relation , where the floor operator ∥x∥ maps into the largest integer j such that j ≤ x. We further assume that T = 7M, so that week M, which corresponds to the last week, ends on day T.
Now define the M × T aggregation matrix W with element wmt = 1 if and only if m = h(t). Otherwise, wmt = 0. While our finely disaggregated model generates daily values Zt of the output variable from the underlying parameters θ, our data ym represent observations on the aggregated values , where w denotes row m of the matrix W and where Z denotes the T × 1 column vector with coordinates Zt. (From the computational standpoint, we’re calculating the integral required to convert a continuous- to a discrete-time SIR model [40].) In vector notation, our least squares criterion becomes:
Let DZ denote the T × 3 matrix whose row t is the gradient DZt, and let W(DZ) denote the M × 3 matrix derived by multiplying the M × T matrix W by the T × 3 matrix DZ. The gradient of V is then given by:
Let w denote row m of the matrix W, so that represents the predicted value. The corresponding Hessian matrix becomes:
Accordingly, we can run our SIR model to generate the state variables Zt and then calculate the derivatives DZt and D2Zt in the disaggregated time scale and then apply the aggregation matrix W to weight the results.
Data
Omicron Wave, December 2021 – March 2022, New York City
We studied the reported daily incidence of COVID-19 during the SARS-CoV-2 Omicron/BA.1 wave of December 2021 – March 2022 in New York City, NY, United States, a city of population 8.49 million. Our data consisted of daily counts of cases reported by the New York City department of health [80], where the date of report was intended to be the date when a positive test was performed or when the diagnosis of COVID-19 was otherwise made.
Our data showed systematic variation in case counts by day of the week, with many fewer cases diagnosed over the weekends. To account for these fluctuations, we converted the raw case counts ct into centered 7-day moving averages, that is, , where t indexes the date of report. Figure 1 shows the raw counts of reported cases per day ct (connected gray datapoints) as well as the daily case counts yt adjusted for the day of the week (red datapoints). We relied upon the adjusted daily counts yt to estimate our SIR model parameters.
Plague Outbreak, December 1905 – July 1906, Isle of Bombay
Figure 2 displays the average daily number of reported deaths from plague in the Isle of Bombay during each week over a 31-week period, beginning with the week starting Sunday, December 17, 1905, and continuing through the week starting Sunday, July 15, 1906. The average daily deaths were computed from the weekly totals plotted by KM in their classic 1927 paper [5]. The height of each bar is the average daily number of deaths during that week. The area of each bar equals the reported number of deaths during that week.
In our analysis of the plague death data, we interpreted the weekly death counts ym, indexed by m = 1, …, 31, as observations on the aggregated values , where Zt are the underlying output variables with daily index t = 1, …, 217, defined in equation (21), and where wmt are elements of the aggregation matrix W described above. We estimated the parameters of our SIR model by minimizing the least squares criterion V defined in equation (25).
Data Processing and Computation
The raw COVID-19 case counts (ct) were downloaded from the New York City data portal [80] in comma-separated-value (CSV) format and converted via Stata Statistical Software Release 17 [81] into internal Stata (DTA) format. The 7-day moving averages (yt), displayed in Figure 1, were computed via Stata programming language [82]. The data on weekly deaths from plague in the Isle of Bombay were taken from the graph on page 714 of the original KM paper [5]. All code for parameter estimation described in the Statistical Methods section above was written in Mata [83], a matrix programming language embedded within Stata. Calculations were carried out on a MacBook Pro with a 2.3 GHz 8-Core Intel Core i9 processor.
Results
Omicron Wave, December 2021 – March 2022, New York City
Figure 3 shows the predicted values of the output variable Xt as a connected curve superimposed on the observed datapoints yt for the Omicron wave in New York City. The computation time to achieve convergence of our Newton-Raphson algorithm was 0.93 sec.
Table 1 summarizes the resulting parameter estimates. At t = 0, corresponding to December 3, 2021, the infected proportion i0 was an estimated 0.8 percent. The estimated population-size N was about 12 percent of the city’s total population. The estimated basic reproduction number ℛ0 was on the order of 4. The estimated mean duration of infectivity 1/(1 − α) was in the range of 2 to 3 weeks. The 95% confidence intervals were estimated conditional upon N, as described above.
Figure 4A plots the least squares criterion V (blue curve, left axis) and the first partial derivative ∂V/∂β (red curve, right axis) as functions of the parameter β. The remaining parameters have been held constant at their estimated values. The criterion V reaches a minimum at the optimum β = 0.233, where ∂V/∂β = 0. The function V is convex in the interval from β = 0.169, where ∂V/∂β reaches a minimum, to β= 0.352, where ∂V/∂β reaches a maximum.
Figure 4B above displays the analogous plot of V and ∂V/∂α as functions of the parameter α, where the remaining parameters are similarly held constant at their optimum values. The criterion V reaches a minimum at the optimum α = 0.941, at which point ∂V/∂α = 0. The function V is convex in the interval from α = 0.868, where ∂V/∂α reaches a minimum, to α = 1, the boundary of admissible values of α, where ∂V/∂α remains positive.
Figure 5 plots the projection of V onto the (α, β) plane. Again, the remaining parameters (i0, N) were held at the optimum values given in Table 1 above. The darkest area represents a ravine where V attains its lowest values. The yellow point in the center is the optimum (α, β) = (0.941, 0.233) where V is minimized.
Plague Outbreak, December 1905 – July 1906, Isle of Bombay
Figure 6 reports our analysis of data on deaths from plague during December 1905 – July 1906 in the Isle of Bombay, originally reported and studied by KM [5].
As in Figure 2 above, we have displayed the raw data on deaths as a bar graph, where each vertical bar covers one week, indexed m = 1, …, 31, where the height of each bar is the reported average daily deaths that week, and where the area of each bar is the reported number of weekly deaths (ym). The superimposed green curve represents the fit to the data, based upon the KM’s approximation , where sech represents the hyperbolic secant, and where we have divided their published formula by 7 to convert their estimates into daily units. The green curve thus corresponds to the authors’ estimates of the underlying output variable Zt.
The red curve in Figure 6 connects our predictions of the daily values Zt for t = 1, …, 217. The underlying parameter estimates were: β = 0.610, with 95% confidence interval (0.537, 0.683); α = 0.44 (0.376, 0.515); i0 = 2.80 × 10−5 (1.26 × 10−5, 4.34 × 10−5); and N = 5.103 × 104. The estimated basic reproduction number was ℛ0 = 1.101 (1.094, 1.107), while the estimated mean duration in the infected state was 1/(1 − α) = 1.804 (1.579, 2.031).
Additional Results
In addition to our main results above, we performed various ancillary analyses, which are reported in appendices. Appendix B provides an approach to computing the unconditional confidence intervals around the parameter estimates for the New York City Omicron wave. In Appendix C, we display the two- and three-dimensional paths of successive parameter estimates derived from our application of the Newton-Raphson algorithm to the New York City data. In Appendix D, we report the results of a robustness test, where we re-estimate our SIR model for New York City after multiplying the original case counts yt by mean-preserving, lognormally distributed noise.
In Appendix E, we account for the observation that some of the SARS-CoV-2 cases reported during the rising phase of the New York City epidemic wave of Figure 1 were in fact infections by the Delta variant, while some of the cases reported during the declining phase were in fact Omicron BA.2 infections. To that end, we reran our SIR model after multiplying the original case counts yt by the estimated proportion of Omicron BA.1 cases at each date t. We also reran our SIR model on the original case counts yt, but restricted the observation interval to those dates t during which the BA.1 variant was estimated to comprise at least 80% of all cases.
In Appendices F and G, we report estimates of our SIR model for two other jurisdictions during the COVID-19 epidemic. Appendix F shows our estimates for the SARS-CoV-2 Omicron wave in Los Angeles County, CA, during the same time period covered by our study of New York City above. Appendix G shows our estimates based upon an outbreak of SARS-CoV-2 at the University of Wisconsin-Madison during September 2020, originally reported in a study of the potential super-spreader influence of a nearby cluster of local bars [79].
Finally, Appendix H provides a diagnostic plot of the least squares criterion V as a function of the parameter β for the Isle of Bombay plague data, comparable to Figure 4A for New York City above.
Discussion
Is SIR the Correct Structural Model for Omicron?
The mere fact that we have found a workable solution to the SIR inversion problem does not necessarily mean that the Susceptible-Infected-Removed model is the most appropriate description of the data under study. Our illustrative analysis of the SARS-CoV-2 Omicron/BA.1 wave in New York City during December 2021 – March 2022 cogently brings home the point.
Figure 3 shows a tight fit of the SIR model-predicted curve Xt to the case incidence datapoints yt for New York City. Figures 4A, 4B, and 5 confirm that the estimated values of the parameters β and α are indeed situated at the minimum of a convex region of our least squares criterion function V. Appendix C confirms that successive updates of the parameter vector θ(k) generated by the Newton-Raphson algorithm follow a path along the convex surface of V. Appendix D confirms that our estimates are robust to the inclusion of noise in the observed data.
Despite these indicators that the underlying model was not somehow ill-conditioned, the mean duration of infectivity, computed as 1/(1 − α), was estimated from the New York City data to be on the order of 17 days, with a 95% confidence interval of two to three weeks. That’s way out of line with what is known from direct clinical measurement. In one cohort of 55 symptomatic Omicron-infected patients, only 13.5 percent continued to shed virus ten days after infection [85]. Yet our New York City-based parameter estimate for α would give the proportion remaining infectious after ten days at α10 = 54.4%.
To be sure, some of our alternative analyses yielded lower estimates of the mean duration of infectivity. When we attempted to exclude Delta and BA.2 infections from the original New York City case counts yt, the estimated mean duration dropped to about 10 days (Appendix E). When we reran our SIR model on the Omicron wave in Los Angeles County, the estimated mean duration was about 9 days (Appendix F). But these estimates are still don’t come close to those based on the available clinical data. When we reran our SIR model on an outbreak at the University of Wisconsin-Madison in September 2020, we obtained an estimated mean duration of about 7 days (Appendix G). But that outbreak was quite likely driven by the ancestral strain of SARS-CoV-2, and certainly not by Omicron.
Despite its remarkable fit to the data, SIR may thus fail as a structural model of the Omicron wave, even if it apparently succeeds as a reduced form model [86]. While its parameters β and α can indeed be backed out from the data and the inversion problem solved, these parameters do not necessarily warrant the structural interpretation that we assumed in our exposition of the SIR model in equations (1) through (5) above. That is not to say, however, that an adequate reduced form model is incapable of making accurate projections [87].
We should not, however, abandon the possibility that SIR may indeed be the correct structural model of the Omicron wave. To the contrary, the parameter estimates may reveal a critically important characteristic of the Omicron variant that clinical studies, which are necessarily biased toward symptomatic patients, have so far missed. In effect, the Omicron wave was so massive not because the variant had such a high per-contact infectivity (through the parameter β), but because a much larger proportion of asymptomatic infected individuals remained persistently infectious (through the parameter α). We also need to consider that the degree of persistent infectivity is not determined solely by biological characteristics of the virus, but also by the extent to which infected individuals isolated themselves from others. That behavioral component of the parameter α may have changed during the Omicron wave.
It might be argued that there is a third plausible interpretation, namely, that SIR may indeed be the correct structural model, but that the case incidence data analyzed here were inadequate to separately identify the key parameters β and α. That is, our four-parameter version of SIR is what some physicists have described as a “sloppy” model [53, 55]. Indeed, the dark ravine in the surface map of V in Figure 5 points to a high correlation between the two parameters in the topological neighborhood of the global optimum in the (β, α) plane, an observation that has been noted previously [71]. That finding suggests that the basic reproduction number ℛ0 = β/γ = β/(1 − α) is the only identifiable or “stiff” parameter.
Still, the results in Appendix C, which follows the path of the Newton-Raphson search algorithm in parameter space, suggest that β and α are indeed separately identified from the available count data. In Figure C2, as the algorithm approaches convergence to the minimum of the criterion V, the implicit derivative dα(k)/dβ(k) does not approach a constant value, as we would expect in the case of non-identification.
What About the Results for the Plague?
Not only do our SIR predictions fit reasonably well to the observed data on Bombay plague deaths in Figure 6, but they virtually coincide with the curve of the approximate closed-form solution drawn by KM [5]. While others have fitted SIR-type compartmental models to the 1905–1906 Bombay plague data [88], our study appears to be the first replication of the authors’ century-old result.
The plague, caused by the bacterium Yersinia pestis, can spread through human populations by various transmission pathways. Bubonic plague, generally thought to be the cause of a series of Bombay outbreaks that began in 1896 [89-91], is transmitted to humans through rat fleas [92], but transmission between humans also appears to occur via infected human fleas or body lice [93]. Pneumonic plague, by contrast, is transmitted by direct human-to-human transmission via respiratory droplets and can occur as a complication of bubonic plague [94, 95]. Pneumonic plague is more lethal, but transmission requires closer person-to-person contact and is thus thought to have a lower basic reproduction number ℛ0 [94].
So, does our SIR model similarly succeed as a reduced form model but fail as a structural model of the Bombay plague outbreak? Several structural compartmental models of plague transmission have been tested, typically involving separate states and parameters for humans and vectors [88, 89, 95-97]. In a rat-flea transmission model for bubonic plague, for example, the structural model contained a separate sub-epidemic module for rats with a distinct basic reproduction number for rat-to-rat transmission [97]. A human-flea-human structural model appeared to have the best fit to nine plague outbreaks during Europe’s Second Pandemic from 1348 to 1813 [95]. Our minimalist SIR model has none of these structural features.
These general observations, however, hardly exclude the possibility that the pneumonic form may have been responsible for the 1905-1906 outbreak studied by KM. In his classic 1926 Treatise on Pneumonic Plague [98], Wu Lien-Teh acknowledged the 1908 Indian Plague Commission’s conclusion that pneumonic constituted less than 3 percent of cases overall. (See [98] at p. 9.) But he did not hesitate to point out that some “well-characterized pneumonic outbreaks are on record.” (p. 90-92.) “In fact,” he added, “it was this great incidence of respiratory diseases in Bombay which attracted Childe’s attention and led to his finally establishing the pneumonic form of plague as a distinct entity.” (pp. 11-12). He further noted an observation bias in favor of bubonic plague due to the rapid mortality from pneumonic “…because, as a rule, such patients succumb so quickly that they seldom reach the hospital.” (p. 269).
What’s more, the parameter estimates derived from our solution to the inversion problem are strikingly consistent with what is known about pneumonic plague transmission. Lien-Teh’s data suggested a median duration of illness of about two days from the onset of symptoms to inevitable death from heart failure ([98] at p. 249, 260). Our point estimate of the parameter α = 0.446, implying an average infectious period of 1.8 days, is similarly consistent with more contemporary estimates of the parameters of pneumonic human-to-human transmission [95]. Lien-Teh further stressed a very low early infectivity due to the initial lack of cough (p. 296). Transmission through contact at close range, he stressed, appeared to have been required (p. 299). His observations, along with more contemporary estimates of pneumonic human-to-human transmission [94, 95], are consistent with our estimate of a basic reproduction number ℛ0 = 1.1.
The plague may indeed have been initially imported into Bombay in its bubonic form through rat-flea-human transmission, and the disease may have been sustained between outbreaks either by repeated importations or by an urban reservoir of selectively immune rats [89]. Still, our data are consistent the intriguing hypothesis that responsive isolation measures by British authorities [90] may have altered the mode of transmission during the 1905–1906 outbreak from the bubonic rat-flea-human mode to the pneumonic human-human mode, a phenomenon observed in the 17th century when the Derbyshire village of Eyan fell victim to the Black Death [99]. If so, our parsimonious SIR model is in fact the appropriate structural model of the outbreak.
Why are the Estimated Population Sizes N So Small?
Table 1 reports estimated population size parameters of N = 1.013 million for New York City, while Appendix Table F1 gives N = 1.046 million for Los Angeles County. As noted, these estimates of N were only 11.9 percent of the total census population of New York City and 14.5 percent of the corresponding census population of Los Angeles County. Where, then, did all the other millions of inhabitants go?
As discussed above, we cannot distinguish between three possible explanations of the shortfall. First, there is concrete evidence of significant underreporting of Omicron infections, due principally to the widespread availability of home rapid antigen testing [69], particularly in New York City [100, 101]. Second, despite the evidence of immune escape on the part of the Omicron variant [102], a substantial fraction of the population may still have retained long-term cellular immunity, particular through the administration of multiple vaccine doses [103]. Third, contrary to the underlying assumption of homogeneous mixing, there is strong evidence that a substantial proportion of the population avoided retail establishments, drinking and eating places, transportation venues, worksites and other high-risk locations [104].
Our analysis of the Bombay outbreak data gave an estimated size parameter N = 51 thousand individuals, which comes to about 6.5 percent of estimated Bombay population of 775 thousand in 1900 [105]. The same three factors could well be responsible for the shortfall. In particular, there is significant evidence for the acquisition of at least humoral immunity against Yersinia pestis among plague survivors [106].
Limitations and Extensions
Parameter search algorithms such as Newton-Raphson [46] and Gauss-Newton [47] have superior performance when they rely on closed-form expressions for the gradient vector and matrix of second derivates, rather than on numerical approximation [45]. Still, as Figures 4A and 4B teach us, there will nonetheless be a restricted region of the parameter space where the least squares criterion function is convex. Some of these regions may contain local rather than global optima. To avoid such problems of implementation, we plotted the least squares criterion V and its first partial derivatives as functions of the parameters. Still, finding the right initial values and keeping the parameter search within bounds remain unavoidable challenges.
As seen in Figure 1A and Appendix Figures F1 and G1, striking day-of-the-week effects on COVID-19 testing and case reporting required us to pretreat the raw case counts. Our centered, 7-day moving average appeared to be the most flexible nonparametric approach to this task. Alternative approaches that incorporate parametric models of day-of-the-week effects directly in our model of the output variable Xt have yet to be tried.
We have focused sharply on the original SIR model, rather than its numerous variations. Still, our basic approach can be extended to these more complex models. Our findings offer a caution, however, that such models as SEIR, which require an additional parameter governing the transition from an intermediate exposed state to the infected state, may very well turn out to be sloppy [71].
One exemption may be the well-studied SIRS model [59, 107], where individuals in the R (recovered) state can transition back to the S (susceptible) state as a result of waning immunity. The SIRS model still has three states, and the definitions of the output variables Xt and Zt in equations (3) and (21), respectively, remain unchanged, but the parameter vector θ has an additional component governing the rate of transition from R back to S. Since the SIRS model is known to admit oscillations with a stable endemic equilibrium [59, 108], we would be in a position to run our parameter recovery algorithm on multiple waves of data yt. That is a task for future research.
Data Availability
The data and programs used in this work are publicly available on the Open Science Framework at https://osf.io/3b4hv/.
Declarations
Sole Authorship
JEH is the sole author of this work. He alone is responsible for the conceptualization of the work, the analysis of the data, the drafting of the manuscript, and the construction of the graphics. He warrants that it is entirely his original work. No requests for permission to use copyrighted material are required.
Acknowledgments
The opinions expressed here are solely those of the author and do not necessarily reflect those of the Massachusetts Institute of Technology, Eisner Health, or any other organization or individual. I gratefully acknowledge the helpful comments of the following individuals on Version 1 of this manuscript: Julio R. Banga (Consejo Superior de Investigaciones Científicas, Spain), Keith Godfrey (University of Warwick, UK), Weiguo Han (University Corporation for Atmospheric Research, USA), Siran Keske (Koç University School of Medicine, Turkey), Eduardo Massad (University of Sao Paolo, Brazil), Thomas Michelitsch (Sorbonne Université, France), Dimiter Prodanov (IMEC, Belgium), Emerson Sadurni (Benemérita Universidad Autónoma de Puebla, Mexico), Reinhard Schlickeiser (Ruhr-Universität Bochum, Germany), Boris Schmid (University of Oslo, Norway), and James A. Yorke (University of Maryland, USA).
Availability of Data and Programs
The data and programs used in this work are publicly available on the Open Science Framework at https://osf.io/3b4hv/.
Human Subjects Declaration
This study relies exclusively on publicly available data that contain no individual identifiers. Links to the data sources are provided in the manuscript references.
Competing Interests Declaration
The author has no competing interests.
Funding Statement
This study did not receive any funding.
Appendix A. Poisson Likelihood Function
While we relied upon the nonlinear least squares criterion V, as defined in equation (5) in the main text, to develop our estimation strategy, we could have employed other optimization criteria. To illustrate, we consider the case where the observed observations y consist of count data on the number of reported cases, and assume that each observed data point yt is independently Poisson distributed [71] with mean Xt. The joint likelihood of combined observations will be:
As before, we assume that each output variable Xt is a function of the vector of underlying parameters θ. The log likelihood will then be:
In equation (A2), the additional term K = − log(yt!) does not depend on the underlying parameters θ. The gradient of the log likelihood function will then be:
The gradient expressed in (A3) differs from the gradient in equation (6) only in terms of the weighting factors in each summation term. The Hessian matrix of second-order derivatives is:
While the Hessian matrix in equation (10) above was positive definite reflecting an objective function V to be minimized, here the Hessian matrix is negative definite reflecting an objective function ℓ to be maximized. The remaining computations of the gradients DVt will be computed iteratively exactly as described in the main text.
Given these expressions for the gradient Dℓ and Hessian D2ℓ of the likelihood function, we can then use the Newton-Raphson algorithm [46] to find maximum likelihood estimates.
Appendix B. Computation of Unconditional Confidence Intervals: New York City Omicron
In the main text, we estimated the variance-covariance matrix of the parameters θ = (β, α, i0) conditional on N as C = s2(DX′DX)−1, where s2 = V/T, where DX was the T × 3 matrix whose rows were the gradient vectors DXt defined in (7), and where DXt and V were evaluated at the optimum. In this Appendix, we offer one possible approach to computing the unconditional variance-covariance matrix of the combined parameters (θ, N) and display the comparative results for the data on the 2021-2022 Omicron wave in New York City.
We again rely upon [48] to compute the unconditional variance-covariance matrix , where s2 = V/T is estimated as above, but where is the T × 4 matrix whose rows are the augmented gradient vectors . In contrast to equation (7) in the main text, we compute:
Here, we define DX = (1,0,0,0)′ and DX = (0,0,0,1)′ as unit vectors in four dimensions. Equation (B1) is used only to evaluate the augmented matrix and was not used to estimate the optimum values of the parameters. The quantities and V continue to be evaluated at the optimum values of the parameters, which remain unchanged from the main text.
Table B1 compares the conditional and unconditional confidence intervals.
Appendix C. Search Path of the Newton-Raphson Algorithm: New York City
In Figure 5 of the main text, we mapped the least squares criterion V as a function of the parameters (β, α). The optimum point, we found, was situated along a ravine where β and α appeared to be highly correlated. Here, we study the path of parameters θ(k) during successive iterations of the Newton-Raphson algorithm described in equation (15) of the main text.
Figure C1 shows the projection of the path of θ(k) onto the (β, α) plane. Rather than apply the EM algorithm to estimate the population-size parameter N, as described in equation (18) of the main text, we conditioned on a fixed N = 1.013 × 106. We chose starting values , corresponding to point A. The step size was q = 0.1.
From the starting point A, the algorithm initially proceeded in the direction of increasing β and decreasing α. By point B, the algorithm had reversed course and began to follow along the ravine described in Figure 5 of the main text, ultimately reaching convergence at point C = (0.233, 0.940). At each iteration, the matrix D2V was found to be positive definite.
Figure C2 below plots the ratio of the partial derivative ∂V/∂β to the partial derivative ∂V/∂α as a function of the iteration number k during the evolution of the algorithm. This ratio is the negative of the implicit derivative dα/dβ along the search path. The ratio is slightly increasing in k along the segment of the search path where 87 ≤ k ≤ 122. Otherwise, the ratio continues to decline even after the path enters the ravine identified in Figure 5 of the main text.
Figure C3 above shows the path of the parameters in three dimensions, along with the corresponding points A, B, and C. The search path from B to C reflects not only the movement along the ravine of Figure 5, but also the progressive increase in the estimated parameter i0.
Appendix D. Robustness Test: New York City Omicron
Here, we report a robustness test of the parameter estimates based on the COVID-19 incidence data from New York City. To that end, we re-estimated the SIR parameters after multiplying the original incidence data by random, lognormally distributed noise.
Figure D1 above plots the noise-augmented COVID-19 incidence data (red datapoints) and the resulting least-squares predicted incidence (orange curve), along with the predicted incidence based on the original data (blue curve). The noisy data were generated as , where yt were the original data, and where the components ηt were independent draws from a Gaussian N(μ, σ2) distribution. The parameters were calibrated as and so that the lognormal random variable had mean and standard deviation .
Table D1 compares the parameter estimates based on the noise-augmented data with those derived from the original data. For these calculations, we relied on the re-parametrized θ = (β, ω, i0) with α(ω) = 1/(1 + e−ω), as described in the Confidence Intervals section in the main text. The estimated confidence intervals are conditional on the estimated population size parameter N. Both Figure D1 and Table D1 indicate that the underlying model parameters β, α, i0, and N were reasonably well conserved after the original data were multiplied by sample mean-preserving noise. As anticipated, the confidence intervals surrounding the parameter estimates were considerably wider.
Appendix E. Accounting for Different Variants of SARS-CoV-2: New York City
The Omicron BA.1 variant of the SARS-CoV-2 virus was far-and-away the most important contributor to the massive surge of reported infections observed in Figure 1 of the main text. Still, the initial phase of the surge overlapped the tail end of the prior Delta wave, while the terminal phase saw the gradual emergence of the Omicron BA.2 variant. In this Appendix, we repeat our SIR analysis on the data for the Omicron BA.1 variant alone.
Figure D1 plots the estimated daily proportions of the three variants (Delta, Omicron BA.1, and Omicron BA.2) in New York City during the period of observation of our study. Estimates of the average weekly proportions, represented by the solid datapoints, were available from a compilation of genomic tests of viral samples maintained by the New York City health department [109]. We employed the Stata pchipolate (piecewise cubic Hermite interpolation) routine [110] to estimate the proportions for the intervening days, as represented by the connecting curves.
Figure E2 below compares the SIR estimates derived from the Omicron BA.1 data alone (shown in orange) with the corresponding estimates reported in Figure 3 of the main text (shown in green). Elimination of the Delta cases resulted in a discernable attenuation of the initial upswing. By contrast, elimination of the Omicron BA.2 resulted in only a small absolute decrease in the tail of the wave during February and early March of 2022. By the time Omicron BA.2 began to assume a significant proportion of cases in February, as shown in Figure E1, the massive wave of cases in New York City had already dissipated.
Table E1 compares the original parameter estimates shown in Table 1 of the main text with the corresponding estimates derived from the Omicron BA.1-only sample. With the attenuation of the initial upswing in cases, the estimated i0 drops by nearly half, while the basic reproduction number ℛ0 declines from roughly 4 to 3. What’s more, the estimated mean duration of infection falls to 10.3 days, but its 95% confidence interval still does not contain the U.S. Centers for Disease Control estimate of 5.5 days [111].
While the Omicron BA.1-only estimates might appear more reasonable, dropping the other variants from the database is hardly innocent. To see why, consider the hypothetical case where an infection by Virus X consistently caused a false positive test for SARS-CoV-2 but conferred no protection against COVID-19. In that case, it would be appropriate to downwardly adjust the total case counts by the estimated fractions of Virus X infections.
Here, however, we cannot assume that a Delta infection confers no protection at all against a subsequent Omicron BA.1 infection. To the contrary, the Delta infections seen at the onset of the wave in Figure E1 likely reduced the proportion of individuals susceptible to Omicron BA.1 [112]. Dropping the Delta cases would downwardly bias the model estimates of St and thus upwardly bias the resulting estimates of the parameter β.
To avoid these potential biases, we estimated our SIR model instead on counts of all reported SARS-CoV-2 cases during the truncated 65-day window from 12/17/2021 through 2/19/2022 when the estimated proportion of Omicron BA.1 cases exceeded 80 percent. Figure E3 graphs the observed and predicted counts, while Table E2 compares the parameter estimates from the original analysis in the main text to the estimates based on the truncated data set. The results reveal strong concordance between the two sets of estimates.
Appendix F. Omicron Wave, December 2021 – March 2022, Los Angeles County, U.S.A
We studied the daily incidence of COVID-19 during the SARS-CoV-2 Omicron/BA.1 wave of December 2021 – March 2022 in Los Angeles County, CA, excluding the cities of Pasadena and Long Beach, a jurisdiction with population 9.26 million, as reported by the Los Angeles County Department of Public Health [113]. As in the case of New York City [80], the date of report was intended to be the date when a positive test was performed or when the diagnosis of COVID-19 was otherwise made.
Figure F1 displays the raw data for Los Angeles County, derived in the same manner as the data shown in Figure 1 of the main text for New York City. To account for systematic variation in case counts by day of day of the week, we converted the raw case counts ct into centered 7-day moving averages, that is, , where t indexes the date of report. Figure E1 shows the raw counts of reported cases per day ct (connected gray datapoints) as well as the daily case counts yt adjusted for the day of the week (red datapoints). As in the main text, we relied upon the adjusted daily counts yt to estimate our SIR model parameters.
Figure F2 shows the predicted values of the output variable Xt as a connected curve superimposed on the observed datapoints yt for the Omicron wave in Los Angeles County. The fit of the data to the four-parameter SIR model is not as tight as for New York City, undershooting the peak reported incidence in early January 2022 by about 8 percent. The computation time to achieve convergence of the Newton-Raphson algorithm was 1.04 sec.
Table F1 summarizes the resulting parameter estimates. At t = 0, corresponding to December 3, 2021, the infected proportion i0 was an estimated 0.2 percent, as compared to 0.8 percent for New York City. The estimated population-size N was about 14.5 percent of the total county population excluding the cities of Pasadena and Long Beach. The estimated basic reproduction number ℛ0 was on the order of 2.5, considerably small than that estimated for New York City. The estimated mean duration of infectivity 1/(1 − α) was on the order of 9 days, likewise considerably smaller than the New York City-based estimate.
Appendix G. University of Wisconsin-Madison COVID-19 Outbreak, September 2020
In Figure G1, we display our SIR analysis of the data on an outbreak of SARS-CoV-2 at the University of Wisconsin-Madison during September 2020, originally reported in a study of the potential super-spreader influence of a nearby cluster of local bars [79].
Table G1 displays the estimated parameters and their conditional 95% confidence intervals. The estimated number infected at the start of the wave on 8/9/2020 was I0 = i0N = 0.7 (0.3, 1.1), which would imply that single infected individual (“Student Zero”) initiated the outbreak. The estimate of N is remarkably close to the combined census of 2,392 students living in the two residence halls that had to be locked down as a result of a high prevalence of infection.
Appendix H. Diagnostic Plot for the KM Data on Deaths from Plague, Isle of Bombay
Based upon the KM data on deaths from plague in the Isle of Bombay during 1905–1906, as shown in Figure 2 of the main text, Figure H1 above plots the least squares criterion V and the gradient ∂V/∂β as functions of the parameter β. As in corresponding Figure 4A in the main text, where the remaining parameters α, i0 and N were held constant at their optimum values.
Figure H1 shows that the criterion function V is minimized at β = 0.610, as reported in the main text in connection with Figure 6, at which point the first partial derivative ∂V/∂β equals 0. The function V(β) has an inflection point at β = 0.595, where the partial derivative ∂V/∂β reaches a minimum. In the interval β > 0.595, the function V(β) is convex.
Footnotes
↵§ See Author Declarations at the end of the manuscript for additional details.
Additional results have been reported in multiple appendices. Minor errors identified in Version 1 have been corrected.
References
- 1.↵
- 2.
- 3.
- 4.↵
- 5.↵
- 6.↵
- 7.
- 8.
- 9.
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.
- 17.↵
- 18.↵
- 19.↵
- 20.
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.
- 26.↵
- 27.
- 28.
- 29.↵
- 30.↵
- 31.↵
- 32.
- 33.
- 34.
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.
- 43.
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.
- 75.
- 76.
- 77.
- 78.↵
- 79.↵
- 80.↵
- 81.↵
- 82.↵
- 83.↵
- 84.↵
- 85.↵
- 86.↵
- 87.↵
- 88.↵
- 89.↵
- 90.↵
- 91.↵
- 92.↵
- 93.↵
- 94.↵
- 95.↵
- 96.
- 97.↵
- 98.↵
- 99.↵
- 100.↵
- 101.↵
- 102.↵
- 103.↵
- 104.↵
- 105.↵
- 106.↵
- 107.↵
- 108.↵
- 109.↵
- 110.↵
- 111.↵
- 112.↵
- 113.↵