Abstract
We developed an elaborated susceptible-infected-recovered (SIR) individual-based model (IBM) with pathogen strain drift, waning and cross immunity, implemented as a novel Java Runtime-Alterable-Model Platform (J-RAMP). This platform allows parameter values, process formulations, and scriptable runtime drivers to be easily added at the start of simulation. It includes facility for integration into the R statistical and other data analysis platforms. We selected a set of parameter values and process descriptions relevant to the current COVID-19 pandemic. These include pathogen-specific shedding, environmental persistence, host transmission and mortality, within-host pathogen mutation and replication, adaptive social distancing, and time dependent vaccine rate and strain valency specifications. Our simulations illustrate that if waning immunity outpaces vaccination rates, then vaccination rollouts may fail to contain the most transmissible strains. Our study highlights the need for adaptive vaccination rollouts, which depend on reliable real-time monitoring and surveillance of strain proliferation and reinfection data needed to ensure that vaccines target emerging strains and constrain escape mutations. Together with such data, our platform has the potential to inform the design of vaccination programs that extirpate rather than exacerbate local outbreaks. Finally, our RAMP concept promotes the development of highly flexible models that can be easily shared among researchers and policymakers not only addressing healthcare crises, but other types of environmental crises as well.
Significance Statement Effective COVID-19 vaccine development has been unprecedented, but vaccinations are being delivered at contrasting rates across the globe. Here, using an innovative epidemiological Java runtime alterable modeling platform (J-RAMP), we demonstrate that seemingly reasonable vaccination programs may exacerbate rather than mitigate regional outbreaks when immune waning outpaces vaccinations and reinfection promotes escape mutations. Our simulations suggest that adaptive vaccination programs, requiring adequate strain and serology monitoring and surveillance, are needed to extirpate outbreaks. Our platform provides policymakers with an easy-to-use tool for designing effective vaccination programs. Our RAMP concept points the way to the development of highly flexible models that are easily shared among researchers and policymakers in all areas of systems analysis.
Classification Major: Biological Sciences; Minor: Biophysics and Computational Biology
Introduction
Vaccines have either eliminated, or all but eliminated, diptheria, measles, polio, rubella, small pox, and tetanus within developed countries, such as the USA [1]. The global COVID-19 pandemic has amply illustrated the overwhelming economic importance that vaccines can play in bringing an epidemic under control [2]. Unlike the above mentioned diseases and more inline with influenza, however, the SARS-CoV-2 pathogen may re-emerge with vengeance due both to waning immunity and proliferation of mutant strains or variants with increased transmissibility that escape both naturally-acquired and vaccine-induced immunity [3–8]. To investigate how effective vaccination programs may be in preventing future large-scale outbreaks of COVID-19 [9, 10], we built a Java Runtime-Alterable-Model Platform (J-RAMP) of an SEIVD (Susceptible, Exposed, Infectious, V=Immune & Vaccinated, Dead) individual-based model (IBM) [11–14] that includes the following processes (Fig. 1; see Materials and Methods, as well as Appendix A for details):
pathogen strain-specific shedding [15], environmental persistence [16], with-host replication [17] and mortality rates [18] (some prefer the term variant rather than strain)
pathogen strain drift during transmission and within-host replication [20]
an adaptive contact rate [21]
a time-dependent, uni- or multivalent vaccine rollout [22, 23]
Our SEIVD-IBM J-RAMP, is an example of our novel runtime alterable-model platform (RAMP) concept, which includes panels, windows and sliders that allow users to specify and manipulate model parameter values in real time, as well as modify process function descriptions, even during the course of a simulation, while protecting the integrity of the underlying code. Thus our SEIVD-IBM J-RAMP has epidemiological applications well beyond COVID-19, including diseases such as influenza [24], Ebola [25, 26], and any other directly transmitted disease [27]. To illustrate its application, we used our J-RAMP to explore the relative efficacy of uni- and multivalent vaccines applied at various time-dependent rates, where choice of valency may switch in response to realtime monitoring and surveillance data. Such adaptive vaccination programs may be required to combat the evolutionary arms race between vaccine efficacy and the evolution of new pathogen strains [20, 23, 28]. We demonstrate the possible con-sequences of vaccinating too few individuals in the context of the emergence of more transmissible strains, as well as the comparative efficacy of non-adapative versus adaptive vaccines responsive in realtime to current strain valency. We simulate plausible COVID-19 outbreaks over a three year period in which vaccination rollout commences at the start of year two and continues until the end of the year three.
Our study is meant to highlight issues related to vaccination rollout rates [10] and strain valency that may turn out to provide insufficient coverage of the population or being outrun by waning immunity in vaccinated hosts or those that have survived COVID-19. We undertake these simulation with the knowledge that though a typical set of parameters may cover most outbreaks of COVID-19 in various regions around the world, regional differences in the contact behavior of individuals, with additional contact rate time dependency in each locality due to social distancing regulations and behavior, have led to markedly different outbreak patterns across regions [29–31]. Thus the result presented here are meant to inform us of potential pitfalls and dangers in rolling out inadequate vaccination programs and extract general principles, where possible, as well as illustrate an approach to its local application rather than prescribe a universal program: adaptive programs need to operate in response to local surveillance data and contact rate conditions.
The local nature of contact rate patterns is well established as an important driver of outbreak patterns [30]. The definition of a contact where transmission may occur is rather challenging though. Effective or “close” contact may be defined, as in [32], in terms of direct physical contact or close conversation, or it may involve concepts of a threshold distance and time period [33]. Taken a step further, it may relate to infectious dose levels and the relationship between dose and severity of the infection of individuals [34]. This effective contact rate issue, however, is side-stepped by fitting models to local incidence data to obtain a concatenated contact-and-transmission rate parameter value that best fits local outbreak patterns, as discussed in depth elsewhere [30].
Our results also suggest that adaptive vaccination programs that closely monitor the emergence of new strains and adapt vaccine valency to match pre-dominating strains are much more likely to extirpate local epidemics than non-adaptive policies, though our simulation results are greatly influenced by assumptions regarding rates of waning immunity and strain cross-immunity characteristics as well. Accurate forecasting of the future course of an epidemic in terms of strain composition and vaccination program efficacy require data on the rates at which immunity wanes, how such immunity depends on particular strains, and the levels of cross immunity among strains [4, 6] that are more comprehensive than currently available. It also requires better data on within-host replication and mutation rates. Thus, in truth, we present our model and results fully aware that answers to questions relating the design of optimal region-specific vaccination programs await more reliable data on immunity processes, strain mutation rates [35], and the transmission potential and virulence of dominant strains [36]. We hope, however, that our results and subsequent investigations using our model provide the kinds of quantitative analyses that can help formulate highly effective local or country level vaccination programs that avoid many of the pitfalls revealed by our analysis, as well as encourage the adoption of rapidly adaptive vaccination programs.
Results
Single strain simulations
Our first simulation, which we regard as a “baseline run” has J = 0 (i.e., only one strain is possible), assumes a constant contact rate (κ(t) = κ0 for all t; which implies no social distancing or effective contact reducing actions taken) in an a homogeneously, well-mixed population of ten thousand individuals (N0 = 10, 000), with an immune waning half-life of one year (thalf = 365), and no vaccination rollout. The resulting simulation depicts a classic outbreak pattern (Fig. 2) in which incidence rises to a peak of about 250 cases around 96 days after “patient zero” has been introduced, and then burns out around day 166 days because an insufficient number of susceptible individuals are now available to keep the epidemic going. In the upper left-hand panel of our SEIVD-IBM J-RAMP dashboard in Fig. 2, we see that around 3/4 of individuals have been infected at the point where the epidemic burns out (V (166) = 7700, which is 77% of N0): this suggests that herd immunity may well be in the ball park of 75-80% (as supported by additional runs presented next), although this would be an underestimate if more transmissible strains come to prevail.
Repeated runs of this first simulation with different runtime seeds (viz., 10 runs with runtime seeds ranging from 0 to, where each seed generates its own, essentially unique, but repeatedly sequence of pseudo random numbers; Fig. 2 is for the case runtime seed = 0) produces the following means and standard deviations as a percentage of the population size (of S, V and D), as well as the length of the epidemic (days) (see Fig. C.1, Appendix C): Sfnl = 21.9 ± 0.6%, Vfnl = 76.4 ± 0.6%, Dfnl = 1.6 ± 0.1%, and tfnl = 167 ± 9 days. These results are almost the same when the initial population size is increased threefold to N0 = 30, 000 individuals, the primary difference being a small expected decrease in variability (around 1/3rd smaller) and slightly longer epidemic period (18% longer) (see Fig. C.1 for details). To obtain full plots of the incidence, prevalence, and daily-death trajectories from many runs of our SEIV-IBM, we exploited the R-platform integrability of our J-RAMP to generate mean and standard deviation plots of incidence, prevalence, and daily mortality, as described in Appendix B (and Fig. C.2).
Our second single-strain (J = 0) simulation (Fig. 3A) illustrates that, for the selected set of parameter values, if the rate of waning is tripled from that of our first (i.e., thalf is reduced from 365 days to 120 days) then herd immunity is still reached. Our third single strain simulation, however, shows that herd immunity is eluded for the selected set of parameter values when this rate is quadrupled (thalf = 90 days; Fig. 3B): viz., the population enters a periodic outbreak mode because previously infected individuals are sufficiently susceptible after several months post-initial infection to be reinfected and thus keep the epidemic going. Herd immunity in this caricatured population (i.e., relatively small, isolated, homogeneous) would also be reached within a year if the waning immunity half-life is increased back to thalf = 180, but not if an adaptive contact rate is in operation (Fig. 3C & D; k(t) specified by Eq. A.10). More specifically, for the case , that is the “effective” contact rate κ(t) drops from κ(t) = κ0 = 4 when I/N0 = 0 to κ(t) = κ0/2 = 2 when I(t)/N (t) = 0.05, our fourth single strain simulation indicates that prevalence peaks at roughly 400 individuals (i.e., just around 0.4% of the population) around day 85 (Fig. 3C), drops to around 50 between days 210 and 240, and then rises back to over 200 by the end of one year. In our fifth single strain simulation (Fig. 3D), an additional 5-fold reduction in our adaptive contact rate (i.e., κ(t) = κ0/2 = 2 when I(t)/N (t) = 0.01) produces a prevalence of just under 100/10,000=0.01% that appears to persist with stochastic noise from day 100 onwards.
We carried out two additional single strain simulations, this time for three years, and applied a vaccination rollout program that vaccinated individuals drawn at random in the second and third year at rates v = 0.001 and v = 0.002 (0.1% and 0.2% of the population vaccinated each day; Fig. 4A & B), respectively, from a pool consisting of individuals not currently infectious or previously vaccinated. In these runs, we set thalf = 365 days, (reflecting US case percentages after 1 year), with a population size of N0 = 100, 000. In the case v = 0.001, the vaccination rate was insufficient to overcome the rate at which infected individuals were becoming reinfected due to waning immunity. In the case v = 0.002, the vaccination rate was sufficiently high to bring the epidemic under control. In the latter case, however, if we no longer allow individuals who had previously been infected to be part of the vaccination pool as well (i.e. we confine vaccinations to the S class only), the vaccination rollout rate v = 0.002 is now insufficient to extinguish the outbreak and a resurgence occurs around the end of the second year (Fig. 4C).
Multi-strain simulations
In all our multi-strain simulations we set . This latter value, as previously mentioned, produces a cumulative number of cases of around 10% in the first year in a population of size N0 = 100, 000 (Appendix C, Fig. C.3), which is in the ballpark of 8.5% for the US outbreak during its first year.
In our first set of multistrain simulations, we set J = 4 (i.e., permitting 16 strains) and compared the situations where cross immunity is absent (c = 0; Fig. 5A) to where it has the intermediate value c = 0.5 (Fig. 5B). The remaining parameters were the same for all strains and used the same basic epidemiological parameters, as in our previous single strain simulations, although mutation rates were now added to enable new strains to proliferate (see caption to Fig. 5 for details on parameter values).
In the no cross immunity (Fig. 5A) case, we see that the initial surge is driven by strain 0 ≡ (0, 0, 0, 0) (see discussion in Appendix A for use of binary representation), which is first replaced by strain 8 ≡ (1, 0, 0, 0) around the end of year 1, which in turn is somewhat displaced by strains 7 ≡ (0, 1, 1, 1) and 6 ≡ (0, 1, 1, 0), with strain 1 ≡ (0, 0, 0, 1) coming to dominate at the start of year 3, only to be finally challenged by strain 3 ≡ (0, 0, 1, 1) towards the end of year 3. In the nearest-neighbor cross immunity case (Fig. 5B), given the retarding effect of cross immunity on the proliferation of strains that differ in only one allele from the initial dominating strain 0 ≡ (0, 0, 0, 0), this initial strain was displaced by strains 3 ≡ (0, 0, 1, 1) and 11 ≡ (1, 0, 1, 1) around the end of year These latter 2 strains were then displaced by strain 7 ≡ (0, 1, 1, 0) during the first half of year 3, with strain 13 ≡ (1, 1, 0, 1) dominating at the end of year 3 somewhat supported at a lower level by strain 14 ≡ (1, 1, 1, 0). We also noticed a moderate resurgence of strain 0 ≡ (0, 0, 0, 0) in the middle of year 3.
In a second set of multistrain simulations, we set J = 7 for a total of 128 possible strains. Unlike the previous set of multistrain simulations, we now provided a strain specific set of transmission parameters by selecting these at random on the interval [0, 0.3], apart from a dominant sequence of transmission values that steps in increments of 0.25 from 0.3 in strain 0 to 0.475 in strain 127 along the sequence mentioned in the caption to Fig. 6 and documented in Fig. C.4. The remaining parameters are the same as for the simulation depicted in Fig. 5B (i.e., the cross-immunity parameter c = 0.5). Unlike the case, J = 4, only two strains now played any serious role in the outbreak in the non-vaccination scenario (Fig. 6A): strain 0 ≡ (0, 0, 0, 0, 0, 0, 0) for the first year and strain 127 ≡ (1, 1, 1, 1, 1, 1, 1), the most transmissible, for the remaining 2 years.
Vaccination rollout simulations
Following our no-vaccination simulation of our 128-strain case (Case A; Fig. 6A), we explored three different vaccination rollout programs applied in years 2 and 3 with the following vaccination rates and vaccine strain valencies: Case B, rate v = 0.002 (i.e., 0.2% vaccinated per day), strain valency 0 (Fig. 6B); Case C, rate v = 0.002, strain valency 127 (Fig. 6C); and Case D, v = 0.01, strain valency (0,7,31,127) (Fig. 6D). Each of these vaccination programs begin with the initial strain 0 ≡ (0, 0, 0, 0, 0, 0, 0) being replaced by strain 127 ≡ (1, 1, 1, 1, 1, 1, 1) towards the end of year 1 or start of year 2.
Alarmingly, the first vaccination rollout scenario (Case B) made the epidemic worse rather than better when compared with the no vaccination scenario (number of deaths were 1366 versus 1122; compare Figs. 6A and B) because the vaccination rollout with its strain 0 valence vaccine, though delaying the emergence of strain 127, led to an increased prevalence of this most transmissible strain (β127 = 0.475 versus β0 = 0.300). Using a strain 127 valence vaccination (Case C) had some effect in reducing the 3-year death total back to 1109, but the efficacy of this rollout was thwarted by the emergence of strain 48 ≡ (0, 1, 1, 0, 0, 0, 0) (β48 = 0.295, Fig. 6C).
Given the inability of these two monovalent vaccination rollout programs in years 2 and 3 to control the outbreak, we simulated the much more aggressive rollout program of a quadravalent (0,7,31,127) vaccine applied at a rate of 1% per day (v = 0.01; a five-fold increase over the previous two monovalent programs) in a system with increased levels of cross immunity (cascading versus simple) and including revaccination of previously vaccinated individuals to combat the effects of waning immunity. Even this aggressive vaccination rollout program failed, although it did knock the incidence way down at the start of the rollout. The efficacy of this rollout, however, was thwarted by the rise of strain 23 ≡ (0, 0, 1, 0, 1, 1, 1) (β23 = 0.214) during the mid to latter part of the second year, which was then supplanted by strain 3 ≡ (0, 0, 0, 0, 0, 1, 1) (β3 = 0.350), which dominated the third year (Fig. 6D).
Unsurprisingly, the success of any vaccination program depends on the relative robustness of the immunity characteristics of hosts to the various strains. Thus, in regard to the vaccination programs represented by cases B and C (Fig. 6, we simulated the rollout of a bivalent vaccine with strain valency (0,127) implemented at the same vaccination rate of v = 0.002, but in a system with much more robust immunity characteristics: a waning half-life increased from thalf = 365 to 1825 days (a five-fold increase) and a cascading cross immunity system with c increased from 0.5 to 0.9. In this case, the outbreak was rapidly extirpated after 257 days (i.e., from start at day 365 to extirpation at day 622).
Given the failure of our first three 128 strain vaccination rollouts (cases B to D in Fig. 6) to extirpate the simulated epidemics, we investigated whether rollout programs that adapted vaccine valencies to track in real time the emergence of dominant strains might be more effective. Thus, we used a JS scripting window to write a JavaScript driver (as described in Appendix B; see Fig. B.4) that adapted (i.e., switched as necessary) the valency of the vaccine every 90 days after the start of the vaccination rollout to match the top two strains prevalent at time of switching, or only the top strain if no other strain exceeded a prevalence of 10 individuals. The results of these two simulations, one for the nearest-neighbor cross-immunity case (Eq. 1) and one for the cascading cross-immunity case (Eq. A.7), are provided in Fig. 7. They should be compared with the non-vaccination case depicted in Fig. 6A and the non adaptive vaccination rollout depicted in Fig. 6D. In contrast with the latter—and despite having the same vaccination rates, the same strategies for selecting non-infected individuals at random, and employing vaccines with fewer strain valency—both of these adaptive vaccination rollouts extirpated the epidemic, the first within half a year (Case A) and the second within 1.4 years (Case B) from the start of vaccination rollout.
Discussion
The amount of structure and data needed in complex biological systems’ models, depends on the questions that these models have been formulated to address [26, 37]. In this paper, we steered away from making specific predictions— because universal solutions are not always locally applicable. Rather, we focused on questions relating to the potential efficacy of different vaccination rollouts (both vaccination rates and valencies of vaccines) in the context of strain emergence and the potential for vaccination programs to go awry. To make specific predictions requires localized data, particularly as it may relate to social distancing behavior or other factors that affect contact rates over time [30, 38, 39]. Additionally, we cannot expect models to make strain-specific informative predictions, unless they have been designed to do so and are supported by location specific data regarding the relative transmissibility of strains and other strain specific data. In the absence of such data, models become a tool for anticipating pitfalls and avoiding unintended consequences of well-intentioned actions. Equally important, however, is the implementation of our model as a RAMP (runtime alterable model platform), because this greatly facilitates the use of our model by ourselves and others in testing out different hypothesis about the process generating observed population level strain transmission dynamics.
The illustrative simulations we present demonstrate that if vaccination roll-out rates and vaccine valencies are applied in blanket manner—that is, without monitoring strain emergence and obtaining some assessment of the epidemiological characteristics of those strains (e.g., relative transmissibility, virulence, waning and cross immunity characteristics)—then the possibility exists for a vaccination program to do more harm than good. This of course, is not to say that vaccination programs should ever be avoided; only that once a vaccination program is embarked upon, it should be implemented at a sufficiently vigorous rate to ensure that it is not outrun by relatively fast waning immunity rates. Additionally, when escape mutations emerge, generally associated with reinfection cases [8], then the valency of vaccines should be switched rapidly enough to thwart escape mutations, rather than aiding such mutations by thwarting less transmissible competing strains instead.
At this time, the primary value of our SEIVD-IBM J-RAMP is in testing various vaccination strategies as they relate to strain emergence [40] and identifying pitfalls related to the unintended promotion of more transmissible or virulent strains. Strain emergence has both a local and global component. Particular strains may first be identified as occurring in specific geographical regions, as in the strains that appear to have originated in the UK (B.1.7.7 variant), South Africa (variants B.1.1.54, B.1.1.56 and the C.1 lineage) and Brazil (P.1 lineage) [41]. Thus more geographically focused application of our J-RAMP may require more specific strain related information. Such information would then be used to get estimates of the relative values of transmissibility βj, virulence αj, and even of shedding and environmental persistence . But equally important in evaluating the impacts of vaccination strategies on local outbreaks is obtaining strain-specific cross-immunity data (for characterization of the elements cj𝓁 of the cross immunity matrix C), waning rates (which we have not made strain specific, but our model could be generalized to include strain specific values ), and the relative within host competition values (i.e., λj, which we have set all equal to 1 at this time in the absence of data to the contrary). Models are sorely needed to explore multistrain dynamics, particularly the epidemiological properties with regard to shedding, environmental persistence, transmission, mutation, and within-host replication rates, which acting together determine the relative success of different strains and their actual impact on the severity of epidemics and the nature of vaccination programs needed to suppress them.
The process of making our model both location and strain specific could be undertaken using methods designed to enhance the relevancy of models, such as Appropriate Complexity Modeling (ACM: [26, 37]). Further, in some cases it may be useful to add spatial or age-structure information to our SEIVD model IBMs or include a contact network [42], which itself may contain spatial or refined class category (e.g. age or work category) information. In addition, our current implementation represents strain differences in terms of J loci with two alleles (denoted by 0 and 1 respectively) at each locus. A more realistic representation of the genetic basis of strain differences may involve a more involved genetic representation in which several alleles are possible at each locus.
Although cross-immunity and immune waning are entangled in our immunity modifier functions (i.e., ϕij; see Eq. A.11), cross-neutralization data can be used to estimate the cross and waning immunity parameters using appropriate methods [43, 44]. Such data are becoming more widely available through the application of rapid PCR methods [3, 45, 46]. Strain or variant cross neutralizing studies bring up a much neglected issue, which is the effect of dose (number of pathogens involved in the initial infection, also know as viral load) on the severity of the infection. This differs from the questions of the number of doses—usually one or two—constituting and vaccine regimen versus the viral load (or antigen load, or virus-like particle load when only antigens or parts of viruses are administered) in each dose [4]. In the context of vaccination, both these issues and the technology used to produce the vaccine [23] may well have an impact on waning immunity half-lives and cross-immunity values. Thus, the parameter values used in the model should ultimately be vaccine specific. Further, when it comes to determining waning and cross immunity parameter values, our model does not distinguish between individuals that have been infected with a particular strain or vaccinated with a valency related to that strain.
In the coming year, as we obtain more information on the nature of immunity to SARS-CoV-2, it will become more apparent to us whether or not COVID-19 will settle into global endemicity [19, 47]. If this is the case, then constant vigilance and a well designed vaccination program with respect to vaccinating the young and implementing booster vaccinations with appropriate strain valency will become the order of the day. The J-RAMP presented here, with appropriate elaborations that will become evident through its future application, such as being able to compute the best time to administer booster shots of the same or different strain valencies to individuals, should play a decisive role in the rational design of effective and efficient COVID-19 vaccination programs worldwide. This is made apparent from the simulations we present here that, depending on the immunological characteristics of hosts and mutational rates of SARS-Cov-2, it may be harder than currently anticipated to extinguish COVID-19 in the next few years through non-adaptive vaccination programs. Thus, we should endeavor to adapt the strain valency of our vaccines as rapidly as possible to the predominating SARS-Cov-2 strains, identified from ongoing monitoring and surveillance programs.
Finally, the concept of runtime-alterable, modeling platforms (RAMPs), with driver script and other coding platform integration, in our case developed in Java with R platform integration and a JavaScript simulation driver window, provides the first example of a new concept in model implementation that facilitates model sharing and easy modification by users other than the original developers. We believe such platforms can come to play an important role not only in disease modeling, but in all fields of research that rely on models for comprehensive analyses of the behavior of systems of interest.
Materials and Methods
Our SEIVD-IBM in a nutshell
We constructed an individual-based model (IBM) of a susceptible-exposed-infectious-recovered (i.e., an SEIVD model, where removed R are split into V=immune/vaccinated, and D=dead) epidemiological process [11, 12] in a homogeneous population with a random encounter contact rate parameter κ0 > 0. Our formulation includes a host immunological waning process [6, 19]. It also includes the emergence of pathogen strains due to a mutational process that impacts both transmission of mutant strains from the infectee and genetic drift [3, 5, 48] of strains within the infector, with rates impacted by cross immunity effects. We allowed for variation in pathogen strain transmissibility (i.e., in the β > 0 parameter of the frequency dependent transmission function βSI/N [49, 50]) and pathogen virulence as represented by the disease-induced host mortality rate sensu Anderson and May [51] (and often represented by a parameter α ≥ 0 [49]).
The detailed formulation of our model and its algorithmic implementation is provided in Appendix A. In a nutshell we:
defined a set of 2J pathogen strains (user selected value for J ranging from 0 to 8; pathogen index j = 0, …, 2J −1) with a genetic-relatedness topology of a J -dimensional unit cube—i.e., each pathogen has J -loci that can take on one of two allelic values at each locus with immediate neighboring strains differing from each other by exactly one allelic value (0 or 1) at only one of the J loci
defined a population of N0 hosts as belonging at time t to either an epidemiologically naïve set of susceptible individuals S of size NS(t), a set A of NA(t) identified agents Ai (i = 1, …, NA(t)) whose epidemiological histories are known, or a set D of ND(t) individuals that have died from the disease
allow pathogen strain-specific transmission “force” and virulence (αj ≥ 0) parameters to vary in value among one another within a defined range and αj ∈ [αmin, αmax]
kept track of the total prevalence NI as the sum of the prevalences of the individual strains —i.e.
introduced a random contact rate function κ(t) with a constant parameter κ0 that is Poisson distributed on [t, t + 1), t = 0, 1,…, multiplied by an adaptive response function that reduces the contact rate with increasing disease prevalence, such that the κ(t) is reduced to κ0/2 when the —see Eq. A.10 in Appendix A
update the epidemiological state of the agents Ai with respect to each of the strains j = 0, …, 2J −1 where the state with respect to particular strain j at time t is represented by
(a) 0: has never been infected with this strain
(b) Ej(t, τij): infected at time τij ≤ t with this strain, but not yet infectious for an expected period of σE units of time
(c) Ij(t, τij): infectious at time t with this strain, for an expected period of σI units of time, having been most recently infected (reinfections with the same strain may occur) with this strain at time τij < t
(d) Vj(t, τij): has now recovered from its most recent infection at time τij with this strain and has some level of waning immunity to it
where we assume that agent Ai can be infected at time t with at most one dominant strain (denoted by the index j), although it will have different levels of waning immunity to all of the strains to which it has been infected in the past
included waning immunity functions ωij(t) (symbol is omega: Eq. A.6) used to compute the level of immunity that agent Ai has to its most recent infection by strain j
included cross immunity constraints (a J 2-matrix C) that apply both to the infector transmitting the pathogen and the infectee being invaded (inv) by the pathogen, both of which reduce the likelihood of infection and strain drift by strain j compared with closely related strains 𝓁
computed an infection probability that agent Ai infected with strain j infects agent Ah with strain 𝓁 in terms of a concatenation of infector viral shedding (ζi𝓁), viral persistence in the environment (ε𝓁), and viral transmission (βh𝓁) processes (Fig. 1)
computed an invasion probability that an agent Ah infected with strain 𝓁 becomes infectious with strain 𝓁′ as its major strain, in terms of the multiplicative effects of viral mutation (µ) and viral replication (λ𝓁) processes ongoing within an infectee Ah during this infectees exposed and infectious stages (Fig. 1)
computed the overall probability πih,j𝓁′ that that an infector Ai infected with major strain j results in an infectee Ah expressing 𝓁′ as its major strain
implemented a discrete time individual-based stochastic SEIVD (here V represents individuals that have either recovered from infection or have been vaccinated, D represents cumulative dead) multi-strain model that includes specifiable time-dependent univalent and multivalent vaccination applications
Our J-RAMP implementation
Models of systems process can be coded as singular implementations model formulations using: i) highly efficiently compilable computer languages (e.g., C++, FORTRAN, Java); or ii) less efficiently, but more easily coded, scriptable (e.g., JavaScript, Python, Perl) computer languages. More conveniently and expeditiously, they can be coded up, as discussed in [52], using a systems modeling platform, such as Matlab’s SIMULINK, Mathematica, Stella, AnyLogic, Numerus, or Berkeley Madonna. Advantages of the latter include more rapid and accurate model development, though simulations may be slowed down by platform overhead. Between these extremes, we propose a more general approach to specific classes of systems’ models, where the basic system structure is fixed, but implementation of some elements can be easily and safely altered so that optional implementations are presented at runtime. We call such a design runtime alterable-model platforms. (RAMPs); and here we present a Java RAMP implementation of the SEIVD-IBM described in the previous subsection.
The characteristics we envision for a model platform to be a RAMP are:
RAMPs include a set of model parameters (constants) whose values can be selected or specified (sometimes within a predefined range of values) at simulation runtime using a switch, nob, slider, or text-entry window accessed via a platform graphical interface or dashboard (e.g., see Fig. 2 and Table 1 which apply to our SEIVD-IBM J-RAMP).
RAMPs include a specific set of runtime alternative modules, (RAMs), where the original can be redefined in a graphical interface window, and the unaltered original and the alternative routines are stored as a (preferably open-ended) numbered set. The original or any one of the alternatives can be selected for use at runtime (for a list of functions in our RAMP see Table 2).
RAMP implementations also provide an API for both remote and onboard scripting. This API enables control of all user aspects of the simulation, including the parameter set, run management, RAM options, and data retrieval. Script logic can alter parameter settings and RAM options as the simulation progresses. A Nashorn-based Javascript interpreter enhanced with API methods is provided.
The API can be accessed remotely using operating system facilities by external applications running concurrently with the simulator. Of particular interest is the ability to control the simulation from the R statistical platform. An R routine can be formulated to both manage the simulation run and to retrieve and process the resulting data. The RAMP simulation becomes a “virtual package” to the controlling R logic. See Appendix B.
We implemented our RAMP using Java and made ample use of all of the features described above. Use of the RAM facility permitted experimentation with the several versions of cross immunity presented in this paper. A Javascript program was used to control an adaptive vaccination strategy. A small R package serving as a driver was used in an R program that ran the simulation multiple times, extracted results into R data structures, and produced graphs showing statistical mean and standard deviation. More details on the graphical structure and implementation of our SEIVD-IBM are both implied in the presentation of results below, and in additional details provided in Appendix B.
COVID-19 model assumptions and parameter values
The first variable that needs to be determined is the unit of time we use for our simulations because all process rate parameters are scaled by its selection. Since the time resolution of empirical COVID-19 incidence and mortality data is daily, we selected are unit of time t to be days. Additionally, based on various sources including a metapopulation study of COVID-19 parameter estimates [53], we set the latent and infectious periods to be 4 and 7 days respectively. Basic SEIR epidemiological models do not separate out the processes of contact and transmission-per-contact, so we had some leeway on what values to choose for contact rates and transmission rates per contact because it is the value of the product of these that is important in determining the reproductive value, commonly referred to as “R0” for COVID-19. Hussein et al.’s [53] meta analysis of COVID-19 zeroed in on R0 = 3.14 as a mean value across a range of studies (95% confidence interval [2.69, 3.59]) though these authors suggested that the most recent studies they looked at suggested that R0 was more likely between 2 and 3.
Assuming no seasonal effects (i.e., δ = 0 in Eq. 3), in our single strain runs we kept R0 ≈3 by setting κ = 4 (effective contacts per day, which nominally involves being with 6 feet of one another at least 15 mins) thereby requiring β = 0.3, because in the notation of the continuous time model introduced in Eq. A.29, we have an inverse latent period value of γ ≡ 1/σE = 1/4 and an inverse infectious period value of ρ ≡ 1/σI = 1/7 days (Table 1) for Eq. A.29 to yield
In our multistrain runs, we did allow for the emergence of strains that were more transmissible than this single strain value (in these runs new strains were in the range [0,0.475] (see table depicted in Fig. C.4). Also, although the incubation period is typically thought to have a median of 5 days, it can be much faster [54].
In reality, the contact rate κ is not constant, but is time varying. It is influenced by the social distancing behavior of individuals who may be responding to fears of contracting the disease, or regulations imposed by local authorities to reduce effective contact rates. To account for this, we formulated an adaptive contact rate function κ(t) (Eq. A.10) in terms of a function that declines adaptively from a non-epidemic (baseline maximum) value κ0 to κ0/2 when the prevalence (the percentage of the population infectious at time t) increases to . In the illustrative simulations presented here, we explore the effect of varying on the size of the shape of epidemic, but set (i.e., a prevalence level of 0.2%) as a baseline value, because in our simulations this baseline value, along with the parameter values selected above, results in a cumulative number of cases to just over 11% in a population of N = 100, 000 (Appendix C, Fig. C.3). This puts our illustrative baseline case in the ballpark of the US outbreak which Worldometer reported to have a cumulative case level of 8.5% (≈284.5 × 106/332.9 × 106) after one year, but this is likely to be a substantial underestimate of the actual number of cases [55].
To keep things simple in applying our model to a multistrain setting, we assumed that the immune effects on infector shedding and mutation and replication within the infectee through a cross-immunity processes, can be tweaked through a single constant c ∈ [0, 1) (making the elements cj𝓁 of the matrix C strain-dependent will obviously require considerable supporting data). One scenario is to assume cross immunity only applies to nearest neighbors, i.e.: For other formulations, such as “Cascading C” or including escape mutations in the mix see Eqs. A.7 and A.9 in Appendix A.
Additionally, we expect this type of evidence to become increasing available with whole genome sequence of pathogen strains [56] collected during host shedding. For simplicity’s sake, however, we assume that infectee with major strain j will shed minor strains in the immediate neighborhood of j at comparative rate ζ ∈ [0, 1) and be major strain-independent: i.e., we assume Third, again in the absence of evidence to the contrary, we assume that seasonal fluctuations are strain independent. In this case, for strain-independent constants δ ∈ [0, 1) and θ ∈ [−π, π] that set seasonal fluctuations related to environmental persistence, Eq. A.13 in the Appendix A simplifies for a single constant to: In our multistrain simulations, for purposes of illustration, we set both the shedding parameter ζ in Eq. 2 above and µ in Eq. A.16 to be 0.001. Obviously, it would be of considerable interest to know how the values of these two parameters impact strain proliferation, but this is left to future studies when data are available to guide simulations of realistic scenarios.
Data Availability
Not applicable
Acknowledgements and Funding Sources
This work was funded in part by NSF Grant 2032264 (PI: WMG).
APPENDICES
A Model Construction
Here we formulate an individual-based or agent-based (ABM) SEIR epidemiological model to include host immunological waning and pathogen genetic drift with variation across strain transmissibility and virulence. [1]
Assumptions, definitions, and states
The population consists of a well-mixed pool of N0 individuals that is homogeneous except for the fact that some are uninfected (denoted S), some currently infected (E: exposed and not yet infectious; I infectious and asymptomatic or symptomatic) or have been infected and are now either dead (D) or recovered/vaccinated with some level of immunity (V) to one or more of 2J pathogen strains. This immunity wanes over time and its current level, augmented by specified levels of strain cross-immunity, factored into an agent specific time-dependent strain-resistance function that impacts the shedding of mutant strains by infectors and the within-host replication rates of mutant strains in infectees.
At the start of the epidemic, all individuals are assumed to encounter, on average, κ0 > 0 other individuals during each time period [t, t + 1], but this “effective contacts” rate adaptively decreases with increasing prevalence of the disease due to the implementation of non-pharmaceutical interventions (social distancing, hand washing, mask wearing, and other hygienic precautions). In our selection of epidemiological parameter values, a unit of time is taken to be a 24-hour day. Other scalings of time would then require appropriately adjusted epidemiological parameter values. Refined versions of the model could include age-related parameter values and contact rates, as well as contact tracing, quarantining, and isolation of infected individuals; but these will not be considered here.
Initially, at model time t = 0, all individuals are considered SARS-CoV-2 naïve susceptible apart from one individual who is considered to have just entered the infectious stage, infected by a pathogen designated as pathogen strain 0 (wildtype). Throughout the model simulation, the N0 agents in the population are partitioned into three disjoint sets: the set of SARS-CoV-2 naïve individuals, S(t), containing NS(t) (the susceptibles); the set of identified agents, A(t), containing NA(t) individuals who are either currently infected (time t) with a particular strain of SARS-CoV-2, or have some level of waning immunity to one or more strains of SARS-CoV-2; and the set of dead individuals D(t), currently of size ND(t). Only the individuals in A(t) are uniquely identified as they become infected for the first time and make the transition from set S(t) to set A(t), where they are sequentially labeled using the index i = 1, …, NA(t). The single infected individual at time zero will be designated Agent 1 (also known as patient zero and denoted by A1). Thus at time t it follows that We note that individuals in set A(t) can be in a disease state E or I with respect to pathogen j, but simultaneously can be in multiple immune states if they have been infected with more than one pathogen strain in the past. We also note that the distinction between symptomatic and asymptomatic individuals in state I will not be considered here; and only need be incorporated if testing, quarantining, and treatment processes are included in the model.
The total number of pathogen strains is set by a parameter J > 0, where each pathogen is represented by a J -bit binary number. Thus, there are 2J possible strains indexed by j = 0, 1, 2, …, 2J −1 where j is the decimal equivalent that corresponds to a given binary string. The initial strain, j = 0 is the binary string of J zeros.
Sets of stochastic epidemic events (i.e., transitions from classes S to E, E to I, I to V or D) are implemented at consecutive integer points in time (one set of events for each point in time). Events will only be considered on individuals that have been infected by at least one of the pathogens at some time after t = 0 (this means that initially the epidemic computation proceeds rather rapidly, but becomes more computationally intensive for each time step as time proceeds).
Pathogen set
At the start of the simulation (t = 0), the set of potential pathogens indexed by j = 0, …, 2J −1 is generated along with its associated environmental persistence , transmission , within host replication (λj) and disease-induced mortality rate (probability of dying from the disease ) parameters. These may be specified or drawn from underlying distributions (e.g., the uniform distributions β ∼ Uniform[βmin, βmax] and so on). Also, our model includes two 2J × 2J matrices of constants that are associated with pathogen mutations during strain shedding (elements ζjm) and cross-immunity (elements cjm) processes and thus involve but are conditioned on either the major strain that an infector is harboring or on immunological state of the agents involved. These are the shedding and cross-immunity matrices with elements j, m = 1, …, 2J −1, Thus we generate the following list of parameters associated with our 2J pathogen strains:
Agent states
In accordance with the above set of assumptions, each agent has the following basic disease states at time t, where disease states in agent Ai are referenced by the time τj > 0 at which the most recent infection with strain j has occurred (an individual may be re-infected after immunity to the strain has waned to relatively low levels):
S(t): An individual who at time t has not been infected with any strain of the pathogen up to time t. All these individuals belong to set S(t)
Ej(t, τij): An agent Ai who was infected with strain j at time τij, but has not yet become infectious (this is an individual in the latent stage that lasts for σE units of time). All these individuals belong to set A(t)
Ij(t, τij): An agent Ai who is currently infectious with strain j, after being infected with strain j at time τij (this is the infectious stage that lasts for σI units of time). All these individuals belong to set I(t) ⊆ A(t)
Vj(t, τij): An agent Ai who was infectious with strain j, having been infected with strain j at time τij, but is now non-infectious with regard to this strain—that is, recovered with some immunity to strain j, as well as some cross immunity to strains closely related to j. All these individuals belong to set A(t)
D(t): An individual at time t who has died after being exposed to and become infectious with some strain of the pathogen. In a refined version of the model, a record will be kept of the time of death and the strain that caused death. All these individuals belong to set D(t).
Since an agent Ai may be infected over time by more than one strain j, its complete epidemiological state is represented by a list If a living agent does not fall into any of the categories 2 – 4 with respect to pathogen j, we denote its epidemiological state at position j as ∅ (the empty set). Consequently, if an agent A is susceptible at time t (i.e., an element of S(t)), then we write However, while such individuals are omitted from the A list (hence we did not subscript the agent A above), they may be recognized as “virtual members” with this implicit state. Some other examples are:
If Ai(t) is infected, but not yet infectious, with pathogen strain j at time t but has not been infected with any other pathogen in its past history, then
On the other hand if Ai recovered from an infection with pathogen 0 at time τ0, and is now in infectious with pathogen j at time t, having become infected with this pathogen at time τj then we write
As we shall see, an agent history may contain at most one instance of either Ej or Ij, while possibly containing multiple instances of Vj.
Agent and index sets
At the start of each time period, we update the set of identified agents A by adding susceptibles that became infected with pathogens during the previous time period and removing agents that died during the previous time period. Thus if 𝕀A is the index set for non-empty elements of A, with new indices added for newly infected susceptibles and indices removed for individuals that died, then by definition: where the number of indices in the updated set 𝕀A(t + 1) is NA(t + 1) and the updated number of dead is ND(t + 1) at time t + 1.
For mathematical convenience all susceptibles S will also be referred to as A0:, i.e., there are NS(t) individuals referenced by A0 at time t. It will be useful to partition the set A(t) itself into three subsets at time t by identifying the sets E(t) and I(t) which respectively contain all agents that are currently in a state Ej(t) or a state Ij(t) at time t for some j = 0, …, 2J −1. We note the intersection of these two sets is empty—i.e., E(t) ∩I(t) = ∅—as will become apparent below from the transmission process rules set up below. We will use the notation to denote the set of agents in A(t) but not in E(t) or I(t).
We also identify the set of infectious agents with infectious strain j. If Aj denotes an agent whose epidemiological state contains an entry Ij(t, −), then where the number of such agents is denoted by , and its index set by
Epidemiological processes
Immunity
In compartmental SIRS and SEIRS models, a concept of waning immunity and its impact on epidemics is associated with the rates at which individuals in class R revert back to class S. In agent-based SIRS and SEIRS models, we have the opportunity to consider the immunological history of individuals and, hence, can take a more refined approach to the complex process of how pathogens in an infector Ai are passed on the an infectee Ah. Here we model this as a probability generated from a concatenation of rates that include pathogen shedding by Ai, the survival of pathogens in the environment, whether contained in feces, urine, sweat, mucosal secretions or water droplets excreted by an infector, and a process whereby pathogens gain access to a host (entering through wounds, mucosal membranes or other membranes in the pulmonary or alimentary systems). We then characterize pathogen within-host strain replication rates in terms of pathogen mutational and reproductive processes. The final outcome in our model is either host recovery with some immunity or host death. We also consider the induction of host immunity through vaccination and make the assumption that waning immunity is the same, whether it stems from natural infection or vaccination. Of course, these may be modelled in different ways should data become available to make this distinction an important modeling consideration.
Waning immunity
Recall that we use A0 to denote an anonymous (generic) member of S and that Ai for i > 0 refers to a specific individual with an associated state list/vector. If some specific Ai is in immune state Vj having been infected with this strain at time τij, we assume that the level of relative susceptibility of agent Ai to reinfection by strain j is given by (noting that the existence of the value τij implies that infection of individual i by strain j at time τij ensures that the Vj is no longer “null”) We note the following: 1.) the first case implies that τij has yet to be defined; 2.) the second case is equivalent to the statement that τij ≥ 0 now exists for strain j, since this occurs at time t = τij (through the invocation of state Ej(t, τij)); 3.) ωij(t, τ) ranges from 1 (i.e. full “on”) at t = τ + σI + σE and decays to 0 as t > τ + σI + σE → ∞; 4.) agent i cannot be reinfected with its current major strain or with any other strain while it is currently itself in any state Ej or Ij for any j = 0, …, 2J −1; 5.) the larger the value of σ the steeper or more abrupt the switch is from full immunity (equal to 1) at time τ through 1/2 at time to approach 0 as t →∞ (we set σ = 4 as providing an intermediate level of abruptness).
Cross immunity with escape mutations
A somewhat more general implementation of cross-immunity may be based on the number of alleles by which strains j and 𝓁 differ. If they differ in k positions, then the level of cross immunity can be set to ck. In this case In addition to this, one may also designate certain changes in certain alleles as “escape mutants” with respect to the progenitor strain. For these, the level of cross-immunity would be set to zero. Under these assumptions, a generalization of Eq. A.7 would be In particular, for the case J = 7, if strain (1, x, x, x, x, x, x) is an escape mutation with respect to some level of immunity to strains (0, x, x, x, x, x, x), then Eq. A.8 becomes
Vaccination
A vaccine may be designed to give immunity to one or more particular identified strain j. Vaccination strategies include vaccinating at a fixed rate v(t) (percent of individuals vaccinated at each time period) over a fixed period that begins at and ends at and can focus on drawing only on: i) individuals in the set S, ii) any non-infectious individual in S or A, or iii) any non-infectious, not previously vaccinated individual in S or A. The vaccine itself can be designed as follows:
Dominant strain vaccination at time τvac. An individual S or Ai vaccinated with the dominant strain, say j, at time serves to add the disease state Vj(t, τvac) to that individual’s list. If the individual is already in state Vj(t, τ ′) at time t > τvac, then its status is updated so that at time t > τvac it is now Vj(t, τvac) rather than Vj(t, τ ′)
Multi-strain vaccination at time tvac. An individual S or A vaccinated with a multi strain concoction at time , say with strain j1, …, jν, will have their disease status updated with regard to all these strains, as in the dominant strain case.
Infectious contacts
Infectious individuals are assumed to make effective contacts each time period; where effective contacts are those that are sufficiently close and of a sufficiently long duration to constitute a “risk of transmission.” This rate is either a constant κ0, or in stochastic implementations drawn from a Poisson distribution with mean κ0, or in adaptive formulations (e.g., under social distancing behaviour) is a function of the severity of the ongoing outbreak. We also assume at time t that under a random contact process, proportion and of these contacts will respectively be with susceptible and with uniquely identified agents, although only of those will be susceptible to infection with a new strain or reinfection with the same strain.
In the adaptive case, we assume κ(t) decreases from κ0 as the proportion of infectious individuals, NI(t)/(N0 −ND(t)), increases such that κ(t) = κ0/2 when . For convenience of implementation, however, we define the following “switching” (as apposed to hyperbolic) function even though, from a continuity point of view, the top part of this expression implies that κ(t) → 0 at .
Probability of infection
In deriving a probability of an agent Ah being infected with strain 𝓁 by and agent Ai who is infectious with major strain j, we concatenate (i.e., multiply together) several process, each of which involves nominal constants. Thus, in all but one of these processes, the scaling of these constants can be normalized and given a relative set of values across strains though one set of constants though relative, will ultimately all be scaled by fitting the model to real data. In our treatment below, constants associated with shedding and persistence will be scaled while those associated with within-host replication will be kept unscaled to be ultimately fitted to data. In particular, the parameters associated with pathogen transmission (i.e., from contact to the start of within host replication—see Fig. 1) will be scaled by fitting to epidemiological data, while the relative values for the different strains regarding pathogen shedding and environmental persistence can be fitted to experimental data collect to set values of these processes when considered on their own.
Pathogen shedding
We assume that shedding is affected by the immune state of the infector Ai and thus posit the shedding rates below for this individual when its major infectious strain is j. In general, we have a matrix of shedding rates before accounting for immunity and cross immunity that is specific to agent Ai. Immunity and cross-immunity act to reduce shedding rates through functions ϕij(t) ∈ [0, 1] that are computed in terms of Ai’s waning functions ωim with respect to strain m and a matrix of cross-immunity values cjm that have been normalized so that cjj = 1 for j = 0, …, 2J −1 and cjm ∈ [0, 1] for j, m = 0, …, 2J −1. Specifically, we define agent-specific immunity modifying functions and assume that the shedding rates can be expressed as
Environmental persistence
The persistence of pathogens in the environment are known to be impacted by humidity, temperature, airflow, and the surface properties of fomites [57]. This, and other factors relating the effects of weather on contact rates and efficacy, may result in overall pathogen transmission having a seasonal component to it [58]. In particular, viral persistence indoors may be much greater than outdoors, with a greater proportion of indoor contacts taking place during cold or wet weather. Thus the most appropriate place to introduce seasonal effects into epidemic processes is through contact rates and environmental persistence cycling over time with a period of one year (or even half-a-year if two comparatively spaced rainy seasons occur, as in some in tropical locations [59]) and an amplitude obtained by fitting parameters to data. Thus, in our model, we introduce constants , δ ∈ (0, 1), k (appropriately scaled, depending on the units of time) and θ and assume that The case δ = 0 corresponds to constant values for all t, while if δ = 1 we get the largest possible fluctuation between 0 and . The constant k relates to the time units so we get one cycle per year, and θ shifts the cycle to set the points in time at which the maximum and minimum values of η𝓁 (t) occur.
Strain transmission
In the context of a standardized dose (which will be modified by multiplying the strain effective contact and transmission by both pathogen shedding and environmental persistence functions), the differential rates of strain transmission, which we denote by βh𝓁, will depend on a constant strain transmission rate parameter modified by a function that represents the immune state of the infectee at time t: viz., recalling Eq. A.11
Probability of infection
Using a competing rates formulation [60] to compute the probability of infection as a concatenation of the process of infector shedding (ζ), environmental persistence (η) and transmission rates (β), we obtain
Within-host processes
If after receiving an initial infectious dose of pathogen, an individual is infected primarily with strain 𝓁, then we expect this strain to dominate unless intrinsic mutational processes are high (which is not the case for COVID-19) or the individual has some immunity to this dominant strain. In the latter case the situation is ripe for an “escape mutation,” that is one that evades the immune system, to arise.
If we nominally set the relative rate at which an individual invaded by strain 𝓁 has an infection dominated by strain 𝓁 (i.e., 𝓁 in the terminology of [61] is the major strain of the infection) to be (1 −µ), then the probability that one of the other strains 𝓁 ′ ≠ 𝓁 (in the case of COVID we assume that µ > 0 is very close to 0—e.g. of order 10−3 to 10−6—while for viruses lacking error correcting machinery it can be considerably larger and of the order 10−1). We can partition the latter probability according to a set of comparative strain within-host replication rates λ𝓁′, each moderated by its immune state function ϕh𝓁′ and a normalizing factor to obtain
Pathogen progression equations
Probability that infector Ai with major strain j will result in infectee Ah express 𝓁 ′ as its major strain is
Single strain case
In the single strain case (J = 0), the waning immunity equation Eq. A.6 reduces to (dropping the redundant index j = 0, and noting that the existence of a value τi implies Ai has been infected at time τi in the past) and the modifying immunity functions ϕij (Eq. A.11) collapse to 1, which implies that the pathogen shedding functions (Eq. A.12) collapse to 1. Without loss of generality, we can also assume that single strain value η = 1 in Eq. A.13, which implies that the probability of infection (Eq. A.15) reduces to Further, since in the single strain case there are no mutations to consider, it follows from Eq. A.16 that for all h and we finally have that (Eq. A.17) for all h.
Simulation algorithm
Parameters selected at the start of a simulation
N0: Number of individuals in the population. Assumed to be fixed over time (i.e., the population is closed), but partitioned into sets S, A and D with respectively NS(t), NA(t) and ND(t) individuals in each set and satisfying Eq. A.1.
J: The log2 of the number of possible strains indexed by j = 0, …, 2J −1
: Strain dependent transmission parameters (the process between contact and the start of strain replication and nominally equivalent to transmission in SEIR models—see Fig. 1) for pathogen strain j
: The time it takes for immunity to strain j to have waned by half.
: The time it takes from initial infection for an infected individual to become more likely to become infectious than remain infected without being infectious.
: The additional time it takes beyond for an infectious individual to more likely transition beyond being infectious than remain infectious.
: The proportion of individuals leaving the infectious category that die, which implies that is the proportion that become immune.
Initialization
Set up pathogen list (see Eq. A.2)
Initialize the simulation by setting t = 0 and creating the agent list A(0) one infectious and N0 − 1 susceptible agents.
Time t: vaccination loop.
Carry out the vaccination process before going into the rest of the loops with the updated S and A sets after the vaccinations.
Time t: contact loop. Set up contacts for the current round of encounters at time t (i.e., the inner agent-driven contact loop within the outer time-driven loop) and tag for outer loop update of disease status, as follows:
Numbers in various sets and associated index sets. Identify the number of individuals NS(t), NA(t) and ND(t) in sets S, A and D at time t respectively, as well as the number of exposed (but not yet infectious) agents NE(t), infectious agents NI(t) and identified noninfected agents . Break down the infectious agents tally into the number of agents infectious with strain j = 0, 1, …, 2J − 1. We will also need the index sets and , j = 0, …, 2J − 1 at time t.
Infectious contacts with each group. The rate at which any individual contacts other individuals per unit time is given by the contact rate parameter κ > 0. Assuming random contact events over one unit of time, the actual number of individuals that agent Ai contacts at time t is then given by Of these, proportions and are expected to come from susceptibles in the sets S(t) and AS(t) (see Eq. A.5) respectively. Thus the actual number of contacts in set S(t), AS(t), and E(t) ∪ I(t) are We note that only and are of interest because individual in states E and I cannot be reinfected. Also, we make the assumption below that the first infection that an individual in set A contracts in this contact loop, is the one that counts (i.e., there will be no simultaneously infections with multiple strains). Finally, since contacting individuals is tantamount to sampling with replacement, the number of unique contacts (i.e., all multiple contacts are counted as a single contact) that agent Ai has with individuals in the set S is reduced by excluding multiple contacts (which under a random contact model is a negative exponential correction) to obtain Thus if is expected to be very close to the upper value . On the other hand, if , then is expected to be around . Additionally, after dealing with each agent i reduce in the size of NS(t) to take account of those agents that had been infected by agent Ai and had now entered the ranks of the set A.
Identify all infectious agents and their pathogens strains. Among all agents in the set A(t) (Eq. A.4), identify those that have an infectious strain Ij for some j = 0, …, 2J − 1. Thus, if the number of infectious agents with infectious strain j is then consider the set with index set Initially, most of these sets will be empty, but will fill in over time.
Susceptible contacts. The probability that an agent Ai with a strain j major infection infects a susceptible (nominally denoted by individuals of type A0) who then becomes infectious with dominant strain 𝓁 ′ is given by the probability πi0,j𝓁′ computed in Eq. A.17, which itself relies on expressions Eq. A.10-A.16. The actual number of individuals in the set S will make effective contact with one more infectious individuals is obtained using Eq. A.23. Thus, from a multinomial drawing, we can now generate the number of newly exposed individuals, (the “+” is used to denote these are newly added and the “0” that they are coming from the set S), with major strain 𝓁 ′ at time t + 1, have been infected by agent Ai with major pathogen strain j on the time interval [t, t + 1): These individuals will be used to update list of currently infected individuals in the sets , j = 0, …, 2J − 1 at time t+1, which is computed in the outer loop computation, as presented below. We also note that the probabilities in the above multinomial add to less than 1, so that at the end of the drawing a proportion of the individuals remain uninfected.
Agent contacts. The number of agents that come into contact with agent Ai over the interval (t, t + 1) is given by Eq. A.22. This number is drawn from the set 𝕀A\(E∪I) with replacement and the following multinomial computation is used to determine how to update agent Ah at time t + 1 when coming into contact with agent Ai on the interval (t, t + 1) using the probabilities of transmission given in Eq. A.17. Specifically, agent Ah will become infected with major strain 𝓁 ′ at time t + 1 is determined by the multinomial drawing We note here that since the agents Ah, h ∈ 𝕀A\(E∪I) are drawn with replacement as the computation proceeds and the agents Ai, i ∈ 𝕀 are cycled through, if a previously drawn Ah is drawn again, but has already been infected in the current round then we ignore the latest event, but keep the previous infection event intact. To obviate bias in this procedure, we need cycle through the agents Ai, i ∈ 𝕀 at random rather than in numerical order.
Time t: disease progression loop.
Individuals in AE at time t. An individual Ai ∈ AE at time t and in state Ej(t, τi), j = 0, …, 2J − 1, becomes either an individual in state Ej(t + 1, τi) with probability or transfers to state Ij(t +1, τi) with probability thereby entering class AI at time t + 1.
Individuals in AI at time t. An individual Ai ∈ AI at time t and in state Ij(t, τi), j = 0, …, 2J − 1, becomes either an individual in state Ij(t + 1, τi) with probability or leaves the set Ij(t + 1, τi) with probability . In this latter case, the individual either dies with probability or enters the state Vj(t + 1, τi) at time t + 1 with probability The total number of individuals dying over the interval [t, t + 1) is noted as having a value ΔND(t).
Time t + 1: outer loop update. The outer loop records all the events that took place in the contact and disease progression loops and updates the agents state at the next time step. It also updates all other states as follows.
Individuals in AS at time t. For the NS(t) individuals in AS at time t, we have enter set and we update where Eq. A.23 ensures that NS(t + 1) ≥ 0
Individuals in AS that are infected again over [t, t + 1). These individuals can become reinfected as calculated in the contact loop. Those that become reinfected with strain j, j = 0, …, 2J − 1 enter state Ej(t + 1, t + 1) at time t + 1.
Updating the immunity of individuals in AS. Every individual within AS at time t must have its immunity status updated so that for j = 0, …, 2J − 1, if Ai is in state Vj(t, τij) at time t then it transfers to state Vj(t + 1, τij) at time t + 1, even if reinfected, as in b.) above.
Transfer from S to A. The computed in Eq. A.24 become newly listed members of the set A by entering state Ej(t + 1, t + 1), j = 0, …, 2J − 1. This involves updating the equations for NS(t) and NA(t), including taking account of the number of individuals ΔND(t) that died from the disease in the immediate time period, i.e.:
Along with input parameter values tvac_on ≥0, tvac_off and pv ∈ [0, 0.1], we also need to specify the valency of the vaccination by selecting 1 to 4 numbers that take on values 0, …, 2J −1 (if more valencies are needed than 4, then the platform needs to be modified accordingly). We also need specify whether Nselect will just be individuals in the set S(t) (Nselect = NS) or will be any individual other than those in the set AI(t) (Nselect = NS + NA − NI).
In Algorithm 1 we summarise the steps of the simulation algorithm, as described in this section. On the right we report the name and numbering of the subsections while in the for loops we list the various steps respecting the item letters. Note that technical steps not explicitly described in the text (e.g. store updates, store set progression) do not present letters or numbers. The time set is defined with T while to describe temporal progression of set S, A and D we use the symbols 𝒮, 𝒜, 𝒟 respectively.
Estimation of R0
In a continuous time, SEIR model, when κ is folded into an all encompassing frequency-dependent transmission rate parameter βκ = βκ > 0 (i.e., total is transmission given by [49, 50]), γ > 0 is the transition rate from E to I and ρ > 0 is recovery rate from I to R, then R0 corresponds to [62]
B J-RAMP Details
General description
The J-RAMP design augments a desktop simulation platform with several novel features that increase flexibility and expressiveness, and promote experimentation and interoperability with other platforms. These include an API (“application programming interface”) fully supporting remote operation and direct retrieval of data for external processing on other platforms, such as Python, Javascript or the R statistical platform. The API can also be accessed by an onboard scripting interface that uses the Nashorn Javascript engine.
Additionally, using a novel design, elements of the internal algorithm are exposed for possible reprogramming in a secure fashion that will not damage the overall system. These runtime alternative modules (RAMs) may also be controlled from the API to facilitate selective algorithm redefinition during the run of the simulation.
Use of the J-RAMP features require some experience with scripting and/or Java coding, however the resulting modifications to the algorithm can be of great significance. The RAM platform is implemented to support program redefinition with no risk to damaging the underlying code base. It should be accessible to anyone with moderate scripting experience.
A major goal of the J-RAMP project is to pre-package these functionalities so that they can be readily deployed as part of simulation system design. This goal has been partially realized with respect to the RAM platform: annotations can be added to the simulator’s source code that direct the automatic generation of Java code to integrate into the simulations’s source and provide the functionality.
The following discussion assumes some familiarity with script or program development.
Runtime alternative modules
Figure B.1 shows the RAM redefinition frame. The available RAMs appear as radio buttons along the bottom of the frame. Each RAM is a set of options for defining a relatively short Java method implementing some key aspect of the simulation algorithm. For example, included in this simulation are the implementation for cross immunity given in Eq. 1; the implementation for β given in Eq. A.14; and the implementation for ϕ given in Eq. A.11; etc. Each RAM initially contains only a single option, Option 0, the default, internally defined implementation. Option 0 cannot be edited and appears for reference purposes only.
Additional options may be added to each RAM containing code redefining the method. Two editor panes and one console pane are stacked in the frame and display the code and output of the RAM. These panes show the content associated with the currently selected RAM and option. The top editor pane contains the code for the method being redefined. The second editor pane contains definitions of any new help functions required by the definition in the top pane. The console pane contains messages and output that are useful during the development of the option. For convenience, a “Load Default” button initializes the editor to an editable version of the Option 0 default to use as a starting point.
Figure B.1 shows the Option 0 default definition for the cross immunity matrix function, as described by Eq. 1. Clicking the “+” button produced two new options, which appear in the insets. These option implement the alternative cascading cross immunity schemes presented in Eqs. A.7 and A.9. Option 2 code appears below:
double crossImmune(int j, int k) { if (l == j) return 1; if (j < pathSize()/2 && l > pathSize()/2) return 0; return Math.pow(Params.cImmune(), Tools.hdist(j, l)); }Note that we have substituted the function pathSize for a hard-coded value of 63. pathSize returns the number of pathogens, allowing us to use this formulation for any choice of entropy. Documentation for pathSize is at the top of the window in the list of available help functions and parameters. There is also more extensive documentation in a separate user guide (see Fig. B.2).
The platform duplicates a mini-development environment for building alternative definitions. Once code has been entered the “Compile” button checks the legality of the code and makes it available for use at runtime. Legally compiled code will produce a “Compilation Successful” message. Errors will appear with line numbers if they occur. Once the code is legal, the “Test” button can be used with actual parameters entered into the small text fields to determine correctness of the code. It is also possible to include print and println statements in the code during development to further check correctness. Output from print statements will appear in the bottom console window. The entire RAM set can be saved and will reappear during subsequent launches of the simulator platform.
To use an alternate RAM definition at runtime simply select the desired option. (Selected options will be restored from a saved RAM set during subsequent launches.) The system will compile any uncompiled code the first time it is accessed. If an error occurs during a runtime compilation an alert will notify the user that the system is returning to the default definition of that RAM. At no time is the internal logic of the program overridden.
Finally, RAM option selection is part of the API described in the next section. This means that a script may run a simulation selecting different options at different points in time, using logic that considers the state of the model. For example, such an adaptive protocol might be appropriate for determining the contact rate κ.
Application programming interface
The API is a simple bytecode1 called BPL (Blackbox Programming Language) that addresses all available user interactions with the simulator. Instructions fall into three categories: parameter assignment and retrieval; simulator operation; and data retrieval. A complete list of instructions is shown in Fig. B.3. Instructions are comprised of opcodes (e.g., reset, step, get) followed by 0 or more arguments. Every BPL operation returns a result, even if empty, for synchronization purposes. A string consisting of a sequence of opcodes and arguments may be submitted to the BPL interpreter, an example of which is shown in the notes in Fig. B.3.
Parameter assignment and retrieval
Every user-configurable element (including random number generator seeds) is addressed from BPL using a unique three-letter “airport code” (see Table. B1). Additionally, pathogens are addressed by their id number (0 to 2J −1) and agent states using identifiers S, E, I, V, DI+ and DD (the latter two represent ΔI and ΔD, respectively). RAM options are addressed in setOption and getOption using the name of the RAM (e.g., “crossImmune”). Get and set operations can be used on each of these with the exception of ENT (strain entropy), which is read-only.
Simulator operation
Simulation runs begin by executing the BPL reset instruction, followed by step, run for or run. The BPL interpreter operates synchronously with the simulator by waiting to process subsequent commands during a simulation run. Operational instructions can be interspersed with parameter set/get or data retrieval to use in runtime decision-making. Note that the reset operation restores the simulator to its state at the time of the last reset, so that no parameter changes made during a run are persistent.
Data retrieval
Operations to obtain the current population in each state, and to retrieve the runtime population history of each state and pathogen are also included. These can be easily transformed into R data frames, for example, for further analysis.
Scripting can be deployed using either one of the two on-board script interpreter interfaces, or remotely from another platform using drivers provided with the simulator. The remote drivers use an operating system channel called named pipes or fifos that are part of all MacOS, Windows, Linux, and other Unix™ -based systems.
On our main dashboard, we provide two scripting windows that are opened using the “S On” and “JS On” buttons (see button second and third from left at bottom of Fig. 2A). The former allows the user to write simulation driver scripts directly as command strings. (The commands listed in Fig. B.3 are accessed by pressing the “Command Reference” button in the “S On” window.) This window is used primarily to test and monitor scripts intended to be deployed on a remote platform. The JS window contains a Nashorn Javascript interpreter enhanced to accept and execute BPL operations. Scripts can developed, saved, and used to drive the simulator from this interpreter. For example, Fig. B.4, lists the code used to implement the adaptive vaccination programs. The SEIV object referenced in this code contains methods corresponding to the BPL operations detailed in Fig. B.3.
R Integration
As previously mentioned, the API supports remote control of the simulator from independent platforms using the operating system’s indigenous fifo channels. Of particular interest is integration with the R statistical programming environment. An R-package called “seiv” acts as a driver by synchronously issuing BPL command strings and waiting for results. Consequently, a simulation can be driven entirely from within the R platform, treating the simulator as a “virtual package”.
Fig. B.5 shows the code used to run the simulator multiple times with different random number generator seeds. Following each run, the time history of the population in the I, DI+ and DD states is extracted directly to an R data frame (without the need to save, for example, in a comma-separated list). At the end of the run sequence the data frame is used to build the plots shown in Fig. C.2.
R could be used in a more direct way by analyzing data at various points throughout a single run and adjusting parameters programmatically, similar to the adaptive vaccination strategy carried out in Javascript, only taking advantage of the R environment’s powerful toolkit.
Footnotes
Competing Interest Statements: none
↵1 a bytecode is computer source code that is processed immediately by a program, usually referred to as an interpreter or virtual machine.
References
- [1].↵
- [2].↵
- [3].↵
- [4].↵
- [5].↵
- [6].↵
- [7].
- [8].↵
- [9].↵
- [10].↵
- [11].↵
- [12].↵
- [13].
- [14].↵
- [15].↵
- [16].↵
- [17].↵
- [18].↵
- [19].↵
- [20].↵
- [21].↵
- [22].↵
- [23].↵
- [24].↵
- [25].↵
- [26].↵
- [27].↵
- [28].↵
- [29].↵
- [30].↵
- [31].↵
- [32].↵
- [33].↵
- [34].↵
- [35].↵
- [36].↵
- [37].↵
- [38].↵
- [39].↵
- [40].↵
- [41].↵
- [42].↵
- [43].↵
- [44].↵
- [45].↵
- [46].↵
- [47].↵
- [48].↵
- [49].↵
- [50].↵
- [51].↵
- [52].↵
- [53].↵
- [54].↵
- [55].↵
- [56].↵
- [57].↵
- [58].↵
- [59].↵
- [60].↵
- [61].↵
- [62].↵