Summary
Estimation of infectiousness and fatality of the SARS-CoV-2 virus in the COVID-19 global pandemic is complicated by ascertainment bias resulting from incomplete and non-representative samples of infected individuals. We developed a strategy for overcoming this bias to obtain more plausible estimates of the true values of key epidemiological variables. We fit mechanistic Bayesian latent-variable SIR models to confirmed COVID-19 cases, deaths, and recoveries, for all regions (countries and US states) independently. Bayesian averaging over models, we find that the raw infection incidence rate underestimates the true rate by a factor, the case ascertainment ratio CARt that depends upon region and time. At the regional onset of COVID-19, the predicted global median was 13 infections unreported for each case confirmed (CARt = 0.07 C.I. (0.02, 0.4)). As the infection spread, the median CARt rose to 9 unreported cases for every one diagnosed as of April 15, 2020 (CARt = 0.1 C.I. (0.02, 0.5)). We also estimate that the median global initial reproduction number R0 is 3.3 (C.I (1.5, 8.3)) and the total infection fatality rate near the onset is 0.17% (C.I. (0.05%, 0.9%)). However the time-dependent reproduction number Rt and infection fatality rate as of April 15 were 1.2 (C.I. (0.6, 2.5)) and 0.8% (C.I. (0.2%,4%)), respectively. We find that there is great variability between country- and state-level values. Our estimates are consistent with recent serological estimates of cumulative infections for the state of New York, but inconsistent with claims that very large fractions of the population have already been infected in most other regions. For most regions, our estimates imply a great deal of uncertainty about the current state and trajectory of the epidemic.
Introduction
Nearly 3 million confirmed cases of the novel coronavirus SARS-CoV-2 infection have been reported worldwide as of late-April, 2020 (“COVID-19 Map” 19). However, it is presumed that many infections remain unreported due to either mildness of symptoms or inadequate testing. Knowledge of the full extent of the COVID-19 pandemic is required to evaluate the effectiveness of mitigation strategies such as social distancing, which have a major economic and social cost.
In the absence of universal testing, a proportion of the infected either do not cross the diagnostic threshold for COVID-19 testing or are unable to acquire medical attention and thus remain undetected. Consequently, the fraction of infected who are reported as cases may be substantially less than 1. Estimating the total case ascertainment ratio (CARt), defined as the ratio of the total number of confirmed cases to the total number of individuals infected with the novel coronavirus SARS-CoV-2 on day t, is thus important for constraining the initial reproduction number R0 of COVID-19, as well as recovery and fatality rates due to the disease.
Estimating the true magnitude and dynamics of the fractions of the population who are infected, susceptible, or recovered is a difficult and open problem. Serological and molecular diagnostic tests may not have perfect specificity and sensitivity (a substantial problem if the true positive rates are also low), may not be widely available, and may continue to suffer from ascertainment bias. Recent modeling work has attempted to estimate the true infected population, often relying on reported deaths since there is less ambiguity in the definition (Lourenco et al.; Flaxman et al.). Flaxman et. al 2020 focused on death data from the European Centre of Disease Control and suggest that there are orders of magnitude more infected than detected in confirmed cases. This claim was supported by a separate SIR model fitting cases, case recoveries, and case deaths that predicted a CAR of 1/63 for Italy (Calafiore et al.). Recent, preliminary, serological test results have been mixed. A study from Benavid et. al (Bendavid et al.) is consistent with this high number of undiagnosed cases. However a study of women admitted for delivery estimated that 1 in 8 cases are symptomatic (Sutton et al.), which is consistent with a recent press report by New York state based on sampling grocery store customers (New York Times). Debates continue on the adequacy of these tests.
Here, we develop a set of Bayesian, mechanistic, latent-variable, SIR models (Kermack and McKendrick; Wo and Ag, “Contributions to the Mathematical Theory of Epidemics--II. The Problem of Endemicity.1932”; Wo and Ag, “Contributions to the Mathematical Theory of Epidemics--III. Further Studies of the Problem of Endemicity. 1933”) that respect the uncertainty in the underlying data. We explore multiple specifications and find general consensus among the models. Our models try to account for the effects of mitigation (such as social distancing), the possibility that CARt (either through changes in definition or in detection, for example via ramp-up of testing) can vary in time, excess variability due to reporting and heterogeneity in the regional population, and incomplete quarantine of reported active cases. A similar model assumed complete quarantine on the basis of debilitating effects of illness or strict adherence to policy (Pedersen and Meneghini), which we relax to quantify the effectiveness of such actions. Using the model, we predict the total number of infected individuals from the confirmed numbers of cases, case recoveries, and case deaths. These three data time-series are sufficient to constrain time-varying estimates for CARt as well as for the fraction of unobserved infected individuals from which we can obtain the total reproduction number Rt and total Infection Fatality Ratio (IFRt) as functions of time.
Methods
Our overall approach uses latent-variable Bayesian models with underlying dynamics given by a family of SIR models (Kermack and McKendrick). We implement our models in the Bayesian inference software language Stan (Carpenter et al.; Hartikainen et al.) to estimate the model statistics within each region independently, using data from The Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE; Dong et al.) and from The COVID Tracking Project (“The COVID Tracking Project”). Collectively, these data consist of daily counts for the reported new cases, deaths, and recoveries (as well as key regional mitigation dates, where available (Flaxman et al.; Mervosh et al.)). We only consider regions that report all three of these quantities, which eliminates several US States and dates beyond April 15 for which the complete data was unavailable.
In the SIR models that we use to quantify the regional population dynamics of COVID-19, interactions between susceptible individuals (S) and infected individuals (I) induce an increase in I and a concomitant decrease in S. The unobserved members of I can either transition into a reported SARS-CoV-2 case, C, or become no longer infectious (either by recovering or dying). Confirmed cases C can then either recover (RC) or die (DC) and also infect more S at a rate that can differ from I. Mitigation efforts (including any social or policy changes aimed at reducing transmission rates) are modeled by a time dependence on the rate of infectious transmission. This rate changes from an initial value to a final value on a specified day with a 5 day transition period. We also include a similar time dependence in the rate of transition from I to C to account for changes in testing regimens, clinical classification, and awareness of infected individuals to seek medical attention. We consider six variations of the latent-variable SIR model of varying complexity. Besides the standard nonlinear SIR models, we also fit linear approximations that assume that S is approximately fixed (i.e. the epidemic is small enough that herd immunity is not significantly affecting transmission). The linear models have exponential growth with growth rates that can vary in time including becoming negative.
We tie each underlying SIR model to the available data using the generative probabilistic scheme shown in Figure 1. We assume a Negative Binomial likelihood, where the variance is a fitted parameter, for the observable data given the model predictions. This formulation accounts for the inherent variability of an underlying stochastic birth-death process (which usually obeys a Poisson distribution), disease progression effects, heterogeneous mixing of population clusters, and errors in classification and reporting inherent in the available data. We modeled prior probabilities for recovery and death rates using inverse gamma distributions derived from ref. (Verity et al.). For the other unknown distributions, we used weakly-informative priors for satisfying positivity and providing numerical and inferential stability. Priors for the day of mitigation application, taken from ref. (Flaxman et al.) and (Mervosh et al.), were used when available. We assessed models by balancing the likelihood of observing the data (given each model) against model complexity using a standard Bayesian model comparison metric such as Leave-One-Out cross validation (LOO) and WAIC (Vehtari et al.), to avoid overfitting. On the basis of this and other Bayesian metrics, we also report model-averaged statistics that take into account model specification uncertainty (Yao et al.).
Computer code written for this project is available at https://github.com/nih-niddk-mbs/covid-sicr.
Results
Global estimates for Rt, CARt, and IFRt, for 25 representative countries (selected based on case counts) and US states with the prerequisite data, are shown in Figures 2–4 (see Supplementary Material Table S1 for every region analyzed). Overall, we found a consensus between the different model predictions, including between linear and nonlinear models (Supplementary Material Figure S2). Results presented are Bayesian LOO weighted model averages. CARt can be greater than 1 in rare cases where more cases were introduced by migration at outbreak onset than disappeared from the infected pool without being detected. Credible intervals were broad for most region specific estimates (see Supplementary Material Table S1).
Global average quantiles across all regions and models are shown in Figures 2–5. The models capture the dynamics of Rt (Fig. 2), CARt (Fig. 3), and IFRt (Fig. 4). The global median R0 across all regions was 3.3 (C.I (1.5,8.3), Fig. 5). In the first week of the onset of COVID-19 appearance in a region (which differs by date for each region), we found that for every 1 case identified there are approximately 13 other infections unreported (CARt (week 0) of 0.07 (C.I (0.02,0.4)). The infection fatality ratio during the first week from regional onset has a global median of 0.0017 (C.I. (0.0005,0.009)) or 0.17%. As the infection spreads these estimates evolve over time such that the Rt on the week of April 15th, 2020 was 1.2 (C.I (0.6,2.5)). While Rt fell, the global median CARt and IFRt increased over the same time period to 1 in 10 for CARt (April 15th, 2020) = 0.1 (C.I. (0.02,0.5)) (a larger fraction of infections ultimately detected) and 0.008 (C.I. (0.002,0.04)) or 0.8% for IFRt (April 15th, 2020). IFRt will be lower than the true mortality rate since it does not account for unreported deaths due to COVID-19. We have also not corrected for the fact that some of the current cases are critically ill and may die in the future. We should also note that since the onset date of COVID-19 varies regionally, the global median as of April 15 is an aggregate over different stages of regional pandemic progression.
We also evaluated the effects of quarantine and excess variability. Across models we observed a substantial decrease in infection rate by active cases. However, the residual infection rate was still above zero with broad uncertainty. The infectiousness of active cases (the known infected) across all regions was much lower than that of the remaining (unreported) infected (Figure 6), with a median global q = 0.03 (C.I. (0.001,0.16)). While the median global quarantine attenuation was large, there was a broad distribution with some areas having smaller degrees than others. Within Europe, Italy had the smallest median case infection rate consistent with strong quarantine measures (Pedersen and Meneghini) (see Supplementary Figure S3 for regional details).
Variability in reported data likely leads to overdispersion, i.e. to counts that are more variable than would be expected from a Poisson distribution. We thus accounted for and analyzed overdispersion by using a negative binomial distribution in which the excess variance is modeled as (fλ)2 where λ is the mean rate for that day and f is an overdispersion factor that we estimate. As shown in Supplementary Material Figure S4, f was approximately 1.5 across all regions.
Discussion
We use our Bayesian mechanistic models to estimate the extent of COVID-19 prevalence, from publicly-available data. Importantly, we quantify the uncertainty of our estimates, finding them to be large in some regions. Our models perform this estimation task under simplified but plausible mechanisms for infection transmission, case detection, recovery, death, and disease spread mitigation.
The advantage of mechanistic models over non-mechanistic curve extrapolation (Murray and IHME COVID-19 health service utilization forecasting team) is that the model parameters have direct mechanistic interpretations that can be validated independently. More mechanisms and details can also be added incrementally.
Estimate of the Case Ascertainment Ratio (CAR)
We show that the unaccounted cases exceed the reported by 10 to 20 fold, which is consistent with two independent groups in New York (one using PCR testing) (Sutton et al.; New York Times) but not with a similar test in Santa Clara (Bendavid et al.). However, factors such as false positive rates of the serological tests and selection bias could affect outcomes. Furthermore, the studies cited were not drawn from random samples of the population. Until wide-spread serological testing is implemented, the true proportion of cumulative infections will remain unknown, but we believe that our results provide a credible range for this proportion.
Our model could provide guidance on the effectiveness of various policies for reducing infectiousness until more complete data is obtained. Similar attempts have been made to model these factors. However, they have relied on priors for R0 based on confirmed cases (Lourenco et al.; Flaxman et al.; Ferguson et al.), restricted data to reported deaths or cases only (Flaxman et al.; Pedersen and Meneghini; Murray and IHME COVID-19 health service utilization forecasting team), made strong assumptions about quarantine (Pedersen and Meneghini), or ignored the underlying mechanisms that guide the dynamics of epidemics (Murray and IHME COVID-19 health service utilization forecasting team). We are able to improve identifiability of key parameters compared to previous results as in (Lourenco et al.) because we use more information. The motivation to use only case death data is understandable as it may be a more reliable measure than case counts but it is insufficient to constrain the dynamics alone and thus has much less predictive power. However, as shown in the Supplementary Material (Sec. S2), models that includie cases, case recoveries, and case deaths, provide sufficient information for identifiability of undocumented infections although the uncertainty can be large.
Estimate of the time-dependent reproduction number (Rt)
Our model predicts the time dependent reproduction number Rt for each region, which can give an estimate as to how well mitigation policies are working. Our results are consistent with social distancing, or other policies reducing transmission, as shown by the decrease in Rt relative to R0 at outbreak onset. Our directly inferred region-wide R0 estimate of 3.3 is larger than the mean of priors chosen for some other models (Lourenco et al.; Flaxman et al.; Ferguson et al.), but it is consistent with estimates from other work (Cao et al.; Sanche et al.) including several reviewed in (Liu et al.). Our estimate is close to the upper estimates for Italy by Pedersen and Menegheni (Pedersen and Meneghini) of 3.5 when removing quarantine on active cases. Using a similar linear SIR model with complete quarantine lowered their R0 estimate to 2.6. Our estimate of R was robust to assumptions on the effectiveness of quarantine (models without this factor gave similar estimates for R0 see Supplementary Material Figure S2). Our estimate of parameter q near zero for Italy implies near perfect quarantine. There was more incomplete quarantine in other regions.
Policy implications
We found a decrease in Rt over time with the global estimate currently near 1, though many regions are still well above this as of April 15. However, Rt does not need to be below 1 for herd immunity to stop the spread of COVID-19. When the fraction of infected individuals reaches 1-1/Rt regionally, then in the absence of reinfection from outside, herd immunity will quench the pandemic locally (Fine et al.). Although there are regional variations, if the current estimate of Rt persists then the pandemic will begin to extinguish itself when 17% (C.I. (0%, 60%)) of the susceptible global population has achieved resistance due to either recovery from prior infection or vaccination (0% means we have herd immunity now, under the current mitigation measures). However, if mitigation measures are lifted and Rt returns to R0, then that fraction rises to 70% (C.I. (33%, 88%)) to guard against further waves of illness upon reinfection.
The time-varying models give other notable results. On average CARt has increased, indicating that a larger proportion of infections are being detected. This might be explained by recognition of symptoms increasing during the pandemic coupled with broader availability of testing. Our estimate of IFRt also increased during the spread of the infection. This is probably due to the fact that early in the pandemic those fated to die would have been less likely to be identified as COVID-19 deaths, or more generally from changes in how deaths are reported over time. Because the IFR that we report uses the conventional definition, it is “right censored” -- at any given time point the number of deaths from the current number of infections will continue to grow for some time -- it could reflect an underestimate of the true mortality rate. Because our models explicitly estimate the rate at which both case and non-case infected either die or recover, one can derive a closed form expression for the true mortality rate.
Assumptions and Limitations
Driven by mechanistic considerations, we make several simplifying modeling assumptions that we believe to be reasonable. Chiefly, we make the standard assumptions of SIR models. For instance, we ignore heterogeneity within populations and the effect of connectivity between populations. We subsume heterogeneity of populations themselves into the model parameters, which are to be interpreted as population-averaged quantities. While mitigation measures have greatly reduced travel, cross-population transmission may still be occurring at low levels. Although our models account for mitigation and for diagnostic changes, our rigid implementation of these factors is unlikely to fully capture the true time course of their effects. Finally, biologically, our model considers an extremely simplified version of disease progression within individuals, ignoring time inhomogeneous mechanisms that may be significant (Bottcher et al.).
Motivated by statistical considerations, we also consider reduced linear approximations to the SIR models, where changes in the size of the susceptible pool are ignored. Empirically, we find these approximations to be more stable for fitting the pandemic in the early stages than the standard nonlinear SIR model. A complication in using the standard nonlinear SIR model lies in the knowledge of the size of the susceptible population. In addition to the fact that it is not known what fraction of the population has innate immunity, the inherent heterogeneity of the spread of disease in the early stages renders the susceptible population estimate problematic if not impossible. (In our nonlinear models we fit it as an adjustable parameter). The linearized model circumvents this difficulty by scaling out the population, and to some degree ameliorates the homogeneous assumption, by averaging over the initial growth of the local clusters. Hence, for the linearized model during the early stages of an epidemic, heterogeneity in infectiousness has less of an impact on the underlying aggregate state variables, and so does not provide a worse fit than a well-mixed population with the same observed data. Only later in the epidemic when the effects of herd immunity are important will the limitations of a linear model with homogeneous mixing be revealed, although the time dependence in the infectiousness we introduce could also be capturing the effects of herd immunity.
Outside of our modeling assumptions, issues of data quality may affect the accuracy of our estimates. Disparate criteria exist across regions for defining cases and case deaths, oftentimes changing within regions themselves as each epidemic progresses.
Additionally, data (including death counts) reported by many regions is likely incomplete or unreliable, as evidenced in large spikes of unexplained deaths not officially recognized as Covid-19 deaths (Covid-19 Data - Tracking Covid-19 Excess Deaths across Countries | Graphic Detail | The Economist); while we believe that systematic under-testing can be overcome up to a point, totally inconsistent or fraudulent reporting cannot be.
Future Extensions
Additional factors need to be explored in future models. For example, a more robust and flexible implementation of mitigation, as well as heterogeneity of disease progression within individuals and in their social interactions, is essential. Rather than fitting regions independently it should be possible to fit a hierarchical global model in which region-specific parameters are partially pooled to achieve additional regularization and to narrow credible intervals. The linear model is robust to uncertainty in the size of the initial susceptible population but if the disease progresses such that we begin to reach population saturation, we must use a nonlinear model that explicitly accounts for population size to make reliable parameter estimates although the time dependence in infectiousness can partially mimic this nonlinear effect. We are optimistic that such a transition will be seamless since we find consistency in model predictions between linear and nonlinear models in the current work.
Data Availability
All data and computer code used is publicly available
Supplementary Material
S1 Latent variable SIR Model
We consider a latent variable SIR model where the observables are cases C, case recoveries RC, and case deaths DC. The latent variables are the numbers of susceptible S and undetected infected I individuals. Assuming probability of contact between people is completely random and uniform and ignoring the possibility of births, migration, or deaths due to other causes, the mass action equations for the model are where N is the population of the region in question, β is the infection rate per contact, σC is the transition rate from I to C, σU is the disappearance rate of I not transitioning to C, and σ = σR + σD is the disappearance rate for cases. The parameter q accounts for possible differences in virus transmission rate from non-cases. It can be less than one due to say lower social contact or higher due to a larger viral load. If q = 0 then the model is identical to an SEIR model.
The equations obey the conservation condition where
The population U represents the portion of I that either recover or die but remain undetected.
If initially S = N, I << N, and the parameters are such that the right hand side of (S2) is positive, then I will grow until it reaches a peak where , which is given by the condition where σ = σR + σD. The fraction of the population remaining susceptible at the peak of the epidemic is the inverse of the reproduction number
(R0 can also be computed using the next generation method). The pandemic will spread if R0 > 1. The total number that becomes infected and thus are no longer susceptible is NI = N − S. The peak of this number is given by
I will decrease after NI passes . Thus, the pandemic can be mitigated by reducing , which can be achieved by either decreasing R0 or decreasing N. We model the effects of mitigation with a time dependence in β via β → βt where where t = 0 is the day of the first case. βt transitions from its initial value to a new value βm∞ on day m50 with a transition time of 5 days. We also account for the possibility that the case detection rate can change in time with with σC → σt where
For example, the method of classification of cases changed in China on a single day leading to spike in the number of cases and this time dependence accounts for situations such as that.
The mean field SIR model assumes a well mixed population where the probability of interactions are homogeneous that does not reflect actual human interactions. An actual epidemic, rather than growing homogeneously throughout the population, will be seeded locally and spread within clusters that then propagate to other local clusters. Each cluster may locally saturate before spreading to another cluster and thus the average dynamics of the epidemic over a large region is not reflected by the actual spread at the local level. Hence, the population size N in the SIR model is not simply the regional population but a complex aggregation over many interacting local clusters, which may be difficult to estimate. This is a major limitation of mean field models since N is a major factor in determining the eventual rise and fall of the pandemic. However, this limitation can be circumvented by noting that in the initial stages of the epidemic where I is small and less than even the cluster population, we can fix S = N and the system (S1), (S2), and (S3) reduces to the two dimensional linear system
N is scaled out of the dynamics.
The eigenvalues for the linear system (S10) and (S11) assuming that βt and σCt are constant in time are where Tr = βt − σCt − σU − σ and Det = −σ(βt − σCt − σU) − qβσCt. For q = 0 the eigenvalues simplify to
For most parameter values r+ will dominate the temporal dynamics. The pandemic will grow when it is positive and begin to extinguish only when βt falls below σCt + σU.
We can use the eigenvalues for a lower bound on the rate for the pandemic to extinguish. In the nonlinear equations, we can consider an effective infectiousness β* = βtS(t)/N = βt(1 − nI), where nI is the fraction of the population that is immune, either innately or by recovering from the infection.
From (S7) we can set for q = 0 to obtain
Thus the pandemic will extinguish at the rate of infection disappearance moderated by the residual spread.
S2 Parameter identifiability
Here, we investigate whether the parameters in (S10) and (S11) are in principle identifiable from the observable data. We first consider the case without mitigation in which there are six parameters, β, σC, σR, σD, σU, and q together with the dynamic variables I(t) and C(t). The observable data consists of the time series of cases, case recoveries, and case deaths, which we denote by λC(t), λR(t),and λD(t), respectively. For this analysis, we assume perfect information with no uncertainty.
From the data, we can immediately construct the following conditions
From these conditions we immediately obtain C(t), σR, and σD. Dividing (S10) by (S11), we obtain where α1 = β − σU − σC. Given that the epidemic is initiated with I > 0 and C = 0, then prior to the first case arising we can assume dI/dC = α1/σC, from.which we can integrate to obtain σCI(0) = α1C(0), where t = 0 marks the day of appearance of the first case. This then gives α1 = λC(0)/C(0).
Multiplying (S10) by σC gives and thus σCqβ = L/C. Finally, we can also rewrite (S10) as from which σC can be inferred. Given σC we can then infer I(t) from λC(t) from which we can derive conditions for β − σU and qβ, which are only two conditions for three unknowns. Thus, the observable data can at most specify five out of the six unknown parameters.
We can resolve this nonidentifiability in two ways. The first is by specifying one of the parameters. For example, we can specify σU given that we have a prior on the rate of recovery or death from the infection. We can also specify q since it is a measure of how effective the isolation of cases are. The second is to utilize the fact that mitigation acts as a time dependent perturbation on β assuming all the other parameters are fixed. The mitigation is set by two parameters, the day it is applied and the effectiveness in diminishing β. Essentially the model can be applied separately before and after mitigation where only β has a different value. If the day of mitigation is known then this results in four conditions for four unknowns, i.e. βbefore, βafter, q, and σU and the parameters can be identified. Thus in principle, with perfect information, the model can be identified if given some prior information.
S3 Parameter estimation
We estimate the parameters and their uncertainties using Bayesian methods, by putting prior distributions on each of the model parameters. A schematic of our overall Bayesian latent-variable model is shown in Fig. S1
We consider various versions of the model with varying number of parameters and weigh them using model comparison measures such as WAIC and LOO.
Considering the mean field SIR equations as a birth and death process implies a Poisson likelihood. However, in addition to this inherent stochasticity, errors will be introduced due to variability in criteria for measurement and recording. We account for this additional variance with a Negative Binomial distribution cf. where nC,R,D are the total number of new cases, recovered cases, and case deaths on day d, I(d), C(d) are the total number for day d, and ϕ is a fitted factor quantifying the extra variance where x ~ NegBinom(μ, ϕ) with E[X] = μ and var[X] = μ + μ2/ϕ. We use inverse gamma distributed priors on σU, σR and σD using parameters derived from ref X. We fit from the first day in which the daily case count is greater than one, which we call day 0. We then set C(0) to this count and I(0) = (β − σU − σC)/σCC(0).
The cumulative population that becomes infected and cumulative cases at time t are given by
The cumulative totals for recovered and dead are RC and DC. From these quantities we can estimate the case ascertainment ratio, NC/NI and the total infection fatality ratio, DC/NI.
S4 Data and Software
All data and code can be found at https://github.com/nih-niddk-mbs/covid-sicr. This repository contains several python scripts and Jupyter notebooks for replicating our findings, as described in the README file there. These can be used to obtain new post-publication estimates with additional data, provided that the data providers listed in the main text continue to provide a consistent API. We used Stan, which provides Bayesian inference using a heavily-optimized No-U-Turn sampler, a variant of Hamiltonian Monte Carlo, and each model described here is available as a .stan file. Python 3 was used for all other reported results, including interaction with Stan (via pystan). A subset of the work implemented in Julia is also available in the repository.
Acknowledgments
The authors would like to thank Christopher Kim for early assistance and Kevin Hall for critical comments. CCC and SV were supported by the Intramural Program of the NIH, NIDDK. RCG was supported by NIDCD, NINDS, and NSF. This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov).