Abstract
Understanding and accurately estimating epidemiological delay distributions is important for public health policy. These estimates directly influence epidemic situational awareness, control strategies, and resource allocation. In this study, we explore challenges in estimating these distributions, including truncation, interval censoring, and dynamical biases. Despite their importance, these issues are frequently overlooked in the current literature, often resulting in biased conclusions. This study aims to shed light on these challenges, providing valuable insights for epidemiologists and infectious disease modellers.
Grounded in the underlying theory, our work motivates comprehensive approaches that account for these issues. We also discuss simpler, widely used methods that do not fully account for known biases. We evaluate the statistical performance of these methods using simulated exponential growth and epidemic scenarios informed by data from the 2014-2016 Sierra Leone Ebola virus disease epidemic.
Our findings highlight that using simpler methods can lead to biased estimates of vital epidemiological parameters. An approximate-latent-variable method emerged as the best overall performer, while an efficient, widely implemented interval-reduced-censoring-and-truncation method performed only slightly worse. Other methods, such as a joint-primary-incidence-and-delay method and a dynamic-correction method, performed well under certain conditions, although they have inherent limitations and may not be the best choice for more complex problems.
Although several methods performed well in the contexts we evaluated, residual biases persisted, predominantly due to the simplifying assumption that event times are uniformly distributed within the censoring interval; in reality, this distribution depends on epidemic dynamics. However, in realistic scenarios with daily censoring, these biases appeared minimal. This study underscores the need for caution when estimating epidemiological delay distributions in real time, provides an overview of the relevant theory together with practical tools for avoiding common methodological errors, and points towards areas for future research.
What was known prior to this paper
Importance of accurate estimates: Estimating epidemiological delay distributions accurately is critical for model development, epidemic forecasts, and analytic decision support.
Right truncation: Right truncation describes the incomplete observation of delays for which the primary event has already occurred but the secondary event has not yet been observed (e.g., infections that have not yet become symptomatic and therefore have not been observed). Failing to account for right truncation can lead to underestimation of the mean delay during real-time data analysis.
Interval censoring: Interval censoring arises when epidemiological events occurring in continuous time are binned into time intervals (e.g., days or weeks). Double censoring of both primary and secondary events needs to be considered when estimating delay distributions from epidemiological data. Accounting for censoring in only one event can lead to additional biases.
Dynamical bias: Dynamical biases describe the effects of an epidemic’s current growth or decay rate on the observed delay distributions. Consider an analogy from demography: a growing population will contain an excess of young people, while a shrinking population will contain an excess of older people, compared to what would be expected from mortality profiles alone. Dynamical biases have been identified as significant issues in real-time epidemiological studies.
Existing methods: Methods and software to adjust for censoring, truncation, and dynamical biases exist. However, many of these methods have not been systematically compared, validated, or tested outside the context in which they were originally developed. Furthermore, some of these methods do not adjust for the full range of biases.
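The link between right truncation and dynamical bias can be illustrated with a minimal simulation. It assumes, hypothetically, purely exponential growth of primary incidence at rate r, a lognormal delay with made-up parameters, and a single observation time T. A delay is observed only if its secondary event has occurred by T, so the observed delay density is proportional to f(d)(exp(r(T - d)) - 1); when T is much longer than the delays, this is proportional to f(d)exp(-rd), the growth-rate reweighting that a dynamical-bias correction targets. This is an illustrative sketch, not one of the methods evaluated in the paper.

```python
import math
import random

def lognormal_pdf(x, meanlog, sdlog):
    """Density of a lognormal distribution; zero for x <= 0."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - meanlog) / sdlog
    return math.exp(-0.5 * z * z) / (x * sdlog * math.sqrt(2.0 * math.pi))

def simulate_observed_delays(r, T, meanlog, sdlog, n, seed=1):
    """Primary events with incidence proportional to exp(r t) on [0, T] (assumes r != 0),
    lognormal delays; keep only delays whose secondary event occurs by time T."""
    rng = random.Random(seed)
    observed = []
    while len(observed) < n:
        # Inverse-CDF draw from the density proportional to exp(r t) on [0, T]
        t_primary = math.log1p(rng.random() * math.expm1(r * T)) / r
        delay = rng.lognormvariate(meanlog, sdlog)
        if t_primary + delay <= T:  # secondary event already observed at time T
            observed.append(delay)
    return observed

def weighted_mean(weight, upper, steps=20000):
    """Mean of the density proportional to weight(d) on (0, upper), by the midpoint rule."""
    h = upper / steps
    num = den = 0.0
    for i in range(steps):
        d = (i + 0.5) * h
        w = weight(d)
        num += d * w
        den += w
    return num / den

# Hypothetical parameters, chosen only for illustration
MEANLOG, SDLOG, R, T = 1.6, 0.5, 0.2, 60.0
true_mean = math.exp(MEANLOG + SDLOG ** 2 / 2)

observed = simulate_observed_delays(R, T, MEANLOG, SDLOG, n=20000)
naive_mean = sum(observed) / len(observed)

# Exact mean of right-truncated delays: f(d) times P(primary event before T - d)
truncated_mean = weighted_mean(
    lambda d: lognormal_pdf(d, MEANLOG, SDLOG) * math.expm1(R * (T - d)), T)
# Long-epidemic limit: f(d) * exp(-r d), the dynamical-bias reweighting
dynamical_mean = weighted_mean(
    lambda d: lognormal_pdf(d, MEANLOG, SDLOG) * math.exp(-R * d), T)

print(f"true {true_mean:.2f}  naive {naive_mean:.2f}  "
      f"truncated {truncated_mean:.2f}  dynamical {dynamical_mean:.2f}")
```

In this sketch the naive mean of the observed delays falls well below the true mean, and it agrees with both the exact truncated mean and its exp(-rd) limit. That agreement is consistent with the recommendation to correct for either right truncation or dynamical bias, but not both.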
What this paper adds
Theory overview: An overview of the theory required to estimate delay distributions is provided, helping practitioners understand the underlying principles of the methods and the connections between right truncation, dynamical bias, and interval censoring.
Review of methods: This paper presents a review of methods accounting for truncation, interval censoring, and dynamical biases in estimating epidemiological delay distributions in the context of the underlying theory.
Evaluation of methods: Methods were evaluated using simulations as well as data from the 2014-2016 Sierra Leone Ebola virus disease epidemic.
Cautionary guidance: This work underscores the need for caution when estimating epidemiological delay distributions, provides clear signposting on which methods to use and when, and points out areas for future research.
Practical guidance: Guidance is also provided for those making use of delay distributions in routine practice.
Key findings
Impact of neglecting biases: Neglecting truncation and censoring biases can lead to flawed estimates of important epidemiological parameters, especially in real-time epidemic settings.
Equivalence of dynamical bias and right truncation: In the context of a growing epidemic, right truncation has essentially the same effect as dynamical bias. Typically, we recommend correcting for one or the other, but not both.
Bias in common censoring adjustment: A common approach to censoring adjustment, naively discretising observed delays into daily intervals and then fitting continuous-time distributions, can result in biased estimates.
Performance of methods: We identified an approximate-latent-variable method as the best overall performer, while an interval-reduced-censoring-and-truncation method was resource-efficient, widely implemented, and performed only slightly worse.
Inherent limitations of some methods: Other methods, such as jointly estimating primary incidence and the forward delay, and dynamic bias correction, demonstrated good performance under certain conditions, but they also had inherent limitations depending on the setting.
Persistence of residual biases: Residual biases persisted across all methods we investigated, largely due to the simplifying assumption that the distribution of event time within the primary censoring interval follows a uniform distribution rather than one influenced by the growth rate. These are minimal if the censoring interval is small compared to other relevant time scales, as is the case for daily censoring with most human diseases.
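The censoring issues in the key findings above can be made concrete with a minimal sketch of a doubly interval-censored likelihood contribution. It assumes a lognormal delay with hypothetical parameters and, as the findings describe, a primary event time uniformly distributed within its day. Under these assumptions, the probability of observing a whole-day delay k is P(k) = integral over u in [0, 1) of F(k + 1 - u) - F(k - u), where F is the delay CDF. The code computes this mass and checks it against simulated doubly censored data; it is a general illustration, not the authors' implementation.

```python
import math
import random

def lognormal_cdf(x, meanlog, sdlog):
    """CDF of a lognormal distribution; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - meanlog) / (sdlog * math.sqrt(2.0))))

def double_censored_pmf(k, meanlog, sdlog, steps=1000):
    """P(observed whole-day delay == k), assuming the primary event time u is
    uniform within its day: integral over u in [0, 1) of F(k + 1 - u) - F(k - u),
    evaluated with the midpoint rule."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += (lognormal_cdf(k + 1 - u, meanlog, sdlog)
                  - lognormal_cdf(k - u, meanlog, sdlog))
    return total * h

# Hypothetical delay parameters (a short delay, where censoring matters most)
MEANLOG, SDLOG, K_MAX = 1.0, 0.6, 60
true_mean = math.exp(MEANLOG + SDLOG ** 2 / 2)
pmf = [double_censored_pmf(k, MEANLOG, SDLOG) for k in range(K_MAX + 1)]
mean_k = sum(k * p for k, p in enumerate(pmf))

# Simulate doubly censored observations under the same assumptions
rng = random.Random(2)
n = 50000
counts = [0] * (K_MAX + 1)
for _ in range(n):
    u = rng.random()                          # primary time within its day
    d = rng.lognormvariate(MEANLOG, SDLOG)    # continuous delay
    k = math.floor(u + d)                     # secondary day minus primary day
    if k <= K_MAX:
        counts[k] += 1
empirical = [c / n for c in counts]

for k in range(6):
    print(f"k={k}: censored pmf {pmf[k]:.3f}, simulated {empirical[k]:.3f}")
print(f"mean of censored delay {mean_k:.3f} vs continuous mean {true_mean:.3f}")
```

Under the uniform assumption, the mean of the censored delay equals the continuous mean, but the shape does not match. Same-day pairs (k = 0) have positive probability, whereas a continuous lognormal density fitted naively to k assigns them zero density, because log 0 is undefined.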
Key limitations
Differences between right censoring and truncation: We primarily focus on right truncation, which is most relevant when secondary events are easier to observe than primary events (e.g., symptom onset vs. infection); in this case, we cannot observe the delay until the secondary event has occurred. In other cases, we can directly observe the primary event and wait for the secondary event to occur (e.g., eventual recovery or death of a hospitalized individual); in this case, it would be more appropriate to use right censoring to model the unresolved delays. For simplicity, we did not cover right censoring in this paper.
Daily censoring process: Our work considered only a daily interval censoring process for primary and secondary events. To mitigate this, we investigated scenarios with short delays and high growth rates, which mimic longer censoring intervals combined with longer delays and slower growth rates.
Deviation from uniform distribution assumption: We show that the empirical distribution of event times within the primary censoring interval deviates from the common assumption of a uniform distribution because of epidemic dynamics. This discrepancy introduced into all methods a small absolute bias whose size depends on the length of the primary censoring window, and it was a particular issue when delays were short relative to the length of the censoring window. In practice, other biological factors, such as circadian rhythms, are likely to have a stronger effect than the growth rate at a daily resolution. Nonetheless, our work lays the theoretical groundwork for linking epidemic dynamics to the censoring process. Further work is needed to develop robust methods for wider censoring intervals.
Temporal changes in delay distributions: The Ebola case study showcased considerable variation in reporting delays across the epidemic timeline, far greater than any bias due to censoring or truncation. Further work is needed to extend our methods to address such issues.
Other biases not considered: The idealized simulated scenarios we used did not include observation error for either primary or secondary events, possibly favouring methods that do not account for real-world sources of bias.
Limited distributions and methods considered: We only considered lognormal distributions in this study, though our findings are generalizable to other distributions. Mixture distributions and non-parametric or hazard-based methods were not included in our assessment.
Exclusion of fitting discrete-time distributions: We focused on fitting continuous-time distributions throughout the paper. However, fitting discrete-time distributions can be a viable option in practice, especially at a daily resolution. More work is needed to compare inferences based on discrete-time distributions vs. continuous-time distributions with daily censoring.
Exclusion of transmission interval distributions: Our work primarily focused on inferring distributions of non-transmission intervals, leaving out potential complications related to dependent events. Additional considerations such as shared source cases, identifying intermediate hosts, and the possibility of multiple source cases for a single infectee were not factored into our analysis.
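The deviation from the uniform assumption described under Key limitations can be sketched in closed form. Assuming purely exponential incidence proportional to exp(rt), with hypothetical growth rates r, the primary event time u within a unit censoring interval has density r exp(ru)/(exp(r) - 1). Its mean is 1/(1 - exp(-r)) - 1/r rather than 1/2. The sketch checks this formula against simulated event times; it illustrates the theoretical point only and is not a correction method from the paper.

```python
import math
import random

def within_interval_mean(r):
    """Mean primary event time within a unit censoring interval when incidence
    grows as exp(r t); the uniform assumption corresponds to r == 0 (mean 0.5)."""
    if r == 0:
        return 0.5
    return 1.0 / (1.0 - math.exp(-r)) - 1.0 / r

def sample_primary_times(r, days, n, rng):
    """Inverse-CDF draws from the density proportional to exp(r t) on [0, days)."""
    if r == 0:
        return [rng.random() * days for _ in range(n)]
    return [math.log1p(rng.random() * math.expm1(r * days)) / r for _ in range(n)]

rng = random.Random(3)
results = {}
for r in (0.2, 0.0, -0.2):  # hypothetical daily growth rates
    times = sample_primary_times(r, days=30, n=100000, rng=rng)
    # Fractional part = time of the event within its daily censoring interval
    simulated = sum(t - math.floor(t) for t in times) / len(times)
    results[r] = (simulated, within_interval_mean(r))
    print(f"r={r:+.1f}: simulated {simulated:.3f}, theory {within_interval_mean(r):.3f}")
# theory: 0.517 (growth), 0.500 (uniform), 0.483 (decline)
```

For an interval of length w, the relevant quantity is the product r·w, so the shift away from the interval midpoint is small for daily censoring of most human diseases but grows quickly for weekly or longer intervals.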
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
SF was supported by Wellcome Trust (210758/Z/18/Z).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Disclaimer: The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the CDC, U.S. Department of Health and Human Services.
Data Availability
All code used in the present study is available at https://github.com/parksw3/epidist-paper