Reviewer #3 (Public Review):
This manuscript wades into a research area that has risen to prominence during the COVID-19 pandemic, namely the estimation of time-varying quantities to describe transmission dynamics, based on case data collected in a given location. The authors focus on the interesting and challenging setting of low-incidence periods that arise after epidemic waves, when local spread of the virus has been contained, but new cases continue to be seeded by travelers and local spread potential can change as control measures are relaxed. There are important questions that arise in this context, such as when it is safe to declare the pathogen locally eliminated, and how to detect a flare-up quickly enough to stamp it out.
The authors propose a new framework, made up of a smoothed estimate of the local reproductive number, R, and another quantity they call Z, which is a measure of confidence that the local epidemic has been eliminated. They apply this framework to three public data sets of COVID-19 case reports (in New Zealand, Hong Kong and Victoria, Australia), each spanning multiple waves of infections interspersed with quieter periods when most cases arise from importation. They show how the smoothed R estimates align with the reported case data, and accurately capture periods of supercritical (R>1, so epidemics can take off) and subcritical (R<1, so epidemics wane) local transmission. They also show how the Z metric fluctuates through time, rising to near 100% at a few points which correspond closely to official declarations of elimination in the respective settings. The authors draw some parallels between their inferred R and Z metrics and the changes in control policies on the ground. They also highlight a number of points where the R and Z metrics seem to anticipate changes in the epidemiology on the ground, which are interpreted as advance 'signals' or 'early-warning' of ensuing waves of cases. This interpretation seems to underlie the manuscript's overall framing in terms of 'early-warning signals' that can be used 'in real time'.
Taken at face value, these are exciting claims that could form the basis of a useful public health tool. However, I was not convinced that the framework was actually making these predictions in real time, i.e. strictly prospectively using available data. The approach would still have value if applied retrospectively, particularly with regard to understanding the impact of interventions applied in each setting. To this end, a more formal analysis of the relation between control measures and the R and Z metrics would benefit the paper.
Strengths
The paper is exemplary in clearly delineating the roles of importation versus local transmission in shaping case incidence during these low-incidence periods. This is a crucial distinction in this context, which is too often blurred.
The authors also innovate by bringing a suite of Bayesian filtering and smoothing techniques to bear on inferring R from these data, with the goal of extracting the cleanest signal possible from the noisy data. These approaches are well contextualized relative to more standard techniques in epidemiology, and appear to bear fruit in terms of smooth and stable estimates. However, it is important to note that this manuscript is not the primary report on these methods; the authors have written up this work elsewhere (ref. 16), and the methods are not described in sufficient detail here for this manuscript to stand alone.
It is an interesting and valuable idea to derive a metric (Z) that explicitly estimates the degree of confidence that the pathogen has been eliminated locally. Again, the present manuscript builds closely on prior work by the authors (ref. 15), with the innovation of blending the earlier theory with the new Bayesian smoothed estimates of R.
The selected data sets are perfectly suited to the problem at hand, and analyzing three parallel case studies allows for the behavior of the R-Z framework to be observed across contexts, which is valuable.
Weaknesses
As presented, the manuscript does not seem to show real-time early-warning signals, as I understand those terms. The forward-backward smoothing algorithms that form the backbone of the study estimate R_s (i.e. the value of R at time s) using case data from both before and after time s. That is, the algorithm relies on knowledge of future events and so it cannot be said to provide early warning in any practical sense. Similarly, the estimates of Z draw upon the same 'smoothing posterior' q_s, so they also rely on future knowledge. (I doubted my understanding of this point, given the strong framing of the manuscript and limited methodological details, but the full explication of the method in ref. 16 is quite clear that the 'filtering posterior' p_s is suitable for real-time estimation, but the smoothing approach is retrospective and requires knowledge of the full dataset.)
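The distinction between the filtering posterior p_s and the smoothing posterior q_s can be illustrated with a toy example. The sketch below is not the authors' implementation, nor the algorithm of ref. 16. It assumes a simple grid-based random-walk model for R with Poisson renewal observations, and the function names, the serial-interval weights `w`, the noise scale `sigma`, and the case counts are all hypothetical. It runs one forward-backward recursion on two synthetic series that are identical up to day s and differ only afterwards.

```python
import math


def normalise(v):
    total = sum(v)
    return [x / total for x in v]


def filter_and_smooth(cases, w, grid, sigma=0.3):
    """Grid-based forward filter and backward smoother for R_t.

    Assumed toy model (not the manuscript's): R_t follows a Gaussian random
    walk on `grid`, and cases[t] ~ Poisson(R_t * lam_t), where lam_t weights
    past cases by the serial-interval weights `w` (lags 1, 2, ...).
    """
    m, T = len(grid), len(cases)
    # P[i][j] = P(R_t = grid[j] | R_{t-1} = grid[i]), rows normalised.
    P = [normalise([math.exp(-(rj - ri) ** 2 / (2 * sigma ** 2)) for rj in grid])
         for ri in grid]
    p = [1.0 / m] * m  # flat prior over the grid
    filt, preds = [], []
    for t in range(T):
        # Predict: push the previous filtering posterior through the random walk.
        pred = p if t == 0 else [sum(p[i] * P[i][j] for i in range(m))
                                 for j in range(m)]
        # Infectious pressure built from PAST cases only.
        lam = sum(w[u] * cases[t - 1 - u] for u in range(len(w)) if t - 1 - u >= 0)
        if lam > 0:
            lik = [math.exp(cases[t] * math.log(r * lam) - r * lam
                            - math.lgamma(cases[t] + 1)) for r in grid]
        else:
            lik = [1.0] * m  # simplification: no pressure, so no update
        p = normalise([a * b for a, b in zip(pred, lik)])
        preds.append(pred)
        filt.append(p)
    # Backward pass: each smoothed posterior also absorbs all LATER data.
    smooth = [None] * T
    smooth[-1] = filt[-1]
    for t in range(T - 2, -1, -1):
        ratio = [smooth[t + 1][j] / preds[t + 1][j] for j in range(m)]
        back = [sum(P[i][j] * ratio[j] for j in range(m)) for i in range(m)]
        smooth[t] = normalise([filt[t][i] * back[i] for i in range(m)])
    return filt, smooth


def posterior_mean(dist, grid):
    return sum(p * r for p, r in zip(dist, grid))


grid = [0.1 * k for k in range(1, 51)]  # candidate R values 0.1 .. 5.0
w = [0.2, 0.5, 0.3]                     # hypothetical serial-interval weights
quiet = [5, 4, 3, 2, 2, 1, 1, 1, 1, 1, 1, 1]    # synthetic: no resurgence
surge = [5, 4, 3, 2, 2, 1, 1, 1, 3, 6, 12, 25]  # same up to day 7, then a wave
s = 7

filt_q, smooth_q = filter_and_smooth(quiet, w, grid)
filt_s, smooth_s = filter_and_smooth(surge, w, grid)

print("filtered R at s:", posterior_mean(filt_q[s], grid),
      posterior_mean(filt_s[s], grid))
print("smoothed R at s:", posterior_mean(smooth_q[s], grid),
      posterior_mean(smooth_s[s], grid))
```

Under these assumptions the filtered estimate at day s is the same for both series, because it uses only data up to s, whereas the smoothed estimate at s shifts with the counts that arrive later. Any apparent rise in a smoothed R ahead of a wave therefore needs to be checked against the filtered estimate before it is read as early warning.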
Viewed in this light, the 'early-warning signals' in the Results are actually just smoothing of the yet-to-occur case data, and thus sadly are much less exciting. It did seem too good to be true. If I have understood correctly, then the current framing of the work seems inappropriate - unless the authors can show that R and Z metrics estimated strictly from past data can provide reliable signals of coming events.
An alternative approach would be to use the framework as a retrospective tool, and use it to build quantitative understanding of the impact of control measures and to revisit the timing of declarations of elimination. Table 1 goes some distance toward describing the relationships between R and Z values and these policy shifts and announcements, but I struggled to pull much value from it. The table and associated text mostly come across as a series of anecdotes where R fell after NPIs were imposed, or rose again when local transmission occurred, but there is no analysis that takes advantage of the more refined estimates of R the authors have obtained with their smoothing approach. One issue is that the time windows included in the table are not contiguous, so all the vignettes feel disjointed.
As presented, while the concept of the Z metric is attractive, it was hard to discern any conclusions about how to make use of its value. In two of the datasets it rose to near 100%, which is a clear signal of elimination, but as noted these were periods when the WHO rule of thumb (28 days without new cases) was sufficient. At some other points, the authors emphasize the implications of Z dropping close to 0% (e.g. at the top of page 7: on July 5 in Hong Kong, Z ≈ 0% despite 21 days without local cases, and the authors highlight the contrast with the WHO rule). However, these findings clearly arise from the smoothing of future data mentioned above (i.e. on July 5 in Hong Kong, R is rising to supercritical levels based on advance knowledge of the rapid rise in cases in the next few weeks). Thus these findings are not relevant to real-time decision support. Finally, there are several periods where Z fluctuates around 20-50% for reasons that are hard to discern (e.g. July in New Zealand, or April-May in Hong Kong). The authors write that the Z score may exhibit a peak due to extinction of a particular viral lineage in Hong Kong, while other lineages continued to circulate. It is hard to grasp how this interpretation could apply, given the aggregated nature of the data; more evidence, or more refined arguments, are needed for this to be convincing.
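For reference, the WHO rule of thumb as described above reduces to a purely backward-looking check, which makes it a natural real-time baseline against which any Z estimate could be compared. The following is a minimal sketch with synthetic counts; the function names and data are illustrative and do not come from the manuscript.

```python
def days_since_last_local_case(local_cases):
    """Count the trailing run of days with zero local cases."""
    count = 0
    for c in reversed(local_cases):
        if c > 0:
            break
        count += 1
    return count


def who_rule_met(local_cases, window=28):
    """True once at least `window` consecutive case-free days have elapsed.

    Uses only data up to the present day, so it can be evaluated in real time.
    """
    return days_since_last_local_case(local_cases) >= window


# Synthetic daily local-case counts (hypothetical, not from any data set).
after_21_days = [3, 1] + [0] * 21
after_28_days = [3, 1] + [0] * 28

print(who_rule_met(after_21_days))
print(who_rule_met(after_28_days))
```

Because this check never looks ahead, a fair comparison with Z would require Z to be computed from data up to the same day only.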
In the big picture, the proposed framework is based on two quantities, R and Z, but there is no systematic analysis of how to interpret these two quantities jointly. It would be valuable, for instance, to see how these metrics perform on a two-dimensional R-Z phase space.
The authors acknowledge a number of assumptions and data requirements needed for this approach, as presented. These include perfect case observation, no asymptomatic transmission, perfect identification of imported versus locally infected cases, and no delays in reporting. The authors state that the excellent surveillance systems in their case-study locales minimize the impact of these assumptions, but the same cannot be said of most other places around the world. Digging deeper into the epidemiology, the distribution of serial intervals (a crucial input to the algorithm) is assumed to be invariant, even though it has been demonstrated to change when interventions are imposed (ref. 26), i.e. under exactly the conditions of interest. Finally, superspreading is a prominent feature of COVID-19 epidemiology (as nicely documented for Hong Kong by one of the authors), but it is not addressed by this model beyond allowing subtle fluctuations in R from day to day. Taken together, these strong assumptions and omissions raise questions about the real-world reliability of this framework. Given that the point of the manuscript is to develop more refined quantitative metrics, and that most of these assumptions will be violated in most settings, it would be valuable to demonstrate that the framework is robust to these violations.