Abstract
The number of positive diagnostic tests for SARS-CoV-2 is a critical metric that is commonly used to assess epidemic severity and the efficacy of current levels of control. However, a proportion of individuals infected with SARS-CoV-2 may never receive a diagnostic test, while many of those who are tested may receive a false negative result. Consequently, cases reported through testing of symptomatic individuals represent only a fraction of the total number of infections, and this proportion is expected to vary depending on changes in natural factors and variability in test-seeking behaviour. Here we combine a number of data sources from England to estimate the proportion of infections that have resulted in a positive diagnosis. Using published estimates of the incubation period distribution and time-dependent test sensitivity, we estimate SARS-CoV-2 incidence from daily reported diagnostic test data. By calibrating this estimate against surveillance data we find that approximately 25% of infections were consistently reported through diagnostic testing before November 2020. This percentage increased through the final months of 2020, predominantly in regions with a large presence of the UK variant of concern (VOC), before falling rapidly in the last two weeks of January 2021. These changes are not explained by variation in rates of lateral flow device or PCR testing, but are consistent with there being an increased probability for the VOC that infection will result in an eventual positive diagnosis.
Testing for SARS-CoV-2 in the UK aims to accomplish two things: first, to rapidly confirm suspected cases of COVID-19 via symptomatic testing in order to contain outbreak clusters, and second, to establish the overall burden of infection by testing a random sample of the population. So long as the number of false positive results is sufficiently low, the number of positive symptomatic tests provides a lower bound on the number of people exposed to the virus. In contrast, random testing can provide an accurate estimate of prevalence, but is an inefficient way to rapidly identify infection clusters, and may also be biased depending on the extent to which a positive test reflects the true infection status of the individual.
Both types of data are available in England: the number of positive tests from people with suspected infection is published daily by the UK government [1], and the Office for National Statistics (ONS) regularly publishes estimates of the population prevalence based on unbiased sampling [2]. The existence of these sources creates an opportunity to answer an important question: what proportion of all infections are being reported through diagnostic testing? Knowing this can help to estimate true incidence rates, a quantity central to modelling the spread of the virus, and to determine the infection fatality rate of the disease [3].
Here we describe how diagnostic case numbers can be used to estimate the proportion of the population testing positive and, by calibrating this against ONS surveillance data, to estimate the proportion of infections that were reported through diagnostic testing. Our method uses published estimates of the time-dependent sensitivity of PCR tests. However, daily case data also include rapid in situ tests conducted with lateral flow devices (LFDs), typically used to screen non-symptomatic people in high-risk environments (e.g. schools, care homes), and there is concern that LFDs have lower sensitivity than PCR tests, particularly when administered by members of the general public [4].
Further complications come from new variants of SARS-CoV-2, which have been shown to result in different clinical symptoms [5,6]. New variants might have different pathological characteristics, for example a different incubation period or rate of viral shedding, that affect the likelihood of receiving a positive test. They might also lead to a different rate of test-seeking behaviour, which we expect to directly affect the proportion of infections being reported through diagnostic testing. Examining regional and temporal variation in the proportion of infections reported provides a novel way to consider these developments and to enrich our understanding of the epidemiology of the virus.
1 Data
We are primarily concerned with daily Pillar 1 and 2 case data [1], hereafter referred to as diagnostic test cases, shown in Fig. S2. We use the following notation: C(t) is the number of diagnostic test cases from samples taken on day t. These case counts come from lab-based PCR tests and from lateral flow device (LFD) testing, as performed in many community settings [7]. We assume that PCR tests have consistent sensitivity across all uses. Our analysis uses the 7-day rolling average number of people tested via PCR, and the number of LFD tests conducted, shown in Fig. S1. At the time of writing, the number of cases detected by LFDs was not available. Population counts for the 9 regions of England were taken from the ONS [8].
From the ONS infection survey [2], we use the reported percentage of people in England testing positive for coronavirus and multiply it by the population of the area to give O(t), the number of people testing positive, where t is the 4th day of the 7-day period during which the samples were taken. Additionally, for the 9 regions of England we use ONS estimates that are produced with additional modelling to interpolate between time points and correct for biases [9], and which therefore differ in method from the national percentage. Since the latest release only contains the most recent time point for these data, we compile the time series from the archived data releases. These are plotted in Fig. S2. We note a potential inconsistency in these data due to the selection of different modelling methods at different points in time.
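A minimal sketch of the conversion above, assuming the ONS figure is a percentage of the whole population of the area. The function name `ons_positive_count` and the input numbers are hypothetical and for illustration only.

```python
from datetime import date, timedelta


def ons_positive_count(percent_positive: float, population: int,
                       window_start: date) -> tuple[date, float]:
    """Convert an ONS percentage testing positive into a number of people.

    The count is assigned to the 4th day of the 7-day sampling window,
    i.e. three days after the first day of the window.
    """
    count = percent_positive / 100.0 * population
    return window_start + timedelta(days=3), count


# Hypothetical inputs, not ONS figures: 1% positivity in an area of 1,000,000 people.
day, n = ons_positive_count(1.0, 1_000_000, date(2020, 11, 1))
print(day, n)  # 2020-11-04 10000.0
```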
Finally, the infection survey provides an estimate of the proportion of tests showing each pattern of target-gene detection in the TaqPath test [10]. We consider tests that are negative for the S target gene and positive for the two other targets to be a proxy for infections that derive from the B.1.1.7 lineage of the virus, hereafter referred to as the variant of concern (VOC). Tests that are positive for only one of the other targets may indicate the new variant or another lineage [11], so we define V(t) as the estimated proportion of tests that are negative on the S target, after removing those that are negative on the S target and one other target. Since S-gene dropout can occur by chance, this characterisation is not appropriate when the prevalence of the VOC is very low, and we therefore assume V(t) = 0 before 1 November 2020. Additionally, the mean cycle threshold of the PCR-positive tests used in the ONS survey is provided, shown in Fig. S3.
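The definition of V(t) can be written as a small helper. This is a sketch under our reading of the text: we assume the inputs are proportions (between 0 and 1) of tests showing each target pattern, and the helper name and the clipping at zero are our own additions, not part of the original analysis.

```python
from datetime import date

VOC_START = date(2020, 11, 1)  # V(t) is set to 0 before this date


def voc_proportion(p_s_negative: float, p_s_and_one_other_negative: float,
                   day: date) -> float:
    """Hypothetical S-gene-dropout proxy for V(t), the proportion of VOC infections.

    p_s_negative: proportion of tests negative on the S target.
    p_s_and_one_other_negative: proportion negative on the S target and one
        other target (possibly another lineage), which are removed.
    """
    if day < VOC_START:
        return 0.0  # S dropout can occur by chance when VOC prevalence is low
    # Clip at zero (our assumption) in case noisy estimates give a negative value.
    return max(0.0, p_s_negative - p_s_and_one_other_negative)


print(voc_proportion(0.50, 0.10, date(2020, 12, 15)))
print(voc_proportion(0.05, 0.01, date(2020, 10, 15)))  # 0.0
```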
2 Estimating the time of exposure from the time of positive test
We define two random variables, x and t+, representing the time an individual was exposed and the time they received a positive test, respectively. Assuming discrete time steps, we express the probability that an individual who received a positive test from a sample taken at time T was first exposed to the virus at time X as

$$P(x = X \mid t_+ = T) = \frac{P(x = X \,\&\, t_+ = T)}{\sum_{X'} P(x = X' \,\&\, t_+ = T)}. \tag{1}$$

The joint probability distribution P(x = X & t+ = T) can be pieced together from various sources by considering the sequence of events that result in an individual testing positive.
First, we consider the time at which the individual was exposed to the virus and acquired the infection. We denote the prior probability that the infection was acquired at time X by B(X). Next, we consider the time between exposure and receiving a test. For symptomatic cases we assume that the test occurs shortly after symptom onset, i.e. the time since exposure, τ = T − X, is equal to the sum of the incubation period and a delay parameter δ, which we assume to be fixed. The probability of a test on day T is thus R(T − δ − X), where R(i) is the probability that the incubation period lasts i days, which we assume to be log-normal with a mean of 5.5 days and dispersion parameter 1.52 [12].¹
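A sketch of how R(i) might be computed. It assumes the dispersion parameter 1.52 is the geometric standard deviation of the log-normal (so σ = ln 1.52), chooses μ so that the mean equals 5.5 days, and takes R(i) as the probability mass on the interval [i, i + 1), in line with the footnote on integrating over unit intervals. These are our assumptions, and the helper names are hypothetical.

```python
import math


def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a log-normal distribution with log-scale parameters mu and sigma."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def incubation_pmf(max_days: int = 30, mean: float = 5.5,
                   dispersion: float = 1.52) -> list[float]:
    """Discrete incubation-period distribution R(i), i = 0, ..., max_days - 1."""
    sigma = math.log(dispersion)          # assumption: dispersion = exp(sigma)
    mu = math.log(mean) - sigma ** 2 / 2  # log-normal mean is exp(mu + sigma^2 / 2)
    return [lognormal_cdf(i + 1, mu, sigma) - lognormal_cdf(i, mu, sigma)
            for i in range(max_days)]


R = incubation_pmf()
print(round(sum(R), 4))  # mass beyond 30 days is negligible
```

Truncating at 30 days is a further simplification; a longer horizon can be passed via `max_days`.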
Lastly, once the individual has acquired the infection and has been tested, the test must be positive for the infection to be recorded. The probability of testing positive varies as a function, S(τ), of the time since exposure τ = T − X; here we use the function provided by Hellewell et al. [13], shown in Figure 1. The curve is similar in shape to the shedding profile found in other studies [14–16] and is largely consistent with the literature on viral shedding according to the most recent review on the subject [17], which reports that viral load peaks at days 3 to 5 and a mean viral shedding duration of 17 days, with variation associated with age and severity of illness. Studies that test for a difference between asymptomatic and symptomatic infections do not report consistent results [15, 18], so we assume the same sensitivity profile for all infections.
We express the probability that an infected individual was exposed on day X and tested positive on day T by multiplying the probabilities above,

$$P(x = X \,\&\, t_+ = T) = B(X)\, R(T - \delta - X)\, S(T - X), \tag{2}$$

which can be substituted into Eq. (1).
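Eqs. (1) and (2) can be combined into a short function. This sketch assumes a uniform prior B(X) (so it cancels in Eq. (1)), a discrete incubation distribution R given as a list, and a sensitivity function S supplied by the caller. The toy `R_toy` and `S_toy` below are hypothetical placeholders, not the log-normal incubation period or the Hellewell et al. curve.

```python
from typing import Callable


def exposure_posterior(T: int, R: list[float], S: Callable[[int], float],
                       delta: int = 1) -> dict[int, float]:
    """P(x = X | t+ = T) for each candidate exposure day X, via Eqs. (1)-(2).

    R[i] is the probability that the incubation period lasts i days and
    S(tau) the probability of testing positive tau days after exposure.
    The prior B(X) is taken to be constant, so it cancels on normalising.
    """
    joint: dict[int, float] = {}
    for i, r in enumerate(R):
        X = T - delta - i        # exposure day implied by an incubation of i days
        joint[X] = r * S(T - X)  # Eq. (2) up to the constant B(X)
    total = sum(joint.values())  # P(t+ = T) up to the same constant
    return {X: p / total for X, p in joint.items()}  # Eq. (1)


# Hypothetical toy inputs for illustration only.
R_toy = [0.1, 0.2, 0.4, 0.2, 0.1]  # incubation of 0 to 4 days


def S_toy(tau: int) -> float:
    return max(0.0, 1.0 - tau / 20)  # made-up declining sensitivity


post = exposure_posterior(T=20, R=R_toy, S=S_toy)
print(max(post, key=post.get))  # 17
```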
3 Estimating the reporting rate
We define the reporting rate as the proportion of SARS-CoV-2 infections that result in a PCR-positive test through the government Pillar 1 and 2 testing programmes. To account for unreported infections we assume a multiplicative scaling factor θ, where θ accounts for both test sensitivity and the proportion of infected individuals who seek testing.
To estimate the incidence, I(t; θ), defined as the number of newly acquired infections on day t, first consider that the number of cases exposed on day t and testing positive on day t + j is approximately P(x = t | t+ = t + j) C(t + j). The first factor in this expression is given by Eq. (1), which in turn depends on Eq. (2). To evaluate it, we assume an uninformative prior B(t) = β, where β is a constant whose value does not affect the final result,² and assume a 1-day delay between symptom onset and receiving a test (δ = 1). Finally, to account for unreported cases we multiply by 1/θ. Summing over all days on which the exposure could be reported gives

$$I(t;\theta) = \frac{1}{\theta} \sum_{j \ge 0} P(x = t \mid t_+ = t + j)\, C(t + j). \tag{3}$$

Next, we map the individuals exposed on each day to the number of people who would test positive on a given later day t,

$$M(t;\theta) = \sum_{j \ge 0} I(t - j;\theta)\, S(j). \tag{4}$$

Setting M(t; θ) equal to O(t), the ONS-derived number of people testing positive on day t, and solving for θ gives a time-dependent estimate,

$$\theta(t) = \frac{1}{O(t)} \sum_{j \ge 0} S(j) \sum_{k \ge 0} P(x = t - j \mid t_+ = t - j + k)\, C(t - j + k). \tag{5}$$

This equation combines the daily diagnostic case counts, the population positivity from surveillance, the incubation period distribution, and the time-dependent test sensitivity to provide an estimate of the proportion of infections being reported at time t.
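The steps leading to θ(t) can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: days are integer indices, B(X) is uniform and δ = 1 as in the text, exposure days before the start of the series are dropped, and the toy R, S, C and O values (and all function names) are hypothetical. Near the end of the series the back-calculated incidence is underestimated because later positive tests have not yet been observed, so θ(t) is only meaningful away from the edges.

```python
from typing import Callable


def incidence_theta1(C: list[float], R: list[float], S: Callable[[int], float],
                     delta: int = 1) -> list[float]:
    """I(t; 1): back-calculated exposures per day, before the 1/theta factor."""
    I = [0.0] * len(C)
    for T, cases in enumerate(C):
        weights = {}
        for i, r in enumerate(R):
            X = T - delta - i
            if X >= 0:                     # ignore exposures before day 0
                weights[X] = r * S(T - X)  # joint probability, uniform prior cancels
        total = sum(weights.values())
        if total == 0:
            continue
        for X, w in weights.items():
            I[X] += cases * w / total      # P(x = X | t+ = T) C(T)
    return I


def reporting_rate(t: int, C: list[float], O: list[float], R: list[float],
                   S: Callable[[int], float], delta: int = 1) -> float:
    """theta(t) = M(t; 1) / O(t), where M(t; 1) = sum_j I(t - j; 1) S(j)."""
    I = incidence_theta1(C, R, S, delta)
    M1 = sum(I[t - j] * S(j) for j in range(t + 1))
    return M1 / O[t]  # M(t; theta) = M(t; 1) / theta, set equal to O(t)


# Hypothetical toy data: constant 100 cases/day and 2,000 people testing positive.
R_toy = [0.1, 0.2, 0.4, 0.2, 0.1]


def S_toy(tau: int) -> float:
    return 0.5 if 0 <= tau < 10 else 0.0


C = [100.0] * 60
O = [2000.0] * 60
print(round(reporting_rate(30, C, O, R_toy, S_toy), 3))  # 0.25
```

Because θ enters only as a 1/θ factor, doubling the case counts doubles θ(t), which is a quick sanity check on any implementation.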
3.1 Time-independent reporting rates
To obtain a measure of the reporting rate that does not change over time, we take θ to be constant. Motivated by the suggestion that the VOC may have different pathological characteristics from other variants, we choose to use two reporting rates: the usual reporting rate, θOLD, and the reporting rate for infections caused by the VOC, θVOC. Recalling that V(t) is the proportion of infections caused by the VOC, we consider a revised estimate of M(t) using the convex combination

$$M(t;\theta_{OLD},\theta_{VOC}) = \big(1 - V(t)\big)\, M(t;\theta_{OLD}) + V(t)\, M(t;\theta_{VOC}).$$

We can then estimate θOLD and θVOC as the values that minimize the absolute error between M(t) and O(t), the ONS-derived number of people testing positive on day t. Specifically, the time-independent reporting rates are estimated by numerically solving

$$(\hat\theta_{OLD}, \hat\theta_{VOC}) = \mathop{\arg\min}_{\theta_{OLD},\,\theta_{VOC}} \sum_t \big| M(t;\theta_{OLD},\theta_{VOC}) - O(t) \big|.$$
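The minimisation can be sketched with a simple grid search over (0, 1]. Grid search is our choice for illustration; the text does not state which numerical method was used. Since I(t; θ) scales as 1/θ, so does M(t; θ), and the sketch therefore needs only M(t; 1), V(t) and the ONS counts O(t). The function name and toy data are hypothetical.

```python
def fit_two_rates(M1: list[float], V: list[float], O: list[float],
                  steps: int = 200) -> tuple[float, float]:
    """Grid-search theta_OLD, theta_VOC minimising sum_t |M(t) - O(t)|.

    M1[t] is M(t; 1); since M(t; theta) = M(t; 1) / theta, the convex
    combination is M1[t] * ((1 - V[t]) / theta_old + V[t] / theta_voc).
    """
    grid = [k / steps for k in range(1, steps + 1)]  # theta in (0, 1]
    best = (float("inf"), 0.0, 0.0)
    for th_old in grid:
        for th_voc in grid:
            err = sum(abs(m * ((1 - v) / th_old + v / th_voc) - o)
                      for m, v, o in zip(M1, V, O))
            if err < best[0]:
                best = (err, th_old, th_voc)
    return best[1], best[2]


# Synthetic check: data generated with theta_OLD = 0.25 and theta_VOC = 0.32.
M1 = [100.0] * 8
V = [0.0, 0.0, 0.1, 0.3, 0.5, 0.7, 0.8, 0.9]
O = [m * ((1 - v) / 0.25 + v / 0.32) for m, v in zip(M1, V)]
print(fit_two_rates(M1, V, O))  # (0.25, 0.32)
```

A finer grid, or a derivative-free optimiser, would be needed to resolve the rates beyond the grid spacing of 0.005.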
Results
Reporting rates calculated using Eq. (5) are presented for the 9 regions of England in Figure 2. There is an apparent difference between the four southernmost regions (including the East of England) and the others. Until the beginning of December, these southern regions show a general increase in the percentage of infections being reported, while the others show a decrease. Although the number of PCR tests conducted increased steadily in all regions and LFD testing had not yet begun (Fig. S1), the correlation between testing and the reporting rate over the whole time period is positive in southern regions and negative in northern ones (Table S1).
Reporting rates increased during November and December 2020 in all regions. While various behavioural factors may have caused this increase, we note two candidate explanations. First, the VOC increased considerably in proportion to other variants during this period. Since there is some evidence suggesting that the VOC causes more severe clinical symptoms [5,6], it is reasonable to suggest that it may also have a reporting rate distinct from the average of other variants.³ Second, over the same period LFD testing, typically used to screen non-symptomatic individuals, became widely accessible. Since asymptomatic infections make up an estimated 20% of all infections [20], we would expect this scheme to have increased the reporting rate.
To test both possibilities, we compute the time-independent rates for the VOC and other variants using the method described in Section 3.1. We include the effect of LFD tests by estimating the number of LFD positives and subtracting these from the total Pillar 1 and 2 cases before repeating the analysis. The reporting rates θOLD and θVOC then become estimates of the proportion of infections that are reported with a positive PCR test (for the old variants and the new variant, respectively). This assumes independence between the testing regimes, i.e. it ignores any interaction between the two. Since the number of LFD positives was not published at the time of writing, we explore the impact of assuming that a fixed percentage of all LFD tests are positive (Tables 1 and S2).
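The adjustment for LFD positives can be sketched as follows, assuming, as in Tables 1 and S2, that a fixed percentage of LFD tests are positive and that LFD and PCR testing are independent. The function name is hypothetical, and clipping at zero is our own safeguard rather than part of the described method.

```python
def pcr_only_cases(total_cases: list[float], lfd_tests: list[float],
                   lfd_positivity: float) -> list[float]:
    """Subtract assumed LFD positives from daily Pillar 1 and 2 cases.

    lfd_positivity is the assumed fixed fraction of LFD tests that are
    positive (e.g. 0.0 for the best-fitting case reported in the text).
    """
    return [max(0.0, c - lfd_positivity * n)
            for c, n in zip(total_cases, lfd_tests)]


# Hypothetical daily counts, not published data.
print(pcr_only_cases([100.0, 50.0], [1000.0, 2000.0], 0.01))  # [90.0, 30.0]
```

The adjusted series would then take the place of the diagnostic case counts C(t) when repeating the fit for θOLD and θVOC.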
We find that larger LFD positivity rates result in lower reporting rates for the VOC; however, the conclusion that the reporting rate is higher for the new variant than for older variants holds for a wide range of assumed LFD positivity rates. The best fit to the surveillance data occurs when 0% of all LFD tests are positive. This does not necessarily imply that all LFD tests are negative, only that, unless the proportion of LFD tests that are positive changes substantially over time, the new variant has considerably more power to explain the increases in reporting rate observed since November 2020. In all cases, the estimated reporting rate in England for variants other than the VOC is 25%, with regional variation shown in Figure 3. In the best-fitting case (LFD positivity = 0%), the reporting rate for the VOC is 32%. Estimates of θVOC are smallest in regions where V(t) is small, i.e. where the VOC contributes relatively little to overall infections and so has little impact on the outcome compared with other causes of variation.
In addition to a change in reporting rate, it is possible that the test sensitivity for the VOC differs substantially from the assumed S(τ). For example, infections could be more likely to be detected more than 3 weeks after exposure; a higher proportion of these “historical” infections would then be included in the ONS population positivity estimates. If we assume that a higher mean cycle threshold implies a higher probability of being a historical infection, it follows that the mean cycle threshold of the PCR tests used in the ONS infection survey indicates the abundance of historical infections in the data. We observe that the times at which our estimate is lower than the ONS figure coincide with the highest ONS cycle threshold values, and that reporting rates are negatively correlated with the ONS cycle threshold values in several regions, adding support to this hypothesis.
Conclusion
In this analysis, we have used a combination of two important data streams to estimate the true incidence of SARS-CoV-2 infections in England over time. This approach would allow us to directly estimate other quantities of epidemiological importance for COVID-19, such as the infection fatality rate, and, as a methodology, could be used in future to provide early estimates for novel emerging pathogens provided estimates of incubation periods and test sensitivity over time are available.
Our analysis reveals considerable variation across regions of England in the proportion of infected individuals testing positive. Since this variation is unlikely to be the result of biological differences, we suspect differences in test-seeking behaviour, with regions in the south and north of England showing markedly different patterns. A particular concern is the evidence of a decline in test-seeking behaviour in the northern regions from the autumn onwards, which might compromise the ability to control the spread of the virus in these regions. The reasons may include differences in attitudes, messaging, and accessibility of tests, as well as natural epidemiological phenomena. Confirming this observation and, if confirmed, understanding its causes will be important for assessing future directions of control.
There is some evidence to suggest that the new variant has a higher reporting rate than other variants; however, other explanations for the increase in θ(t) should also be considered. If we assume that the VOC alone causes the observed change, then approximately 32% of new variant infections are reported with a positive test, compared with 25% for other variants. These results may change as new data become available. Nonetheless, the sudden deviation from a previously reliable pattern raises questions about other changes that may have occurred but have not yet been observed. The impact of the VOC on testing data is perhaps a warning that what we think we know about SARS-CoV-2 may become outdated as the virus continues to evolve.
Data Availability
All data are taken from publicly available sources referenced within the manuscript.
Footnotes
* ecolman{at}ed.ac.uk
† rowland.kao{at}ed.ac.uk
1 To obtain a probability distribution for the length of the incubation period in discrete days, we integrate the continuous distribution over consecutive intervals of length 1 day.
2 This assumes that the probability of exposure is constant over the time period during which the individual could feasibly have been exposed, given that they tested positive at time t.
3 One study of self-reported symptoms did not find support for this [19].