Abstract
Background Inaccurate Covid-19 prevalence measurement can cost lives or economic output. A number of countries offer random Covid-19 tests and report daily positivity rates. However, since virus testing has to be voluntary, all tests done in the field, even if supposedly random, suffer from selection bias, which goes beyond the question of having a representative sample and thus cannot be corrected by the usual methods. The issue is that people who feel they have symptoms (or have other reasons to suspect they have Covid-19) are more likely to volunteer to get tested, and testing stations cannot readily correct this by oversampling.
Methods We used a controlled, incentivized online experiment with over 600 subjects of all ages in a European country.
Results People under 30 with symptoms are 1.532 times more likely to test when there is no waiting time, compared to those without symptoms. This figure increases to 2.882 when there is a short wait of 5-15 minutes; 4.423 with a 15-30 minute wait; 15.5 with a 30-60 minute wait and 38 with a 1-2 hour wait. The ratio for 30-50 year-olds ranges between 1.517 for no wait and 16 for a 1-2 hour wait. For over 50-year-olds, the ratio ranges between 1.708 and 11.333.
Conclusions “Random” tests in the field inflate infection figures by many times. We suggest ways to correct the bias of the testing stations and a cleaner way to sample the population to avoid the bias altogether. Our methodology is relevant to Covid-19 and to any other epidemic.
1. Background
Tackling the Covid-19 pandemic is of paramount importance for obvious health and financial reasons. Over 2 million deaths have been confirmed globally (Johns Hopkins 2020), and there are reports of excess mortality over and above officially reported Covid-19 deaths,1 as well as indirect health effects and deaths (undiagnosed/undertreated diseases2-3 and suicides4-5). The pandemic per se, but also the containment measures against it, have crippled economic activity, leading to increasing unemployment rates and shrinking income worldwide,6 in turn leading to further deterioration of health.7-8 The design of optimal interventions to fight the disease, in time and space, requires efficient and accurate prevalence measurement, preferably in real time. How should prevalence be measured for infectious diseases? In this paper we claim that existing, commonly used measurement methods are flawed because testing is voluntary, leading to heavy self-selection bias. We present calculations of how self-selection translates into biased prevalence estimates generally, and estimate the likely size of the prevalence bias for different age groups, employing incentivised controlled experiments with a sample of about 600 subjects of all ages in Greece, using standard experimental methods (such methods have been used to estimate the demand for HIV testing in an influential paper by R. Thornton,9 but have not been used for disease prevalence estimation before). Our estimates can then be used to calculate the prevalence bias for countries with different demographics. Further, we present the parameter estimations necessary to debias the current prevalence estimates in the field but, crucially, also suggest a novel method to bypass the self-selection bias altogether with an estimation procedure that is at the same time faster, more accurate and more feasible.
To understand the relevance of these results, start by noting that suggested policy responses and their implementation (e.g. social distancing rules) will inevitably be inefficient if we are not aware of the real number of active cases, and in which areas and age groups these occur. Mortality rates and the numbers of hospitalisations and patients in ICU are not real-time measurements; they only provide an estimate of how many people caught Covid-19 weeks earlier (and estimating the fatality rate is also challenging).10 This time lag is very important when trying to evaluate interventions. Without real-time data, measuring the effect of a vaccine will take months, on top of the time the vaccine takes to have a medical effect. Understanding the full effect of other events on the disease, like the Christmas holidays (which led to more interaction and possibly higher transmission), similarly takes months (see excellent work on the effectiveness of NPIs, which uses death counts, lagging by several weeks).11 On the other hand, knowing the current number of actual cases allows the design of an optimal policy response, and also provides a forward-looking estimate of hospitalisations and mortality. Health systems get a warning several weeks ahead, gaining invaluable time for necessary adjustments.
Community testing, often conducted in the high street and in neighbourhoods, is widely considered a useful tool to monitor incidence and trends (e.g. the ECDC12 lists “[to] reliably monitor SARS-CoV-2 transmission rates and severity” among five objectives of testing). The ECDC also publishes weekly testing data and “positivity rates” by EU State.13 However, we show that such testing cannot provide accurate estimates of Covid-19 prevalence, and the main problem is not related to the typical issues that arise in population sampling, such as age group structure. Testing has to be voluntary, and people are likelier to self-select into testing if they have reasons to believe they might have Covid-19 (e.g. if they have symptoms or if they are exposed to a high-risk environment). This self-selection bias increases non-linearly with waiting times and any other cost associated with testing. To make prevalence estimation harder, the bias is time-varying, because it depends non-linearly on time-varying parameters. For example, when cases rise steeply, people might be more likely to want to test out of fear, but at the same time this will also affect their behaviour and thus the likelihood of catching Covid-19.
To summarise, the objective of this paper is to examine whether and to what extent bias occurs in Covid-19 testing, to offer a debiasing solution to accurately estimate Covid-19 prevalence in the field using existing procedures, and lastly to propose a better procedure, both more economical and more accurate.
2. Data and Methods
Standard theoretical arguments allow the precise calculation of the prevalence bias (see Appendix B). However, the size and direction of the prevalence bias in the field is an empirical issue, and crucially depends on the self-selection testing bias based on symptoms. Are people who believe they have symptoms more likely to seek testing, and if yes, by how much? In order to measure this, we ran incentivised controlled experiments.
Data collection took place over a week, from 11 to 18 December 2020. The majority of the responses were collected online, via the QualtricsTX platform. To enable greater representativeness of the sample, 94 responses (16%) from older people (median age = 63) were collected using phone interviews. Out of 608 participants starting the online study, 24 (4.7%) dropped out, mostly after the first few questions, resulting in the final sample of 578 observations.
Median age for the sample was 39 years (median for Greece 45.6), and the age distribution is shown in Figure 1.
Subjects were invited to participate in a study on Covid-19 and related behaviours. Upon signing a consent form, the participant was first asked about general and Covid-19-related health. We then elicited hypothetical willingness to wait (WTW) to take a rapid test for Covid-19, conditional on (i) feeling healthy, (ii) having flu-like symptoms, (iii) having Covid-19-like symptoms. For all three hypothetical scenarios, the test was being offered by the national health authority (EODY) while the participant was walking down the street. This was done to reduce the (hypothetical) travel costs and reliability-related concerns.
After eliciting the hypothetical WTW, we asked the subjects several control questions, including exposure to Covid-19 risky environments (e.g. taking public transport or working face-to-face with many people) and socio-demographics. After completing the compulsory part of the study, the participants were offered an optional task for which they were randomly allocated to one of two prize treatments. In treatment Book, the participant would enter a 1/30 chance lottery for a voucher for the local large-scale bookshop chain (“Public”), worth €80. In treatment Test, the participant would enter the same 1/30 chance lottery for a voucher for a home-administered Covid-19 test. For both prizes, delivery was guaranteed within the next 36 hours. All 578 participants completed the hypothetical elicitation and the control questions (left part of Figure 2).
As was partly expected, a substantial part of the sample (n=174) did not continue to the optional task. A major part of it (n=78) was the older people subsample. We are not very concerned that the inconvenience of the waiting task over the phone was the issue, since the participants came from a sample that had previously participated in a study involving a real effort task over the phone.14 For n=38 participants, a software glitch in Qualtrics in the first five hours of the study resulted in the treatment allocation not being recorded, so we had to drop their data despite completion of the optional task.
The participants then read the description of the optional task. They learned that it involved waiting in front of their screen for some time (target) that would be revealed in the next screen, and that the lottery draw for the prize would take place right after the wait. They also learned that, to ensure that they were waiting, a button would appear at random times and they would need to press it within 4 seconds to avoid being disqualified. Among the 303 participants who read the description of the optional task, 241 continued to the next screen, which revealed the waiting target. At this stage, they were randomly allocated to one of the four waiting target conditions {300, 600, 900, 1200} seconds. Upon learning the wait time, a further 59 participants dropped out instantly (median target time 900 seconds). Among the 241 waiting, 69 dropped out before completing the full wait (median target time 900 seconds). In total, 172 participants completed the waiting target (median target 600 seconds).
Upon completing the waiting task, each participant was randomly allocated to one of the four Cash conditions, {€20, €35, €50, €65}. The participant was offered a choice to enter the lottery for: (a) the original prize (Book, Test), or (b) the displayed Cash amount. Out of the 172 participants, 112 chose to swap the original prize for the cash amount, whilst 60 chose to stay with the original prize (median cash value €35 for both). A total of 7 participants won the lottery.
3. Results
Table 1 shows the ratio of willingness to test between people with symptoms and those without. The figure ranges between 1.5 and 38, depending on the age group and waiting times. People under 30 with symptoms are 1.532 times more likely to test when there is no waiting time, compared to those without symptoms. This figure increases to 2.882 when there is a short wait of 5-15 minutes; 4.423 with a 15-30 minute wait; 15.5 with a 30-60 minute wait and 38 with a 1-2 hour wait. The ratio for 30-50 year-olds ranges between 1.517 for no wait and 16 for a 1-2 hour wait. For over 50-year-olds, the ratio ranges between 1.708 and 11.333. Overall, there is a bias even for no waiting time at all, which increases steeply for long waiting times in all age groups. Note that the bias also varies by other observable characteristics. For example, for waiting times of two hours and more, it is 84% higher for men than for women. Also, the propensity bias is 50% higher for obese people than for people in the healthy range, which indicates that people at risk not only have a higher propensity to test (as is to be expected) but also react more strongly to symptoms.
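To make the ratio concrete, the following minimal Python sketch computes a testing propensity ratio for one age group and one waiting time from survey counts. The function name `propensity_ratio` and the counts are hypothetical and are not the study's data; the sketch only assumes that every respondent answered both the healthy and the symptomatic scenario, so both shares use the same denominator.

```python
def propensity_ratio(willing_symptoms: int, willing_healthy: int, n: int) -> float:
    """Ratio of testing propensities (with symptoms / without) among n respondents.

    Assumes each respondent answered both hypothetical scenarios,
    so both shares are taken over the same n.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if willing_healthy == 0:
        raise ValueError("ratio is undefined when nobody would test without symptoms")
    share_symptoms = willing_symptoms / n
    share_healthy = willing_healthy / n
    return share_symptoms / share_healthy


# Hypothetical counts for one age group and one waiting time (not study data).
ratio = propensity_ratio(willing_symptoms=62, willing_healthy=4, n=200)
print(f"respondents with symptoms are {ratio:.1f} times more likely to test")
```

Small counts in the denominator make such ratios very sensitive to a handful of answers, which is worth keeping in mind when reading the largest values.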
The propensity to test bias translates to a biased virus prevalence estimate (β) which is also time varying. Crucially it depends on symptom prevalence, which, given the exponential spread of Covid-19, can change massively in a short period of time. This means that the estimate depends on symptom prevalence, but the bias itself also depends on it – so the bias is time variant.
Apart from waiting times, self-selecting into testing also depends on the cost associated with it (if applicable – costs can vary from time to monetary value, travel etc). We found that the bias is associated with willingness to pay for the test (Table A2 in Appendix A). Of those who won a test voucher, 83.8% swapped it for cash, as opposed to 48.9% of those who won the book voucher, indicating that the majority of subjects would not be willing to pay to receive a test. However, the scope of this article is to correct bias for free tests subject to different waiting times, and further experiments are needed to reach concrete conclusions on willingness to pay.
We have launched an online calculator that provides estimates of the testing bias (available at http://georgana.net/sotiris/task/atten/covid.php). The bias calculations that lead to the formula on which the calculator is based are provided in Appendix B. The estimates of the testing bias depend on (a) the percentage of tests yielding positive results; (b) the percentage of the general population that reports symptoms; (c) the relative likelihood of having Covid-19 for those with symptoms compared to those without symptoms; and (d) how much more likely people with symptoms are to self-select into testing than those without symptoms. According to our methodology, it is possible to calculate these figures and thus estimate the bias: (a) is provided by the results of community testing; (b) is provided by surveying; (c) can be obtained by asking people a simple question before testing them for Covid-19; and (d) is provided by surveying.
A simple example is the following: Assume community testing led to 10% positive results, and 5% of the population reported symptoms. Without waiting time, if those with symptoms are 5 times more likely to have the virus than those without symptoms, then the results of community testing exaggerate by 27.71%, and the true prevalence in the population is 7.83% (instead of the reported 10%). At a 30-60 minute waiting time, the bias increases to 106.95%, meaning that the true prevalence in the population is 4.83%.
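The correction in this example can be sketched in a few lines of Python using the bias formula derived in Appendix B. The function names are illustrative, and the testing propensity ratio `b = 2.9` is an assumed value chosen only for illustration, not the experimental estimate behind the figures quoted above; the other inputs (10% positives, 5% reporting symptoms, a = 5) come from the example.

```python
def prevalence_bias(a: float, b: float, p: float) -> float:
    """Bias factor beta = reported prevalence / true prevalence.

    a: relative likelihood of infection, people with symptoms vs. without
    b: relative likelihood of testing, people with symptoms vs. without
    p: share of the population reporting symptoms
    """
    return (a * b * p + 1 - p) / ((b * p + 1 - p) * (a * p + 1 - p))


def debias(reported: float, a: float, b: float, p: float) -> float:
    """True prevalence implied by a reported positivity rate."""
    return reported / prevalence_bias(a, b, p)


# Inputs from the example: 10% positives, 5% report symptoms, a = 5.
# b = 2.9 is an ASSUMED value for illustration only.
beta = prevalence_bias(a=5, b=2.9, p=0.05)
true_prevalence = debias(0.10, a=5, b=2.9, p=0.05)
print(f"bias factor {beta:.3f}, corrected prevalence {true_prevalence:.2%}")
```

Because the bias factor is increasing in b whenever a > 1, longer waits (which raise b) push the corrected prevalence further below the reported figure.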
To further illustrate our results, Figure 1 depicts our best estimate of the virus prevalence bias, i.e. the ratio between reported prevalence and actual, depending on symptoms prevalence and waiting time, for the three age groups.
Based on these estimates, we can simulate how different demographic structures would affect the prevalence bias. In the following graph we depict the results from 3 million draws from the plausible parameter space (we assume symptoms prevalence of 5%, and allow the testing bias parameter to vary uniformly within the 95% confidence interval gained from the experiments in Greece) applied to three countries, with different demographic structures: Nigeria (with one of the youngest populations globally), Italy (heavily ageing population) and the USA (between the two extremes).
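A simulation of this kind can be sketched as follows. Everything numeric here is a placeholder: the intervals for the testing propensity ratio, the age shares of the two stylised countries, the value a = 5 and the number of draws are all hypothetical and are not the paper's inputs. The sketch also assumes that the baseline testing propensity, the symptom prevalence and the infection rates are the same in every age group, in which case the pooled ratio b is the population-weighted average of the group ratios.

```python
import random

# Hypothetical placeholders (not the paper's estimates or real demographics):
# interval for the testing-propensity ratio b per age group at a long wait,
# and population shares per age group for two stylised countries.
B_INTERVALS = {"under_30": (25.0, 50.0), "30_to_50": (10.0, 22.0), "over_50": (7.0, 16.0)}
AGE_SHARES = {
    "young_country": {"under_30": 0.70, "30_to_50": 0.20, "over_50": 0.10},
    "old_country": {"under_30": 0.28, "30_to_50": 0.27, "over_50": 0.45},
}


def prevalence_bias(a, b, p):
    """Bias factor: reported prevalence divided by true prevalence."""
    return (a * b * p + 1 - p) / ((b * p + 1 - p) * (a * p + 1 - p))


def simulate_bias(age_shares, a=5.0, p=0.05, draws=10_000, seed=0):
    """Mean bias over random draws of b, pooled across age groups."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        # Draw b uniformly per age group, then pool with population weights.
        b = sum(share * rng.uniform(*B_INTERVALS[group])
                for group, share in age_shares.items())
        total += prevalence_bias(a, b, p)
    return total / draws


for country, shares in AGE_SHARES.items():
    print(country, round(simulate_bias(shares), 2))
```

Swapping in real age distributions and the estimated intervals for each waiting time would give a comparison of the kind described above.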
The simulation shows that demography matters: a young country like Nigeria could have a substantially higher prevalence bias than Italy. However, it is also clear that the waiting times are more important than demographics. Lowering waiting times would result in a low bias for all countries.
4. Discussion
Using a survey-based experiment, we found that the probability of taking a Covid-19 test is many times higher for those who have symptoms (or believe they are more likely to have caught the virus) than for those who do not. Our results show that people who feel they have symptoms (or have other reasons to suspect they are carrying the virus) are up to 38 times more likely to volunteer to get tested. In our sample, this testing propensity bias ranged from 1.5 times (for people under 30 years with no waiting time) to 38 times (for people under 30 and a 2-hour waiting time). The bias becomes larger with longer waiting times, and with any cost associated with taking the test. Testing stations cannot readily correct this by oversampling (i.e. selecting people without symptoms to test).
Demographics also influence the testing propensity bias, which means that different areas (or countries) will have different biases depending on the age composition. Furthermore, there have been reports of very long waiting times in community testing, which greatly exacerbate the bias. It is important to note that the bias is time-variant, and depends strongly on the actual virus prevalence.
Our findings suggest that results from community testing sites are heavily biased, inflating actual prevalence by up to five times. This contrasts with the conventional wisdom in the community, which suggested that the bias would be downward, if anything. Importantly, the prevalence bias goes beyond the issues of age group or location selection usually considered. Rather, it relates to self-selection into testing by those who (believe they) are more likely to have Covid-19. This makes the aggregate results of community testing unreliable when it comes to drawing conclusions on the prevalence of Covid-19 in the population.
We recognise the importance of giving people the opportunity to test, as this identifies positive cases, thus allowing them to self-isolate and stop spreading the disease. If the goal of street testing is just to allow random people to have a quick and free test, then this possibly meets its goal. Note, however, that random testing is not efficient, economically or epidemiologically: subsidising tests specifically for populations with a high risk of getting infected and infecting others would probably save more lives at lower cost (say, tests for young people working in service industries and living with their parents). These questions remain open for future research.
What we have shown is that “random” voluntary testing is not really random. As such, it does not provide accurate information on disease prevalence, which is important to design and implement urgent policy responses to the pandemic, in terms of type, intensity and geographic area. Since voluntary testing is always biased, aggregate results on prevalence should be corrected. Debiasing can be performed using our methodology, as long as there are good estimates for four parameters, namely (a) the percentage of tests in the field yielding positive results; (b) the percentage of the general population that reports symptoms; (c) how much more likely people with symptoms are to be carrying the virus than those without symptoms; and (d) how much more likely people with symptoms are to self-select into testing than those without symptoms. Obtaining estimates for the above parameters is of varying difficulty: (a) is obtained in any country doing “random” street testing, (b) can be estimated with standard polling and (d) can be estimated with our experimental methodology. Estimating (c) would require asking subjects at testing stations to self-report their symptoms before testing.
We suggest, however, a more economical and accurate alternative for prevalence estimation. The important parameters to estimate are the probabilities of having Covid-19 conditional on having symptoms and on not having symptoms, similar to parameter (c) above. This can be done by asking a simple question at existing testing sites (indeed, we have parallel work underway to obtain these estimates in cooperation with testing centres in Greece). These parameters could be country-specific and time-variant, but we do not expect differences to be large or changes to be fast, which means that obtaining one estimate in each virus season could suffice, and this estimate could be used for many similar countries. The next step is unusual and often misunderstood: poll a representative sample regularly to get symptom prevalence. A common misunderstanding involves the argument that laymen cannot measure their symptoms properly. That is not a bug, but a feature of our procedure. Since the testing bias is based on self-reported symptoms, we need to condition on subjects believing they have symptoms, not on actually having them. Combining both steps can yield accurate prevalence estimates in real time at comparatively very low cost.
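The two-step procedure amounts to weighting the two conditional infection rates by the polled symptom prevalence, as in the expression for true prevalence in Appendix B. A minimal Python sketch follows; the function name `estimate_prevalence` and the input values are hypothetical and serve only to illustrate the arithmetic.

```python
def estimate_prevalence(p_symptoms: float, v_symptoms: float, v_healthy: float) -> float:
    """Population prevalence from polled symptom share and conditional infection rates.

    p_symptoms: share of a representative poll reporting symptoms
    v_symptoms: infection rate among people who report symptoms (from testing sites)
    v_healthy:  infection rate among people who report no symptoms (from testing sites)
    """
    for name, value in (("p_symptoms", p_symptoms),
                        ("v_symptoms", v_symptoms),
                        ("v_healthy", v_healthy)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return p_symptoms * v_symptoms + (1.0 - p_symptoms) * v_healthy


# Hypothetical inputs: 5% report symptoms, 25% of them are infected,
# and 5% of those without symptoms are infected.
estimate = estimate_prevalence(p_symptoms=0.05, v_symptoms=0.25, v_healthy=0.05)
print(f"estimated prevalence: {estimate:.1%}")
```

Only the poll needs to be repeated frequently; under the assumption stated above that the conditional rates change slowly, they can be re-estimated much less often.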
Our methodology is not limited to correcting the results of community testing. Confirmed cases reported daily are also biased, as some people might not test because of costs, or the inconvenience of going to a testing site, or even due to being afraid of losing income. According to our results, even at no monetary cost and no wait, 3.98% of people with symptoms would not get tested – which increases to 9.51% even for the slightest waiting time, rising even further when tests have a non-negligible cost to the citizen. Using polling results from a representative sample can correct this error.
This paper contributes to the larger literature on testing regimens.15 Mass testing, extending to a very large part of the population, is useful as it can provide more accurate figures, and also identifies positive cases. It has been used, among others, in Liverpool, Slovakia and South Korea.16-18 However, mass testing is very expensive, and might be infeasible, especially at frequent intervals, due to capacity and technical constraints. In those cases, obtaining unbiased estimates is of paramount importance for health and the economy. Underestimating disease prevalence can trigger inadequate measures and further spread of disease, while overestimating can be detrimental to economic activity. We thus urge policy makers to redesign “random” testing as a matter of priority in the effort to tackle the pandemic.
As a final note, our methodology is applicable to the prevalence measurement of any epidemic in which carriers have informative private information about their health status. Fighting disease is hard, even without the added complication of not knowing the location and magnitude of the fight. Our work offers tools to measure prevalence in real time. Further work is needed to estimate specific selection-bias parameters for every disease, as they are necessarily related to the health burden and life expectancy reduction caused by the specific pathogen.
Data Availability
The data used in this study were collected via surveys. We can make them available upon reasonable request.
Authors’ contributions
All authors contributed equally to the study.
Conflict of interest
The authors have nothing to disclose. All authors have completed the ICMJE conflict of interest form.
Funding
None
Ethics approval
This study received ethics approval from the Economics Research Ethics Committee at City, University of London. Ethics approval was given on 9 December 2020. Code: ETH2021-0749.
APPENDIX
APPENDIX A
APPENDIX B Bias calculations for “Accurate Covid-19 prevalence measurement in the field”
The aim is to infer the percentage of sick people in the population from the “random” testing in the field figures, as released by several Health Agencies worldwide. The problem is that testing is voluntary, which leads to selection bias. How large is this bias?
To start, some people believe they have symptoms, some don’t: call them S(ymptomatic) and H(ealthy). Note that the discussion below has to do with what people believe, not what they actually have. Also, we distinguish between people believing they have symptoms and those who do not, but the analysis readily extends to people having strong beliefs that they might be carrying the virus and those who do not.
Let the frequency of people who believe they have symptoms be $p_s$, or just $p$, with $1-p$ being the frequency of people who do not think they have symptoms.
Of each group, some percentage turns out to have the virus. Let $v_s$ be the virus prevalence for those who believe they have symptoms, and $v_h$ for those who do not.
Of each group, some percentage are willing to take the test (for a given waiting time to take the test).
Assume this only depends on symptoms, but not on actually having the virus (this assumption is mostly innocuous, unless there is a very large number of people in hospital). Let then $t_s$ be the percentage of people who believe they have symptoms who actually take the test, and $t_h$ the percentage for those who do not.
True prevalence is then

$$\tau = p_s v_s + (1-p_s)\, v_h. \tag{1}$$

Given the parameters, the share that shows up positive in the sample (assuming that the test itself is perfect) is

$$\pi = p_s t_s v_s + (1-p_s)\, t_h v_h. \tag{2}$$

Divide by the total sampling rate

$$m = p_s t_s + (1-p_s)\, t_h \tag{3}$$

to get the sample prevalence (or virus frequency in the sample population) $\varphi = \pi / m$.
Note that if $t_s = t_h = t$, then $\pi = t\,(p_s v_s + (1-p_s)\, v_h)$ and $\varphi = t\,(p_s v_s + (1-p_s)\, v_h)/t = p_s v_s + (1-p_s)\, v_h = \tau$, which makes sense: if testing propensities are equal, there is no bias.
If, on the other hand, the testing propensities $t$ are not the same, then the sample is selected, leading to bias. Before we calculate the bias, express the propensities to test and to be virus positive for the people who believe they have symptoms as multiples of the propensities of those who do not: $v_s = a\, v_h$, $t_s = b\, t_h$.
Then, using these equations, rewrite (1), (2) and (3):

$$\tau = p_s v_s + (1-p_s)\, v_h = a\, p_s v_h + (1-p_s)\, v_h = v_h\,(a p_s + 1 - p_s)$$

$$\pi = p_s t_s v_s + (1-p_s)\, t_h v_h = ab\, p_s t_h v_h + (1-p_s)\, t_h v_h = t_h v_h\,(ab\, p_s + 1 - p_s)$$

$$m = p_s t_s + (1-p_s)\, t_h = b\, p_s t_h + (1-p_s)\, t_h = t_h\,(b p_s + 1 - p_s)$$

Simplify notation by writing $p$ for $p_s$ and calculate

$$\varphi = \frac{\pi}{m} = \frac{t_h v_h\,(abp + 1 - p)}{t_h\,(bp + 1 - p)} = \frac{v_h\,(abp + 1 - p)}{bp + 1 - p}.$$
Now, to find the size of the bias, divide $\varphi$ by $\tau$:

$$\frac{\varphi}{\tau} = \frac{v_h\,(abp + 1 - p)}{(bp + 1 - p)\; v_h\,(ap + 1 - p)}$$

The bias in estimates is therefore

$$\beta = \frac{abp + 1 - p}{(bp + 1 - p)(ap + 1 - p)}.$$

Examples
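Suppose $a=1$. Then

$$\beta = \frac{bp + 1 - p}{(bp + 1 - p)(p + 1 - p)} = \frac{1}{p + 1 - p} = 1.$$

Since $\beta$ is symmetric in $a$ and $b$, setting $b=1$ also gives $\beta = 1$. So both $a>1$ and $b>1$ are necessary for the bias to exist, which makes sense.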
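The closed form can be checked against a direct computation from the underlying quantities. The Python sketch below builds the true prevalence and the sample prevalence from p, the infection rates and the testing propensities, and compares their ratio with the bias formula; function names and parameter values are hypothetical and chosen only for illustration.

```python
def sample_vs_true(p, v_s, v_h, t_s, t_h):
    """Return (tau, phi): true prevalence and prevalence among those tested."""
    tau = p * v_s + (1 - p) * v_h              # infected share of the population
    pi = p * t_s * v_s + (1 - p) * t_h * v_h   # share that is tested and positive
    m = p * t_s + (1 - p) * t_h                # share that is tested
    return tau, pi / m


def closed_form_bias(a, b, p):
    """Bias factor beta expressed through the ratios a and b."""
    return (a * b * p + 1 - p) / ((b * p + 1 - p) * (a * p + 1 - p))


# Hypothetical parameter values.
p, v_h, t_h = 0.1, 0.02, 0.05
for a, b in [(1, 4), (4, 1), (4, 4)]:
    tau, phi = sample_vs_true(p, a * v_h, v_h, b * t_h, t_h)
    print(f"a={a}, b={b}: direct {phi / tau:.4f}, closed form {closed_form_bias(a, b, p):.4f}")
```

The first two parameter pairs illustrate that the bias vanishes when either ratio equals one; only the third, with both ratios above one, produces a bias.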
Suppose a=b>1
(conceptually it is not unlikely that the two propensities be of similar magnitude, since the higher the risk when I have symptoms, the more likely it should be that I seek testing)
The bias is then

$$\beta = \frac{a^2 p + 1 - p}{(ap + 1 - p)^2}.$$

Let $a=b=3$. Then

$$\beta = \frac{9p + 1 - p}{(3p + 1 - p)^2}.$$

In this case, the $p$ leading to the worst bias is 0.25, where $\beta = 4/3 \approx 1.33$.
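The symptom prevalence that maximises the bias when a = b can be located numerically. The Python sketch below runs a simple grid search and compares it with the values obtained by setting the derivative of β with respect to p to zero, namely p = 1/(a+1) and β = (a+1)²/(4a); the function names are illustrative and the grid resolution is an arbitrary choice.

```python
def bias_equal_ratios(a: float, p: float) -> float:
    """Bias factor beta when a = b."""
    return (a * a * p + 1 - p) / (a * p + 1 - p) ** 2


def worst_case_p(a: float, steps: int = 10_000):
    """Grid search over p in [0, 1] for the largest bias; returns (p, beta)."""
    grid = [i / steps for i in range(steps + 1)]
    p_star = max(grid, key=lambda p: bias_equal_ratios(a, p))
    return p_star, bias_equal_ratios(a, p_star)


for a in (3, 10, 20):
    p_star, beta_star = worst_case_p(a)
    # Analytical optimum from the first-order condition, for comparison.
    print(f"a=b={a}: grid p*={p_star:.4f}, beta*={beta_star:.3f}; "
          f"analytical p*={1 / (a + 1):.4f}, beta*={(a + 1) ** 2 / (4 * a):.3f}")
```

The worst case moves towards lower symptom prevalence as the ratios grow, which is the range relevant for most of an epidemic.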
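Now suppose $a=b=10$. Then

$$\beta = \frac{100p + 1 - p}{(10p + 1 - p)^2},$$

so $\beta \approx 3$ at $p = 0.1$, and at $p = 0.05$ it is still about 2.8.

Suppose $p=0.1$. Then

$$\beta = \frac{0.1\,ab + 0.9}{(0.1\,b + 0.9)(0.1\,a + 0.9)}.$$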
This function is plotted in the next graph, with a on the x axis and b on the y axis.
For a=b=20, this means that street testing overestimates the virus prevalence by a factor of about 5.
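The same function can be tabulated instead of plotted. The short Python sketch below evaluates the bias at p = 0.1 on a small grid of a and b; the grid values are arbitrary choices made for illustration.

```python
def prevalence_bias(a, b, p=0.1):
    """Bias factor beta for symptom prevalence p (default 0.1)."""
    return (a * b * p + 1 - p) / ((b * p + 1 - p) * (a * p + 1 - p))


values = [1, 5, 10, 20]  # arbitrary grid for a (columns) and b (rows)
print("b/a " + "".join(f"{a:>7}" for a in values))
for b in values:
    print(f"{b:<4}" + "".join(f"{prevalence_bias(a, b):7.2f}" for a in values))
```

The table is symmetric because a and b enter the bias factor symmetrically, and its first row and column equal one.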
Getting ps from φ
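While we suggest obtaining $p_s$ through random (unbiased) polling of people about their perceived symptoms, it is also possible to calculate it using $a$, $b$, $v_h$ and $\varphi$ as follows.
Start with the definition of $\varphi$, writing $v$ for $v_h$:

$$\varphi = \frac{v\,(abp + 1 - p)}{bp + 1 - p}$$

$$\varphi\,(bp + 1 - p) = v\,(abp + 1 - p)$$

$$\varphi b p + \varphi - \varphi p = v a b p + v - p v$$

$$\varphi b p - \varphi p + p v - v a b p = v - \varphi$$

$$p\,(\varphi b - \varphi + v - vab) = v - \varphi$$

$$p = \frac{v - \varphi}{\varphi b - \varphi + v - vab}$$

Note: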
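The denominator is negative:

$$\varphi b - \varphi + v - vab < 0 \iff \varphi\,(b-1) < v\,(ab-1) \iff \varphi < \frac{v\,(ab-1)}{b-1},$$

where the last step uses that $b-1$ is positive. The inequality holds because $\varphi \le v_s = av$ and $a\,(b-1) < ab - 1$ whenever $a > 1$.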
For v<φ the numerator is also negative, meaning p is positive.
If φ= vh then symptoms prevalence is 0, all people in the sample have no symptoms, and vh show positive in the test.
If φ= vs =avh then (v-av)/(avb-av+v-vab)=1, meaning everyone has symptoms, p=1.
Obviously φ cannot be above vs (sample prevalence is highest if you only have people with symptoms in the sample, in which case not more than vs can be positive)!
So we can debias the health agencies’ numbers without knowing $p_s$ in advance.
Again, it is easier not to do street testing, but to use vh and vs and poll about p.
Examples
Suppose $a=10$, $b=3$ and $v_h=0.01$. Then

$$p = \frac{0.01 - \varphi}{3\varphi - \varphi + 0.01 - 0.3} = \frac{0.01 - \varphi}{2\varphi - 0.29}$$

So, for example, $\varphi = 0.1$ yields $p = -0.09/(-0.09) = 1$.
This makes sense. Everyone had symptoms, and vs=0.1 means that 10% had the virus, which is the proportion you will find in any sample. The interpretation is that with such a low true virus prevalence, the only way to get a relatively high φ is if there are only symptomatic people.
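The inversion can be checked with a short Python sketch that maps a symptom prevalence to the sample prevalence and back. The function names are illustrative; the first check uses the values of the example above, and the round-trip values are hypothetical.

```python
def sample_prevalence(p, a, b, v_h):
    """Sample prevalence phi implied by symptom prevalence p."""
    return v_h * (a * b * p + 1 - p) / (b * p + 1 - p)


def symptom_prevalence(phi, a, b, v_h, tol=1e-12):
    """Solve phi = v_h (abp + 1 - p) / (bp + 1 - p) for p."""
    # phi can only lie between v_h (nobody has symptoms) and a * v_h (everybody has).
    if phi < v_h - tol or phi > a * v_h + tol:
        raise ValueError("phi must lie between v_h and a * v_h")
    return (v_h - phi) / (phi * b - phi + v_h - v_h * a * b)


# Example from the text: a = 10, b = 3, v_h = 0.01 and phi = 0.1.
print(symptom_prevalence(0.1, a=10, b=3, v_h=0.01))

# Round trip with hypothetical values: start from p, compute phi, recover p.
phi = sample_prevalence(0.2, a=3, b=10, v_h=0.1)
print(phi, symptom_prevalence(phi, a=3, b=10, v_h=0.1))
```

The range check mirrors the observation above that the sample prevalence cannot exceed the infection rate among people with symptoms.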
Suppose $a=10$, $b=3$ and $v_h=0.1$. Then

$$p = \frac{0.1 - \varphi}{3\varphi - \varphi + 0.1 - 3} = \frac{0.1 - \varphi}{2\varphi - 2.9}$$

So, in this case, $p$ is about half $\varphi$ for many values.
Now, let $a=3$, $b=10$ and $v_h=0.1$. The effect of $a$ and $b$ is not symmetric:

$$p = \frac{0.1 - \varphi}{10\varphi - \varphi - 2.9} = \frac{0.1 - \varphi}{9\varphi - 2.9}$$

Suppose that some national agency is asking about (perceived) symptoms before testing. It is then easier to find the symptom prevalence in the general population, $p_s$.
Symptom prevalence in the test would be

$$\chi = \frac{p\, t_s}{p\, t_s + (1-p)\, t_h} = \frac{bp\, t_h}{bp\, t_h + (1-p)\, t_h} = \frac{bp}{bp + 1 - p}.$$

So true symptom prevalence is

$$p = \frac{\chi}{b - (b-1)\,\chi}.$$

For a relatively low b=3