Monitoring populations at increased risk for SARS-CoV-2 infection in the community ================================================================================== * Emma Pritchard * Joel Jones * Karina Vihta * Nicole Stoesser * Philippa C. Matthews * David W. Eyre * Thomas House * John I Bell * John N Newton * Jeremy Farrar * Derrick Crook * Susan Hopkins * Duncan Cook * Emma Rourke * Ruth Studley * Ian Diamond * Tim Peto * Koen B. Pouwels * A. Sarah Walker * the COVID-19 Infection Survey Team ## Abstract **Background** The COVID-19 pandemic is rapidly evolving, with emerging variants and fluctuating control policies. Real-time population screening and identification of groups in whom positivity is highest could help monitor spread and inform public health messaging and strategy. **Methods** To develop a real-time screening process, we included results from nose and throat swabs and questionnaires taken 19 July 2020-17 July 2021 in the UK’s national COVID-19 Infection Survey. Fortnightly, associations between SARS-CoV-2 positivity and 60 demographic and behavioural characteristics were estimated using logistic regression models adjusted for potential confounders, considering multiple testing, collinearity, and reverse causality. **Findings** Of 4,091,537 RT-PCR results from 482,677 individuals, 29,903 (0·73%) were positive. As positivity rose September-November 2020, rates were independently higher in younger ages, and those living in Northern England, major urban conurbations, more deprived areas, and larger households. Rates were also higher in those returning from abroad, and working in healthcare or outside of home. When positivity peaked December 2020-January 2021 (Alpha), high positivity shifted to southern geographical regions. With national vaccine roll-out from December 2020, positivity reduced in vaccinated individuals. Associations attenuated as rates decreased between February-May 2021. Rising positivity rates in June-July 2021 (Delta) were independently higher in younger, male, and unvaccinated groups. Few factors were consistently associated with positivity. 25/45 (56%) confirmed associations would have been detected later using 28-day rather than 14-day periods. **Interpretation** Population-level demographic and behavioural surveillance can be a valuable tool in identifying the varying characteristics driving current SARS-CoV-2 positivity, allowing monitoring to inform public health policy. **Funding** Department of Health and Social Care (UK), Welsh Government, Department of Health (on behalf of the Northern Ireland Government), Scottish Government, National Institute for Health Research. ## Introduction To 31st August 2021, there have been over 216·3 million SARS-CoV-2 cases worldwide.1 Disparities in COVID-19 risk and outcomes based on demographics and behaviours have been described in the UK2, 3 and globally,4, 5 but emerging variants6 coupled with varying control policies, including differential vaccine roll-out programmes, reinforce the need to monitor characteristics of individuals “at increased risk” for SARS-CoV-2 infection continuously. For example, identifying groups in whom newly identified variants of concern are spreading in the community may be vital in preventing widespread transmission. In England, since 26th March 2020, there have been three national lockdowns, a tiered system7 with varying restrictions in smaller geographical areas, and various other restrictions between these,8 all affecting behaviour and risk of acquiring and spreading SARS-CoV-2. Finding societal factors or specific behaviours where these restrictions are less effective may aid policy development. With restrictions being relaxed in many countries, rapidly identifying groups where positivity is rising in real-time can help monitor spread and target advice. High-quality surveillance is challenging, particularly given the large proportion of asymptomatic SARS-CoV-2-infected individuals,9 with a balance between missing important but potentially imprecisely estimated signals (false-negatives) and noise (false-positives). With large datasets containing many potential risk factors, multiple testing is inevitably problematic,10 but standard approaches to building regression models restricting to smaller numbers of hypothesised associated factors risks missing true signals with a rapidly evolving pathogen and societal responses. The cumulative effect of missing data across many risk factors can mean substantial proportions of the original sample are excluded from penalised regression or backwards elimination, losing power,11 and risking bias if missingness depends on outcome.12 A method allowing numerous variable parametrisations of many individual variables would therefore be useful, provided collinearity and confounding can be avoided.13 Using the Office for National Statistics (ONS) COVID-19 Infection Survey, a large community-based surveillance study, we therefore developed a process to monitor groups with highest SARS-CoV-2 positivity week by week. ## Methods ### Study design The ONS COVID-19 Infection Survey is a large household survey with longitudinal follow-up ([ISRCTN21086382](http://medrxiv.org/external-ref?link_type=ISRCTN&access_num=ISRCTN21086382); [https://www.ndm.ox.ac.uk/covid-19/covid-19-infection-survey/protocol-and-information-sheets](https://www.ndm.ox.ac.uk/covid-19/covid-19-infection-survey/protocol-and-information-sheets)). Private households are randomly selected on a continuous basis from address lists and previous surveys to provide a representative sample across the UK. Following verbal consent, a study worker visited each household to take written informed consent for individuals aged ≥2 years (from parents/carers for those 2–15 years; those 10–15 years also provided written assent). The study received ethical approval from the South Central Berkshire B Research Ethics Committee (20/SC/0195). Participants were asked about demographics, behaviours, work, and vaccination uptake ([https://www.ndm.ox.ac.uk/covid-19/covid-19-infection-survey/case-record-forms](https://www.ndm.ox.ac.uk/covid-19/covid-19-infection-survey/case-record-forms)). At the first visit, participants were asked for consent for optional follow-up visits every week for the next month, then monthly thereafter. At each visit, participants provided a nose and throat self-swab. ### Inclusion/exclusion criteria This analysis included visits from 19th July 2020-17th July 2021 with a positive or negative swab result, including one visit per participant within each discrete fortnight in this period, namely the first test-positive visit, otherwise the last (negative) visit. This mimics repeated point-prevalence surveys, similar to the English Real-time Assessment of Community Transmission (REACT) study.14 ### Outcome and exposures The outcome was any SARS-CoV-2 PCR-positive swab in each fortnight. For exposures, we identified eight non-missing key potential confounders (“core” variables): sex, ethnicity (white vs non-white as relatively small numbers in the latter), age (years), geographical region (12 levels; 9 English regions and 3 devolved administrations: Wales, Scotland, Northern Ireland), rural/urban classification (major urban area, urban town/city, rural town, and rural village), deprivation percentile (derived separately for each country15–18), household size, and whether the household was multigenerational (details in **Supplementary Methods**). We next defined 60 non-core “screening” variables that could dynamically identify those at increased risk of testing positive (**Supplementary Table 1**), from questions detailing participant’s current work/school status, including ability to social distance and patient-facing healthcare/social-care roles, current health status including COVID-19 vaccination and smoking, household and living environment, and contacts including with care homes, hospitals, and confirmed COVID-19 cases. Although participants are tested predominantly monthly, most behavioural questions relate to the last 7 days. As some participants already know/think they have COVID-19 (from symptoms or testing outside the study) this could affect behaviours reported immediately before study tests, leading to reverse causality. The screening variables were therefore grouped into those most plausibly preceding any current infection (47 variables), or potentially modified through knowledge of recent prior infection (13 variables, including social/physical contacts, frequency of shopping and/or socialising, time spent in others homes/other people spent in participants’ homes; **Supplementary Table 1B**). For the latter, rather than the self-report at the included visit, we considered the maximum reported value across all visits in the preceding 35 days, excluding the included visit, and included only participants with at least one negative visit in the preceding 10-35 days. ### Statistical analysis Within each fortnight, associations with the eight “core” characteristics were estimated using logistic regression (numbers included per fortnight in **Supplementary Table 2**). These characteristics were included in all subsequent models regardless of statistical significance. For geographic region, South West England was the reference as this had the lowest SARS-CoV-2 positivity across the study, facilitating identification of where infections were increasing. Given the large number of effect estimates over the 52-week study period (e.g. shown for urban/rural classification in **Supplementary Figure 1**), we summarised the importance of each characteristic over time using two properties simultaneously: 1) global (Wald) p-value and 2) overall effect size, the standard error-weighted mean effect estimate setting the reference to the level with lowest positivity in each fortnight19: ![Formula][1] To incorporate non-linear effects, a restricted natural cubic spline was used for age (details in **Supplementary Methods**); the overall effect size combined estimates at ages 10, 25, 40, 55 vs 70 years (reference category) as above. We tested interactions between the eight core variables individually in fortnights where positivity was >0·5% (arbitrary threshold to avoid small numbers), conducting backwards elimination on all with individual global heterogeneity p-value<0·001 (Bonferroni adjustment, 0·05/26 (number of interaction tests)), creating the “core model” (details in **Supplementary Methods**). An overall effect size was calculated for interactions as above, but taking the absolute coefficient values. Given missing data (**Supplementary Table 1**), we used forward selection to retain as many participants as possible when screening each non-core characteristic, first adding each of the 47 “screening” variables individually to the “core model”, thus estimating the total effects not explained by core characteristics. For all work-related variables, work status was included regardless of significance so that effects reflected additional effects of the characteristic for those currently employed and working. To monitor multiple testing, we plotted observed p-values (global per variable and individual level vs reference) against expected p-values assuming no difference (randomly distributed between 0 and 1 given the number of tests), creating a Q-Q plot, including 0·05, Bonferroni and Benjamini-Hochberg adjusted p-values (0·05/tests) as references. As the goal was to identify signals of “at-risk” populations, we included all characteristics with either global p<0·05 or any level with p<0·001 vs reference, and then used backward elimination (exit p=0·05) to identify a final “main model”. We used a similar process on the behavioural variables, also adjusting for variables identified from the main screen, regardless of significance. We categorised screening variables into five broad groups dependent on persistence of effects (details in **Supplementary Methods**). ### Sensitivity Analyses To assess the impact of small numbers of positives in some fortnights on power, we repeated the process using 28-day periods. Given logistic regression can have higher bias and variability with low rates, and hence lose accuracy and precision,20 we also compared the core variables effect estimates with those from ridge regression (see **Supplementary Results**). ### Role of the funding source The funder had no role in study design, data collection, data analysis, data interpretation, or writing of the report. All authors had access to all data reported in the study and accept responsibility for the decision to submit for publication. ## Results Analyses included 4,091,537 RT-PCR results from nose and throat swabs from 482,677 individuals in 240,490 households from 19th July 2020-17th July 2021. 29,903 (0·7%) swabs were positive. Overall, the median (IQR) age was 52 years (33-66), 300,208 (7%) visits occurred in those reporting non- white ethnicity, 2,165,833 (53%) in females, 1,463,624 (36%) in major urban areas and 1,746,530 (43%) in urban cities/towns, most (1,735,618, 42%) in two-person households, and with a median deprivation percentile of 60 (34-81) (1=most deprived, 100=least deprived) (**Table 1**; screened variables in **Supplementary Table 1A,1B**). The highest positivity was 1·9% (95% CI 1·9-2·0%) 20th December-2nd January 2020, and the lowest 0·05% (0·03-0·08%) 2nd-15th August 2020 (**Supplementary Figure 2A**). Numbers within each fortnight increased as the study expanded from August-October 2020,21 from 32,184 participants 19th July-1st August 2020 to a median 173,054 (IQR 168,171-195,031) from 27th September 2020 onwards (**Supplementary Figure 3**). View this table: [Table 1:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/T1) Table 1: Characteristics of the core variables for visits included in analysis ### Core model From 19th July-1st August 2020, we found no evidence that any core variable was associated with positivity, potentially related to power given both low positivity (0·08% [95% CI 0·06-0·12%]) and sample size (32,184 swabs, 27 positive). The first characteristic associated with positivity was ethnicity, the only characteristic associated with positivity in the fortnights between 2nd-29th August 2020 (**Figure 1A**), with 3·3 (1·1-10·0; p-value=0·034) and 3·5 (1·5-7·9; p-value=0·003) higher odds of positivity in those of non-white ethnicity, respectively. ![Figure 1A:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F1.medium.gif) [Figure 1A:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F1) Figure 1A: **Overall effects of the 8 core variables across the 52 week study period** As positivity began to increase early September 2020, geographical region, rural/urban classification, and household size became independently associated with positivity, with odds of positivity highest in Wales, Northern Ireland, and northern English regions, in more urban areas, and those living in larger households (**Figure 1B**). For most subsequent fortnights, evidence of higher positivity persisted in participants living in more urban areas, and larger households. ![Figure 1B:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F2.medium.gif) [Figure 1B:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F2) Figure 1B: **Effects of the individual levels of the 8 core variables across the 52 week study period.** As positivity rates rose further through October 2020, age and deprivation became associated with positivity, with rates highest in those 16-30y, and living in more deprived areas. Positivity was also heavily concentrated in northern and then midland English regions until 21st November 2020. From 22nd November, positivity increased overall, particularly in southern England, with higher odds of positivity in London, East, and South East England, reflecting the rise of the Alpha variant.22 Age remained strongly associated with positivity, but with less excess risk at younger ages, and instead decreased odds of positivity in those over 60y (**Figure 1B, Figure 2**). This lower risk in older individuals persisted for most subsequent fortnights. During February-May 2021, as positivity decreased, associations between positivity and age, region, and deprivation persisted, but their strength attenuated. As positivity rose during 17th May-17th July 2021, reflecting the rise of the Delta variant23 and major sporting events, sex was associated with positivity in two consecutive fortnights for the first time in the study, with higher odds in males compared with females. Age again became strongly associated, with a large peak in those aged 16-30y (**Figure 2**). ![Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F3.medium.gif) [Figure 2:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F3) Figure 2: **Adjusted effect of age (years) on positivity over the 52 week study period.** Few interactions between core variables were significant at the p=0·001 threshold, with no evidence of the same significant interactions in any consecutive fortnight (**Supplementary Figure 4**). For model comparability, none were therefore included in any fortnight for screening other variables. ### Screening process As positivity increased, the screening process identified more variables and at a greater significance than expected by chance (**Figure 3**; **Figure 4A**). Contact with anyone who had recently had COVID- 19, currently self-isolating and thinking one had had COVID-19 recently, strongly and consistently predicted higher positivity. As these characteristics are potential mediators of effects of other factors, they were not considered further. ![Figure 3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F4.medium.gif) [Figure 3:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F4) Figure 3: **Global heterogeneity p-values per factor from the screening process over 4 specific fortnights** ![Figure 4A:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F5.medium.gif) [Figure 4A:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F5) Figure 4A: **Overall effects of additional factors from the screening process, adjusted for the core variables, over the 52 week study period** Work and employment were significantly associated with positivity throughout the study. Initially from 2nd August-12th September 2020, there was independently higher positivity for those working in care/nursing homes or patient-facing healthcare roles (**Figure 4A**). This effect returned from 25th October onwards, along with increased odds in those reporting working in healthcare sectors and specifically in person-facing social-care roles. From 25th October 2020-27th March 2021, we consistently observed higher positivity in those working outside compared with from home, with risk increasing as social distancing in the workplace became more difficult. Increased risk was also associated with all modes of travel to work (foot/bike, car/taxi, train/bus), compared with those not travelling to work (**Figure 4B**), with highest odds for car/taxi, then train/bus then foot/bike. Higher positivity was also observed in the teaching work sector during October/November 2020, while those working in IT had consistently lower odds (**Figure 4A**). ![Figure 4B](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F6.medium.gif) [Figure 4B](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F6) Figure 4B **Effects of individual levels of factors from the screening process, adjusted for the core variables, over the 52 week study period Health status** From 16th August-7th November 2020, positivity was consistently higher in those who had travelled abroad in the last 28 days. This effect returned during 28th March-12th April 2021 and 9th-22nd May 2021. Contact with hospital and care homes increased odds of positivity, particularly from 3rd January-27th February 2021, when positivity rates were very high due to Alpha. From 27th September 2020-27th February 2021 (when positivity was consistently >0·3%), participants were more likely to test positive on enrolment visits (**Figure 4B)**, most likely reflecting identification of longer-term PCR-positives at these visits. Health-related variables varied in importance. Notably, there was no evidence of association between long-term health conditions and positivity. From 13th September 2020-13th March 2021, we consistently saw lower positivity in those who smoked tobacco products, compared with non- smokers. From 20th December 2020, we observed a very strong effect of COVID-19 vaccination, with lower positivity in those vaccinated, compared with unvaccinated (**Figure 4B**). Deprivation components and living environment characteristics (available only for England) had little impact on positivity after adjusting for overall deprivation index and household size from the core model, likely due to high correlations between individual components with overall deprivation (**Supplementary Table 3; Supplementary Figure 5; Supplementary Results**). Independently to the core model, we observed higher odds of positivity with increased social and physical contacts during periods when rates were high (**Figure 5; Supplementary Figure 6**). After also adjusting for variables identified from the main screening process and after backwards elimination, we observed higher odds of positivity with higher numbers of physical contacts with 18- 69 year olds between 20th December 2020-13th February 2021, and with higher numbers of physical contacts with those <18y between 14th February 2021-27th March 2021. As lockdown restrictions eased and Delta became prominent during 20th June 2021-17th July 2021, odds of positivity were higher in those with increasing time socialising outside home. ![Figure 5:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F7.medium.gif) [Figure 5:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F7) Figure 5: **Adjusted effects of behavioural variables from the screening process** After backwards elimination, of the 71 variables screened (47 in the main screen, 13 variables in the behavioural screen with 24 parameterisations across the latter), two (3%) effects were persistent, 13 (18%) had effects which came and went, nine (13%) had effects isolated to only two consecutive fortnights, 30 (42%) were associated inconsistently in fortnights, and 17 (24%) were never associated. ### Sensitivity analysis Similar key predictors of positivity were obtained using 28-day periods in the core model (**Supplementary Figures 7A,7B**,**8**). Notably, we saw a more consistent signal of higher positivity in non-white ethnicities from 11th October 2020-27th March 2021 (**Supplementary Figure 7A**), while this signal was more intermittent using fortnights (**Figure 1A**). We again did not see the same significant interactions in any consecutive 28-day periods (**Supplementary Figure 9A**). After backwards elimination, six interactions remained significant over five isolated 28-day periods (**Supplementary Figure 9B-G**). Three of these included household size, with a general pattern of stronger effects as household size increased in groups with higher positivity e.g. in younger ages (13th September-10th October 2020), non-white ethnicities (11th October-7th November 2020), and higher prevalence regions (6th December 2020-2nd January 2021). From 31st January-27th February 2021, compared with those living in non-multigenerational households, those of non-white ethnicities living in multigenerational households had increased odds of positivity, while those of white ethnicities had decreased odds. Similar key associations were also identified from the screening process (**Supplementary Figure 10A, 10B**). Of the 45 consecutive occurrences of effects with p<0·05 in fortnights, 25 (56%) would have been detected later in 28-day periods, 14 (31%) at the same time, five (11%) earlier, and one (2%) never detected (**Supplementary Table 4**). ## Discussion Over one year from 19th July 2020-17th July 2021, we estimated and summarised the key predictors of SARS-CoV-2 positivity in the UK, using a method designed to be run weekly in real-time to provide up-to-date information on changes in populations at increased risk. In the first fortnight from 19th July-1st August 2020, we had no evidence that any characteristic impacted positivity. As positivity rose through September-November 2020, they were independently higher in those of younger ages, living in Northern areas of England, in major urban conurbations, in more deprived areas, and in larger households. Additionally, rates were higher in those who had recently travelled abroad, worked in healthcare roles, or worked outside of home. As positivity peaked December 2020-January 2021, while we still observed strong effects of living in urban areas and large households, there was a major shift in high positivity to more southern geographical regions (reflecting the emergence of Alpha), with risk no longer concentrated in younger ages. Those working outside of home and in healthcare roles still had higher risk. As the national vaccine programme rolled out from December 2020, we saw large reductions in positivity in vaccinated individuals. From February-May 2021 as rates decreased, the impact of work on positivity decreased, while the effect of vaccination remained. As the Delta variant became prominent and positivity rates rose mid-May through July 2021, we observed higher odds of positivity in younger ages, in men, and in those not yet vaccinated. The screening process demonstrated here has several limitations. First, low event numbers and smaller sample sizes reduce statistical power, reducing the chance of detecting true associations (false- negatives) and increasing the likelihood that the magnitude of “true” effects are inflated (false- positives).24 Increased statistical power using 28-day periods rather than fortnights more consistently detected associations with ethnicity in the core model and found more evidence of interactions. The screening process, however, detected the same characteristics using both time-periods, with earlier detection in most cases using fortnights. As there were no major differences and we aimed to identify associations most relevant to current positivity, the benefit of more regular estimates may outweigh the power gained from evaluating longer time-frames, although this will depend on event numbers. When events numbers are low, logistic regression can be biased and/or imprecise.25, 26 Sensitivity analyses using penalised regression techniques showed most coefficients were within the logistic regression confidence intervals, suggesting that, while there was some attenuation of estimates, for example for geographical regions in a few fortnights, the logistic regression models were not substantially overfitting. Multiple testing is an unavoidable limitation of our screening process. Doing many multiple independent tests increases the risk of false-positives;27 however, a priori the questionnaire was based on potential risk factors so the “correct” degree of adjustment is unclear. We therefore used Q-Q plots with Bonferroni and Benjamini-Hochberg adjustments to monitor the potential for false-positives, rather than as strict thresholds.28, 29 Even using stricter Bonferroni criteria, many screening variables were associated with positivity. Considering sex as a “negative control” (no effect expected), we only found an association in one of 24 fortnights before 20th June 2021. The consistent association between sex and positivity from 20th June-17th July 2021 coincided with the European Football Championship, thus plausibly reflecting changes in social behaviour by sex, as observed elsewhere.30 Our results suggest more emphasis should be placed on effects that appear at least twice, interpreting effects that are inconsistent or appear sporadically with caution. The underpinning design, namely a large community-based survey including randomly selected private households, is a major study strength. Participants being regularly asked about behaviours, work, and health status provided a rich opportunity to identify associations between positivity and many important demographic and behavioural characteristics. As participants were tested regardless of symptoms, characteristics could be assessed in an unbiased population, thus avoiding selection bias through only observing those choosing to take a COVID-19 test, for example, in the England national testing programme31 or through presenting to hospital with severe disease. The study design also had limitations, particularly with individuals tested initially at weekly and then monthly visits. As fragments of virus can be detectable in the respiratory tract long after onset of infection, positives included in our outcome include both new infections and lingering PCR-positivity. Associations from the screening process may therefore not necessarily be related to new infections. Whilst we could have grouped positive tests into “episodes”, for example, considering only the first positive in 90-day periods,32 we chose to mirror other point-prevalence studies, such as REACT,14 also expecting that many characteristics would be reasonably stable over time and therefore even associations with ongoing PCR-positivity could still be relevant to the original infection. This may however dilute effects if participants with long carriage have different characteristics to those testing positive with new infections. Ongoing PCR-positivity may also reduce sensitivity to detect specific “at-risk” populations as new variants emerge. In conclusion, the screening process presented could be a valuable tool in understanding the characteristics driving current SARS-CoV-2 positivity, allowing us to provide enhanced up-to-date understanding of the pandemic across the UK. Looking forward, this could be used to target public health messages to detected groups to increased uptake of symptomatic and asymptomatic testing. We are using this method weekly to monitor the third wave of COVID-19 in the UK. ## Data Availability De-identified study data are available for access by accredited researchers in the ONS Secure Research Service (SRS) for accredited research purposes under part 5, chapter 5 of the Digital Economy Act 2017. ## Main Figures and Tables ![Figure8](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F8.medium.gif) [Figure8](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F8) Work and employment ![Figure9](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F9.medium.gif) [Figure9](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F9) Contacts ## SUPPLEMENTARY MATERIAL ### Supplementary Methods #### Laboratory testing Swabs were couriered directly to the United Kingdom’s national Lighthouse laboratories (Glasgow (from 16 August 2020 onward) and the National Biocentre in Milton Keynes (from 26 April 2020 to 8 February 2021)) where samples were tested within the national testing program using identical methodology. The presence of three SARS-CoV-2 genes (ORF1ab and the genes transcribing nucleocapsid protein (N) and spike protein (S)) was identified using RT-PCR with the TaqPath RT-PCR COVID-19 kit (Thermo Fisher Scientific), analyzed using UgenTec FastFinder 3.300.5 (TaqMan 2019-nCoV Assay Kit V2 UK NHS ABI 7500 v2.1; UgenTec). The assay plugin contained an assay-specific algorithm and decision mechanism allowing conversion of the qualitative amplification assay raw data into test results with little manual intervention. Samples were called positive if either N or ORF1ab, or both, were detected. The S gene alone was not considered a reliable positive but could accompany other genes (that is, one, two or three gene positives). ### Variable and model specifications #### Deprivation Deprivation was assessed using the index of multiple deprivation (IMD) in England, a score based on lower layer super output areas with average population of 1500 people and incorporating seven domains to produce an overall relative measure of deprivation (income, employment, education, skills and training, health and disability, crime, barriers to housing services and living environment) ([https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019](https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019)). These sub-components were also assessed in the variable screening process, restricted to England. Equivalent scores were used in the other three countries comprising the UK1–4. Each country’s scores were converted to a within country percentile. #### Age Age was including in the model as a natural cubic spline with 4 internal knots at 20, 40, 60, 80th percentiles of unique ages, and boundary knots at 5th and 95th percentiles. #### Vaccination status Participants were asked about their vaccination status at visits, including the type, number of doses and date(s). Participants from England were also linked to administrative records from the National Immunisation Management Service (NIMS). We used records from NIMS where available. Otherwise, we used records from the survey, since linkage was periodic and NIMS does not contain information about vaccinations received abroad or in Northern Ireland, Scotland and Wales. Where records were available from both NIMS and the survey, agreement on type was 98% and agreement on dates was 95% within ±7 days. #### Interactions Interactions between household size and multigenerational households, and region and rural/urban classification were not considered as, by definition, all those living in multigenerational households had a household size of 3 or more, and not all regions included major urban conurbations. #### Face covering variables Prior to 18th February, participants in the study were asked the following question regarding face coverings: “*Do you mainly wear any kind of face covering or mask when you are outside your home, because of COVID-19?*” with the options: * – “No”, * – “Yes, at work/school only”, * – “Yes in other situations only (including public transport, shops)”, * – “Yes, usually both at work/school and in other situations” * – “My face is already covered for other reasons (e.g. religious or cultural reasons)” As of 18th February this question was retired, and participants were instead asked the two following questions about face coverings: “*Do you wear any kind of face covering or mask when you are at work/your place of education, because of COVID19?”*, and “*Do you wear any kind of face covering or mask when you are in other enclosed public spaces, such as shops, or using public transport, because of COVID-19?*”, with the first options being either “Not going to place of work or education”, or “Not going to place of work or education”. This question caused similar issues with reverse causality as other behavioural questions, and hence these new questions were including in our behavioural screen, while the former question was included in the main screen. #### Approximate categorisation of variable effects We classified effects from each variable in both the core and screening model using the following broad categorisation: * **Never:** The effect is never significant at a p<0·05 threshold in any fortnight * **Inconsistent**: The variable is significant at a p<0·05 threshold in at least one fortnight, but never in with an odds ratio in a consistent direction in any consecutive fortnights * **Isolated**: The variable is significant at a p<0·05 threshold in two consecutive fortnight at most once, and “never consecutive” at all other times * **Comes/goes**: The variable is significant at a p<0·05 threshold in three or more consecutive fortnights, or two consecutive fortnights at least twice, and is not significant with a gap of at least three fortnights, or two gaps of two fortnights, if the effect appears again. * **Persistent**: The variable is significant at a p<0·05 threshold for the entire period after the first significant fortnight, with no more than one gap of two fortnights separating consistency of the effect. ## Supplementary Results While the deprivation score component reflecting education was consistently associated with positivity, as this effect was in the same direction as the main deprivation score in the core model, and was only available in England, it was not considered further (**Supplementary Figure 4)**. ### Ridge regression We found 43 of the 692 (6%) coefficients from the core models produced from ridge regression did not fall within the 95% of the equivalent coefficients obtained from logistic regression (**Supplementary Figure 10**). Of these, the majority (38 coefficients; 88%) were effects of geographical region. These were mostly in the first fortnights of the study period when event rates and sample size was smallest, and also during December 2020, where we observed strong regional effects due to the rise of the Alpha variant in the Southern regions of England. Many of the inconsistencies within geographical region occurred within the same fortnight i.e. either none or all of the effect estimates for geographic regions were within the confidence intervals. The differences observed between coefficients in December 2020 while the Alpha variant was rising suggest that the ridge regression penalised early signal for the regional effect, while logistic regression models picked this up. While often challenging to distinguish between signal and noise, through triangulation with other data sources, the regional effects observed in logistic regression model were accurate and representative of rises in Alpha variant in London and the South East, while ridge regression missed this effect, hence justifying our choice of method. ## Supplementary Tables View this table: [Supplementary Table 1A:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/T2) Supplementary Table 1A: Characteristics of screening variables for visits included in the main screening process View this table: [Supplementary Table 1B](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/T3) Supplementary Table 1B Characteristics of screening variables for visits included in the behaviour screening process (B) View this table: [Supplementary Table 2:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/T4) Supplementary Table 2: Count in each fortnight, including number not included in core model View this table: [Supplementary Table 3:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/T5) Supplementary Table 3: Summary of individuals IMD components with combined index View this table: [Supplementary Table 4:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/T6) Supplementary Table 4: Summary of p-values in 28-day periods for effects which occur in 2 or more consecutive fortnights ## Supplementary Figures ![Supplementary Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F10.medium.gif) [Supplementary Figure 1:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F10) Supplementary Figure 1: Log odds ratios with 95% confidence intervals for the effect of rural urban classification across the 52 week study period ![Supplementary Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F11.medium.gif) [Supplementary Figure 2:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F11) Supplementary Figure 2: Unadjusted percentage (95% CI) of positive swabs per fortnight (A), and positive swabs split by gene positivity pattern (B) ![Supplementary Figure 3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F12.medium.gif) [Supplementary Figure 3:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F12) Supplementary Figure 3: Total number of participants per fortnight ![Supplementary Figure 4:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F13.medium.gif) [Supplementary Figure 4:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F13) Supplementary Figure 4: Summary of odds ratio and p-values for interactions between all of the core variables using fortnights. ![Supplementary Figure 5:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F14.medium.gif) [Supplementary Figure 5:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F14) Supplementary Figure 5: Global hetergeneity p-values per factor from the screening process for household and living enviroment characteristics ![Supplementary Figure 6:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F15.medium.gif) [Supplementary Figure 6:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F15) Supplementary Figure 6: Individual p-values per factor from the screening process for screening characteristics ![Supplementary Figure 7A:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F16.medium.gif) [Supplementary Figure 7A:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F16) Supplementary Figure 7A: Summary of odds ratios and p-values for the 8 core variables over 28 day periods ![Supplementary Figure 7B:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F17.medium.gif) [Supplementary Figure 7B:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F17) Supplementary Figure 7B: Summary of odds ratios and p-values for the individual levels of the 8 core variables over 28 day periods ![Supplementary Figure 8:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F18.medium.gif) [Supplementary Figure 8:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F18) Supplementary Figure 8: Adjusted effect of age (years) on positivity using 28-day periods. ![Supplementary Figure 9A:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F19.medium.gif) [Supplementary Figure 9A:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F19) Supplementary Figure 9A: Summary of odds ratio and p-values for interactions between all of the core variables for 28 day periods. ![Figure 9B:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F20.medium.gif) [Figure 9B:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F20) Figure 9B: Effect of interaction of age by household size in the 28-day period 13 September to 10th October ![Figure 9C:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F21.medium.gif) [Figure 9C:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F21) Figure 9C: Effect of interaction of ethnicity by household size in the 28-day period 11th October 2020 to 7th November 2020 ![Figure 9D:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F22.medium.gif) [Figure 9D:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F22) Figure 9D: Effect of interaction of region by deprivation score in the 28-day period 8th November 2020 to 5th December 2020 ![Figure 9E:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F23.medium.gif) [Figure 9E:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F23) Figure 9E: Effect of interaction of rural urban classification by age in the 28-day period 6th December 2020 to 2nd January 2021 ![Figure 9F:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F24.medium.gif) [Figure 9F:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F24) Figure 9F: Effect of interaction of region by household size in the 28-day period 6th December 2020 to 2nd January 2021 ![Figure 9G:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F25.medium.gif) [Figure 9G:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F25) Figure 9G: Effect of interaction of ethnicity by multigenerational households in the 28-day period 31st January 2021 to 27th February 2021 ![Supplementary Figure 10A:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F26.medium.gif) [Supplementary Figure 10A:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F26) Supplementary Figure 10A: Global heterogeneity p-values per factor from the screening process for 28- day periods for characetrics based on work, health status and contacts ![Supplementary Figure 10B:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F27.medium.gif) [Supplementary Figure 10B:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F27) Supplementary Figure 10B: Global heterogeneity p-values per factor from the screening process for 28- day periods for characteristics based on household and living environment ![Supplementary Figure 11:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F28.medium.gif) [Supplementary Figure 11:](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F28) Supplementary Figure 11: Results from ridge regression and logistic regression ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F29/graphic-39.medium.gif) [](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F29/graphic-39) ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F29/graphic-40.medium.gif) [](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F29/graphic-40) ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F29/graphic-41.medium.gif) [](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F29/graphic-41) ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F29/graphic-42.medium.gif) [](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F29/graphic-42) ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F29/graphic-43.medium.gif) [](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F29/graphic-43) ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F29/graphic-44.medium.gif) [](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F29/graphic-44) ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F29/graphic-45.medium.gif) [](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F29/graphic-45) ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F29/graphic-46.medium.gif) [](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F29/graphic-46) ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F29/graphic-47.medium.gif) [](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F29/graphic-47) ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F29/graphic-48.medium.gif) [](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F29/graphic-48) ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F29/graphic-49.medium.gif) [](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F29/graphic-49) ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F29/graphic-50.medium.gif) [](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F29/graphic-50) ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/09/05/2021.09.02.21263017/F29/graphic-51.medium.gif) [](http://medrxiv.org/content/early/2021/09/05/2021.09.02.21263017/F29/graphic-51) Supplementary Figure 12: Global hetergeneity p-values per factor from the screening process over all 26 fortnights * Received September 2, 2021. * Revision received September 2, 2021. * Accepted September 5, 2021. * © 2021, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/) ## REFERENCE LIST 1. 1.World Health Organization. WHO Coronavirus (COVID-19) Dashboard. 2021. [https://covid19.who.int/](https://covid19.who.int/) (accessed 26 July 2021. 2. 2.England PH. Disparities in the risk and outcomes of COVID-19. Public Health England 2020. 3. 3.de Lusignan S, Dorward J, Correa A, et al. Risk factors for SARS-CoV-2 among patients in the Oxford Royal College of General Practitioners Research and Surveillance Centre primary care network: a cross-sectional study. The Lancet Infectious Diseases 2020; 20(9): 1034–42. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F09%2F05%2F2021.09.02.21263017.atom) 4. 4.Zheng Z, Peng F, Xu B, et al. Risk factors of critical & mortal COVID-19 cases: A systematic literature review and meta-analysis. J Infect 2020; 81(2): e16–e25. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.jinf.2020.04.021&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32335169&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F09%2F05%2F2021.09.02.21263017.atom) 5. 5.Elimian KO, Ochu CL, Ebhodaghe B, et al. Patient characteristics associated with COVID-19 positivity and fatality in Nigeria: retrospective cohort study. BMJ open 2020; 10(12): e044079. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiYm1qb3BlbiI7czo1OiJyZXNpZCI7czoxMzoiMTAvMTIvZTA0NDA3OSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIxLzA5LzA1LzIwMjEuMDkuMDIuMjEyNjMwMTcuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 6. 6.World Health Organization. Tracking SARS-CoV-2 variants. 2021. [https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/](https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/). 7. 7.GOV.UK. Full list of local restriction tiers by area. 2021. [https://www.gov.uk/guidance/full-list-of-local-restriction-tiers-by-area](https://www.gov.uk/guidance/full-list-of-local-restriction-tiers-by-area) (accessed 29 July 2021. 8. 8.Institute for Government. Timeline of UK government coronavirus lockdowns. 2021. 9. 9.Sah P, Fitzpatrick MC, Zimmer CF, et al. Asymptomatic SARS-CoV-2 infection: A systematic review and meta-analysis. Proceedings of the National Academy of Sciences 2021; 118(34). 10. 10.Steyerberg EW, Moons KG, van der Windt DA, et al. Prognosis Research Strategy (PROGRESS) 3: prognostic model research. PLoS Med 2013; 10(2): e1001381. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pmed.1001381&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23393430&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F09%2F05%2F2021.09.02.21263017.atom) 11. 11.Sterne JA, White IR, Carlin JB, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. Bmj 2009; 338: b2393. [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiRlVMTCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiYm1qIjtzOjU6InJlc2lkIjtzOjE3OiIzMzgvanVuMjlfMS9iMjM5MyI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIxLzA5LzA1LzIwMjEuMDkuMDIuMjEyNjMwMTcuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 12. 12.Hughes RA, Heron J, Sterne JA, Tilling K. Accounting for missing data in statistical analyses: multiple imputation is not always the answer. International journal of epidemiology 2019; 48(4): 1294–304. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F09%2F05%2F2021.09.02.21263017.atom) 13. 13.Johnston R, Jones K, Manley D. Confounding and collinearity in regression analysis: a cautionary tale and an alternative procedure, illustrated by studies of British voting behaviour. Qual Quant 2018; 52(4): 1957–76. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s11135-017-0584-6&link_type=DOI) 14. 14.Riley S, Atchison C, Ashby D, et al. REal-time Assessment of Community Transmission (REACT) of SARS-CoV-2 virus: Study protocol. Wellcome Open Res 2020; 5: 200. 15. 15.Ministry of Housing CLG. English indices of deprivation 2019, 2019. 16. 16.Northern Ireland Statistics and Research Agency. Northern Ireland Multiple Deprivation Measure 2017 (NIMDM2017), 2017. 17. 17.Scottish Government. Scottish Index of Multiple Deprivation 2020, 2020. 18. 18.Statistics for Wales. Welsh Index of Multiple Deprivation (full Index update with ranks): 2019, 2019. 19. 19.Chang BH, Hoaglin DC. Meta-Analysis of Odds Ratios: Current Good Practices. Med Care 2017; 55(4): 328–35. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1097/MLR.0000000000000696&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F09%2F05%2F2021.09.02.21263017.atom) 20. 20.Doerken S, Avalos M, Lagarde E, Schumacher M. Penalized logistic regression with low prevalence exposures beyond high dimensional settings. PLoS One 2019; 14(5): e0217057. 21. 21.Office for National Statistics. COVID-19 Infection Survey: methods and further information. 2021. [https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/methodologies/covid19infectionsurveypilotmethodsandfurtherinformation](https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/methodologies/covid19infectionsurveypilotmethodsandfurtherinformation) (accessed 29 July 2021. 22. 22.Walker AS, Vihta KD, Gethings O, et al. Increased infections, but not viral burden, with a new SARS-CoV-2 variant. medRxiv 2021. 23. 23.Public Health England. SARS-CoV-2 variants of concern and variants under investigation in England, 2021. 24. 24.Krzywinski M, Altman N. Power and sample size. Nature Methods 2013; 10(12): 1139–40. 25. 25.Nemes S, Jonasson JM, Genell A, Steineck G. Bias in odds ratios by logistic regression modelling and sample size. BMC Med Res Methodol 2009; 9: 56. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/1471-2288-9-56&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19635144&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F09%2F05%2F2021.09.02.21263017.atom) 26. 26.van Smeden M, de Groot JA, Moons KG, et al. No rationale for 1 variable per 10 events criterion for binary logistic regression analysis. BMC Med Res Methodol 2016; 16(1): 163. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s12874-016-0267-3.&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F09%2F05%2F2021.09.02.21263017.atom) 27. 27.Albers C. The problem with unadjusted multiple and sequential statistical testing. Nat Commun 2019; 10(1): 1921. 28. 28.Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal statistical society: series B (Methodological) 1995; 57(1): 289–300. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/2346101&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=WOS:A1995QE4&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F09%2F05%2F2021.09.02.21263017.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1995QE45300017&link_type=ISI) 29. 29.Benjamini Y, Hochberg Y. On the adaptive control of the false discovery rate in multiple testing with independent statistics. Journal of educational and Behavioral Statistics 2000; 25(1): 60–83. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3102/10769986025001060&link_type=DOI) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000087561900004&link_type=ISI) 30. 30.Riley S, Eales O, Haw D, et al. REACT-1 round 13 interim report: acceleration of SARS- CoV-2 Delta epidemic in the community in England during late June and early July 2021. medRxiv 2021. 31. 31.UK Department of Health & Social Care. NHS Test and Trace Statistics (England): Methodology. 2021. [https://www.gov.uk/government/publications/nhs-test-and-trace-statistics-england-methodology/nhs-test-and-trace-statistics-england-methodology](https://www.gov.uk/government/publications/nhs-test-and-trace-statistics-england-methodology/nhs-test-and-trace-statistics-england-methodology) (accessed 29 July 2021. 32. 32.Pan American Health Organisation. Interim guidelines for detecting cases of reinfection by SARS-CoV-2, 2020. ## SUPPLEMENTARY REFERENCE LIST 1. 1.Ministry of Housing CLG. English indices of deprivation 2019, 2019. 2. 2.Northern Ireland Statistics and Research Agency. Northern Ireland Multiple Deprivation Measure 2017 (NIMDM2017), 2017. 3. 3.Scottish Government. Scottish Index of Multiple Deprivation 2020, 2020. 4. 4.Statistics for Wales. Welsh Index of Multiple Deprivation (full Index update with ranks): 2019, 2019. [1]: /embed/graphic-1.gif