Summary
Control of the SARS-CoV-2 epidemic requires rapid identification and isolation of infected individuals and their contacts. Community testing in England (Pillar 2) by polymerase chain reaction (PCR) is reserved for those reporting at least one of four ‘classic’ COVID-19 symptoms (loss or change of sense of smell, loss or change of sense of taste, fever, new continuous cough).1 Detection of positive cases in the community might be improved by including additional symptoms and their combinations. We used data from the REal-time Assessment of Community Transmission-1 (REACT-1) study to investigate symptom profiles for PCR positivity at different ages. Among rounds 2–7 (June to December 2020), an age-stratified, variable selection approach stably selected chills (all ages), headache (5–17 years), appetite loss (18–54 and 55+ years) and muscle aches (18–54 years) as jointly and positively predictive of PCR positivity together with the classic four symptoms. Between round 7 (November to December 2020) and round 8 (January 2021) when new variant B.1.1.7 predominated, only loss or change of sense of smell (more predictive in round 7) and (borderline) new persistent cough (more predictive in round 8) differed between cases. At any level of PCR testing, triage based on the symptoms identified here would result in more cases detected than the current approach.
Main
High community transmission and prevalence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in England during December 2020 and January 20212 led to a third national lockdown starting 6 January 2021. To help control the epidemic, rapid identification and isolation of infected individuals is essential to curtail the transmission of SARS-CoV-23 together with testing and isolation of coronavirus disease 2019 (COVID-19) contacts.4,5,6 Polymerase chain reaction (PCR) is the gold standard7 for SARS-CoV-2 detection in the general population. The UK government has established a National Health Service Test and Trace system for testing symptomatic individuals in the community (Pillar 2) based on reporting of at least one of the following four ‘classic’ COVID-19 symptoms: fever, new continuous cough, loss or change in sense of smell or taste.1 We hypothesised that detection of cases in the community might be improved by including additional symptoms and their combinations.
The REal-time Assessment of Community Transmission-1 (REACT-1) study is a series of 2-3 weekly community-based cross-sectional surveys of SARS-CoV-2 PCR swab positivity carried out each month in England. It is based on random, non-overlapping samples of the general population aged 5 years and above.8 Here, we use REACT-1 data from rounds 2 to 7 (June to December 2020) and round 8 (January 2021) to evaluate the ability of (combinations of) reported symptoms in predicting PCR positivity in order to inform test allocation prioritisation. We compare the performances of models based on presence of at-least-one-of-four symptoms as per the current test allocation strategy, and parsimonious age-specific sets of symptoms found to be jointly predictive of PCR positivity. We also compare results across the most recent two rounds of data collection (round 7, November to December 2020, and round 8).
Descriptive analyses of symptoms and PCR positivity
Of the 979,732 respondents completing a valid swab test who were enrolled in REACT-1 rounds 2–7, 19 were excluded due to missing information leaving 979,713 participants; of these, 870,875 (88.9%) reported no symptoms in the week prior to testing while 108,838 (11.1%) reported at least one of the 26 surveyed symptoms (Figure 1). We detected 4,169 PCR positives of whom 1,539 (36.9%) reported one or more symptoms in the past week (Figure 1-A and Supplementary Table 1).
In round 8 (Figure 1-B), after exclusion of 6 individuals with missing information, there were 167,636 participants, of whom 146,701 (87.5%) reported no symptoms in the past week and 20,935 (12.5%) reported at least one of the 26 surveyed symptoms. We detected 2,282 PCR positives of whom 1,031 (45.1%) reported one or more symptoms in the past week.
In rounds 2–7, we found a mean unweighted community prevalence of PCR positivity of 0.46%; 1.49% of participants reported at-least-one-of-four classic COVID-19 symptoms in the week before testing, increasing to 1.36% and 1.60% respectively in round 8. Odds ratio (OR) for testing positive among participants with at-least-one-of-four symptoms was 19.2 (95% confidence interval 17.7, 20.8) in rounds 2–7 and 30.3 (27.2, 33.7) in round 8 compared to people reporting no symptoms in the preceding week (Figure 1), consistent with results of the Office for National Statistics (ONS) Coronavirus (COVID-19) Infection Survey9. We calculated positive predictive values of 5.5% (rounds 2–7) and 20.7% (round 8) for at-least-one-of-four classic symptoms and 0.8% and 2.5%, respectively, for participants with one or more of the other 22 symptoms (Figure 1). In age-stratified analyses, we show that each of the 26 symptoms was associated with PCR positivity, except chest pain (rounds 2–7 and round 8) and shortness of breath (round 8 only) among 5–17 year olds (Supplementary Figure 1).
We constructed upset plots that show frequency of symptom occurrence or co-occurrence separately among PCR positives and negatives using the UpsetR package in R10 (Supplementary Figure 2). Among PCR positives, the most common symptom profiles were, in order: headache alone, co-occurring loss or change of sense of smell and taste, tiredness alone (rounds 2–7); and tiredness alone, headache alone, co-occurring loss or change of sense of smell and taste (round 8). Among PCR negatives, the most common symptom profiles were, in order: sore throat, headache, tiredness alone (rounds 2–7); and headache, tiredness, sore throat alone (round 8).
Multivariate analyses
We then applied a stability selection procedure using penalised (Least Absolute Shrinkage and Selection Operator, LASSO) logistic regression among participants reporting at least one of the 26 surveyed symptoms. This was done separately at ages 5–17, 18–54 and 55+ years, to identify a model for each age group with parsimonious sets of symptoms jointly predictive of PCR positivity. We fitted and calibrated models on a 70% training set of data from rounds 2–7 and applied them in the remaining 30% as well as in all of round 8 data (see Methods). We considered stably selected symptoms to be predictive of PCR positivity, with selection proportions >50% across the N=1,000 models (based on 50% subsamples of the full study population). The selected symptoms and their penalised (shrunk) coefficients, averaged over models, are conditional on all other symptoms included.
In age-stratified analyses, six (5–17 years) or seven (18–54 and 55+ years) symptoms were jointly selected (Figure 2). All four classic symptoms (except new persistent cough for 5–17 years) were selected, along with chills in all age groups. Additional age-specific symptoms selected were headache (5–17 years), appetite loss (18–54 and 55+ years) and muscle aches (18–54 years) (positive coefficients) as well as runny nose (5–17 years) and numbness/tingling (55+ years) (negative coefficients). In contrast to results from univariate analyses (Supplementary Figure 1), runny nose appears to protect against risk of PCR positivity in the stability selection model for 5–17 year olds. This change in the sign of the association upon adjustment for other symptoms suggests that runny nose tends to co-occur with COVID-19 symptoms but is not specific to SARS-CoV-2 infection. Similarly, numbness/tingling reduces the predicted probability of getting a PCR positive test among 55+ year olds conditional on the other selected symptoms.
In all age groups, addition of any other symptom (with non-zero selection proportion) made only incremental improvement to model performance (change in area under the receiver operating characteristic curve [AUC] < 0.02). Our stability selection models outperformed the at-least-one-of-four classic symptoms approach in rounds 2–7 test (hold-out) data with AUC = 0.72, 0.80, and 0.73 versus 0.64, 0.73 and 0.66 for 5–17, 18–54, and 55+ years, respectively (Figure 2). In sensitivity analyses, gradient-boosted trees,11 a non-linear alternative approach that accommodates complex interactions, did not outperform our penalised regression models in test data (Supplementary Figure 3).
Round 7 versus round 8
We observed the highest prevalence of any REACT-1 round so far in round 8.12 We compared the prevalence of symptoms in round 7 and round 8 among PCR negatives and positives, separately, using a test-negative design.13 This was to account for possible seasonal effects that might explain the high rates of symptom reporting in round 8 (Supplementary Table 1 and Supplementary Table 2). Our analyses identified several symptoms with higher prevalence in round 8 than round 7, but, accounting for seasonal effects, prediction of PCR positivity only differed between rounds for loss or change of sense of smell (more predictive in round 7 than round 8) and (borderline) for new persistent cough, which was more predictive in round 8 (Supplementary Table 2). Our stability selection LASSO trained on round 7 and round 8 selected identical sets of variables jointly predictive of PCR positivity among individuals aged 55+ years, and slightly different sets of variables for the two younger age groups (Supplementary Figure 4). Despite these differences, the model trained on rounds 2–7 and tested on round 8 provided comparable predictive performance (AUC=0.78) to that tested on round 2–7 hold out data (AUC=0.77).
Our results may be relevant to understanding whether the symptom profile for variant B.1.1.7, which was first detected in England in September 2020, differs from previous strains. The rapid increase in the prevalence of this variant in England means that it became the dominant strain by January 2021, responsible for greater than 80% of infections at that time compared to ∼16% in mid-November 2020.14 Thus our findings of reduced PCR positivity risk associated with loss or change of sense of smell and increased risk for new persistent cough (in round 8 compared to round 7) may be due to variant B.1.1.7. These results are in keeping with findings from the ONS Coronavirus (COVID-19) Infection Survey where people testing PCR positive compatible with variant B.1.1.7 were less likely to report loss or change of taste or smell (both symptoms combined), and more likely to report cough, compared with earlier strains.9 Conversely, other differences in the symptoms associated with PCR positivity in round 7 and round 8 may, at least partially, be attributed to (possibly seasonal) changes in the prevalence of these symptoms.
Community case detection
We show that at any proportion of community testing of symptomatic people, our models with selected symptoms would detect more cases than the at-least-one-of-four symptom approach currently adopted in England (Figure 3). If all people who reported at-least-one-of-four classic symptoms were tested (13% of the symptomatic population), we would detect 53% (rounds 2–7, Figure 3-A) and 55% (round 8, Figure 3-B) of symptomatic infections in the community; this would rise to 75% (76% in round 8) with optimised PCR testing allocation using our stability selection models (red dotted lines on Figure 3). Of these detected positives, 30% (rounds 2–7) and 27% (round 8) would not have reported any of the four classic symptoms, and so would not have been eligible for PCR testing. The ∼40% increase in case detection (above that for at-least-one-of-four symptoms) would involve a 2.7-fold increase in the number of tests allocated to the symptomatic population in England. Even so, only around one third of total community infections would be detected overall, due to the high proportion of non-symptomatic infections. This suggests that targeted testing of non-symptomatic individuals should also be considered alongside symptomatic testing15, for example, in areas of high prevalence.
Discussion
Previous models of the relationship between symptoms and infection identified fatigue and skipping meals in addition to the classic four symptoms as predictive of reported PCR positivity, although these symptoms did not appear to vary by age.16 In contrast, we show here that symptom profile may differ by age, and that age-stratified symptom-based models may offer improved predictions of PCR positivity. For every possible number of PCR tests done, more COVID-19 cases would be identified by our models than with the current at-least-one-of-four classic symptoms strategy.
We note that the selected symptoms all contribute to prediction of PCR positivity: thus, for example, loss or change of sense of taste and loss or change of sense of smell confer independent predictive information and therefore both symptoms should be considered. Likewise, reporting chills provides additional information at all ages beyond fever, and loss of appetite (among adults) adds additional information over and above loss or change of sense of taste or smell. Beyond the symptoms selected by LASSO stability selection, however, we found no meaningful improvement in prediction of PCR positivity, indicating that our selection procedure provided an informative but parsimonious set of variables.
Limitations of our study include that the participants, though randomly selected from the community, may not be fully representative and therefore results may not be directly applicable to the whole population. However, our sampling procedure, which provided approximately equal numbers of participants in all 315 lower tier local authority areas in England, ensured wide geographical coverage and captured the socio-demographic and ethnic diversity of the population of England. In addition, self-administered questionnaires are subject to possible recall and other biases and therefore may not give an accurate representation of the symptom profiles among the population as a whole. Also, as sampling in each round was cross-sectional, some individuals may have been infected (and had symptoms) more than one week before the swab was obtained, but were no longer symptomatic at the time of the study.
Our models with stably selected symptoms have the advantage of increasing the yield of positive tests leading to greater numbers of infected people being required to self-isolate, and therefore reducing the pool of infection in the community. However, this may increase the need to support individuals who may be economically disadvantaged by having to self-isolate. Also, since a high proportion of community infections are non-symptomatic, the majority of infected individuals would still go undetected. Thus, at the same time as widening the availability of testing among symptomatic people, consideration should be given to targeted testing of non-symptomatic individuals, for example, those in areas of high prevalence.
In summary, we show that the yield for community COVID-19 case detection could be increased. This would involve inclusion of a few additional symptoms in the criteria for test eligibility, in particular, chills, headache, appetite loss and muscle aches.
Methods
Study Population
REACT-1 is a series of community prevalence surveys of SARS-CoV-2 virus swab-positivity in England, conducted at approximately monthly intervals since May 2020. In each round of the study, recruitment letters are sent at random to a nationally representative sample of individuals aged 5 years and over using the National Health Service patient register across the 315 lower-tier local authorities in England. Each round aims to gather around 160,000 valid responses. Data from round 1 were excluded from our analysis as the symptom questions asked in that round were not consistent with subsequent rounds.8
Statistical analyses
Univariate models
Within each age group we used logistic regressions to model the risk of testing positive for SARS-CoV-2 infection, conditionally on having experienced any of 26 symptoms in the week prior to testing. We report ORs from such univariate models for each of the 26 symptoms separately.
Multivariate analyses
We adopted a variable selection approach to account for correlation and co-occurrence patterns across the 26 symptoms, and to model their joint and mutually adjusted effects on risk of PCR positivity. We used stability selection applied to LASSO (Least Absolute Shrinkage and Selection Operator) penalised logistic regression including all 26 symptoms as predictors and PCR test positive/negative as the outcome variable.17 In our stability selection, LASSO models are refitted on 1,000 random subsamples of the data, with a representative proportion of negative and positive tests retained in each subsample. As a measure of importance of each symptom in predicting PCR positivity, conditional on the other symptoms, we report the number of times each symptom was included in the model across the 1,000 subsamples. We considered symptoms with selection proportion above 50% as stable and included them in the stability selection model. The penalty parameter is calibrated to ensure that fewer than five predictors could be falsely selected in this model (per family error rate). For stable symptoms, the average ORs (conditionally on selection in the refitted LASSO models with calibrated penalty) are computed and used as effect size estimates in the stability selection model.
As a non-parametric alternative, we used gradient-boosted forests, as implemented in XGBoost.11 Model complexity was calibrated by tuning the number of trees (‘nrounds’) and tree depth (‘max_depth’) hyperparameters using 5-fold cross-validation, within each age group. Other hyperparameters were set to default.
For both multivariate models, the rounds 2–7 dataset (restricted to people reporting any of the 26 symptoms) was split into a 70% training set (76,187 observations, of which 1,078 were positive) and a 30% test set (32,651 observations, of which 461 were positive). All models were trained on the 70% training set and evaluated on the test set, as well as on round 8 data.
Evaluation of the test allocation strategy
For each participant in the test set, we used age-specific effect size estimates to derive an individual predicted probability of testing positive from (i) the stability selection model, (ii) the (unpenalised) logistic model with the four classic symptoms, and (iii) the currently implemented strategy (probability of 1 if experiencing any of the four main symptoms, and zero otherwise). To compare the performances of these three symptom-based strategies to select individuals for PCR testing, we computed the AUC for each.
We also simulated reallocating PCR tests in the community on the basis of model-derived risk scores and with the current testing strategy (at-least-one-of-four classic symptoms). We reported the proportion of symptomatic COVID-19 cases that would be detected (on PCR testing) using each of the three strategies for prioritisation, as a function of the proportion of symptomatic participants tested.
Round 7 versus round 8
We adopted a similar stability selection logistic LASSO approach on data restricted to participants from rounds 7 and 8 separately, and compared resulting selection proportions. To account for seasonal effects on the prevalence of symptoms, irrespective of PCR positivity, we adopted a test-negative design approach comparing separately among PCR positives and negatives across rounds. We investigated differences in the ORs (measuring the association between the symptom and PCR positivity) between round 7 and round 8 using a logistic model setting the symptom as outcome, the round and PCR positivity as predictors, and including a multiplicative interaction term. The effect of round is independent of PCR positivity (and can be interpreted as the effect of season on the symptom’s prevalence), the effect of PCR positivity measures the association between symptoms prevalence and the outcome of the test, irrespective of the round, and the interaction term measures a possible change in the association between PCR positivity and the symptoms, independently of the effect of season.
Data Availability
Summary statistics and descriptive tables are available in Supplementary information; R scripts used to perform the analyses can be downloaded at https://github.com/mathzero/react_symptom_risk_prediciton.
Ethical Approval
We obtained research ethics approval from the South Central-Berkshire B Research Ethics Committee (IRAS ID: 283787).
Contributors
JE, MW, and BB are joint first authors. MC-H and PE are joint last authors. JE, MW, MC-H, BB and PE conceived the study and drafted the manuscript. JE, BB, and MDW performed the statistical analyses. SR, HW and GC provided insights into the study design and results interpretation. AD and PE obtained funding. All authors revised the manuscript for important intellectual content and approved the submission of the manuscript. MC-H and PE had full access to the data and take responsibility for the integrity of the data and the accuracy of the data analysis and for the decision to submit for publication.
Funding/Support
This work was funded by the Department of Health and Social Care in England.
MC-H and MW acknowledge support from the H2020-EXPANSE project (Horizon 2020 grant No 874627). MC-H, JE, and BB acknowledge support from Cancer Research UK, Population Research Committee Project grant ‘Mechanomics’ (grant No 22184 to MC-H). BB received a PhD studentship from the MRC Centre for Environment and Health. HW is a NIHR Senior Investigator and acknowledges support from NIHR Biomedical Research Centre of Imperial College NHS Trust, NIHR School of Public Health Research, NIHR Applied Research Collaborative North West London, Wellcome Trust (UNS32973). SR acknowledges support from MRC Centre for Global Infectious Disease Analysis, National Institute for Health Research (NIHR) Health Protection Research Unit (HPRU), Wellcome Trust (200861/Z/16/Z, 200187/Z/15/Z), and Centres for Disease Control and Prevention (US, 442 U01CK0005-01-02). GC is supported by an NIHR Professorship. PE is Director of the MRC Centre for Environment and Health (MR/L01341X/1, MR/S019669/1). PE acknowledges support from the National Institute for Health Research Imperial Biomedical Research Centre and the NIHR Health Protection Research Units in Chemical and Radiation Threats and Hazards, and in Environmental Exposures and Health, the British Heart Foundation (BHF) Centre for Research Excellence at Imperial College London (RE/18/4/34215).and the UK Dementia Research Institute at Imperial (MC_PC_17114). We thank The Huo Family Foundation for their support of our work on COVID-19. The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of this manuscript.
Data sharing statement
Summary statistics and descriptive tables are available in Supplementary information; R scripts used to perform the analyses can be downloaded at https://github.com/barbarabodinier/Community_detection_COVID-19.
Declaration of interests
Prof. Elliott is the director of the MRC Centre of Environment and Health (MR/L01341X/1 and MC/S019669/1) and has no conflict of interest to disclose. Prof M Chadeau-Hyam holds shares in the O-SMOSE company and has no conflict of interest to disclose. Consulting activities conducted by the company are independent of the present work. All other authors have no conflict of interest to disclose.
Supplementary information
Acknowledgements
We thank key collaborators on this work – Ipsos MORI: Kelly Beaver, Sam Clemens, Gary Welch, Nicholas Gilby, Kelly Ward and Kevin Pickering; Institute of Global Health Innovation at Imperial College: Gianluca Fontana, Dr Hutan Ashrafian, Sutha Satkunarajah, Didi Thompson, Justine Alford and Lenny Naar; Molecular Diagnostic Unit, Imperial College London: Prof. Graham Taylor; North West London Pathology and Public Health England for help in calibration of the laboratory analyses; NHS Digital for access to the NHS register; and the Department of Health and Social Care for logistic support.