Identifying optimal combinations of symptoms to trigger diagnostic work-up of suspected COVID-19 cases in vaccine trials: analysis from a community-based, prospective, observational cohort
============================================================================================================================================================================================

* M Antonelli
* J Capdevila
* A Chaudhari
* J Granerod
* LS Canas
* MS Graham
* K Klaser
* M Modat
* E Molteni
* B Murray
* C Sudre
* R Davies
* A May
* LH Nguyen
* DA Drew
* A Joshi
* AT Chan
* JP Cramer
* T Spector
* J Wolf
* S Ourselin
* C Steves
* AE Loeliger

## Abstract

**Background** Several COVID-19 vaccine efficacy trials are ongoing with others predicted to start soon. Diagnostic work-up of trial participants following any COVID-19 associated symptom will lead to extensive testing, potentially overwhelming laboratory capacity whilst primarily yielding negative results. We aimed to identify an efficient symptom combination to capture most cases using the lowest possible number of tests.

**Methods** UK and US users of the COVID-19 Symptom Study app who reported new-onset symptoms between March and September 2020 and an RT-PCR test within seven days of symptom onset were included. Sensitivity, specificity, and number of RT-PCR tests needed to identify one RT-PCR positive case were calculated for individual symptoms and symptom combinations. A multi-objective evolutionary algorithm was applied to generate symptom combinations with good trade-offs between sensitivity and specificity.

**Findings** The UK dataset included 122,305 individuals (1,202 RT-PCR positive). Findings were replicated in a US dataset including 3,162 individuals (79 RT-PCR positive). Within three days of symptom onset, the COVID-19 specific symptom combination (cough, dyspnoea, fever, anosmia/ageusia) identified 69% of cases requiring 47 RT-PCR tests per positive case. The symptom combination with highest sensitivity was fatigue, anosmia, cough, diarrhoea, headache, and sore throat, identifying 96% of cases and requiring 96 tests.

**Interpretation** We confirm the significance of COVID-19 specific symptoms widely recommended for triggering RT-PCR. By using a data-driven optimisation technique we identified additional symptoms (fatigue, sore throat, headache, diarrhoea) that enabled many more positive cases to be captured efficiently. By providing a set of solutions with optimal trade-offs between sensitivity and specificity, we produced a selection of symptom subsets that maximise the capture of cases given different laboratory capacities. The methodology may be of particular use for COVID-19 vaccine developers across a range of resource settings and may have more far-reaching public health implications for detection of symptomatic SARS-CoV-2 infection.

**Funding** Zoe Global Limited, Department of Health, Wellcome Trust, Engineering and Physical Sciences Research Council (EPSRC), National Institute for Health Research (NIHR), Medical Research Council (MRC), Alzheimer’s Society, Massachusetts Consortium for Pathogen Readiness (MassCPR), Coalition for Epidemic Preparedness Innovations (CEPI)

**Evidence before this study** We searched PubMed up to November 16, 2020, with the terms “COVID-19” OR “SARS-CoV-2” AND “symptom” AND “community-based”, with no date or language restrictions, to find information about symptoms associated with COVID-19 from the community setting. The search retrieved 68 articles; however, most were not relevant as they related to specific subgroups (e.g. pregnant women, cancer patients) or aspects (e.g. mental health, diagnostic testing). Fever, cough, dyspnoea, tachypnoea, anosmia, and ageusia are the symptoms most commonly identified in COVID-19 patients and typically included in guidelines from the WHO and similar bodies. These data, however, come primarily from hospital-based studies. An assessment of the value of symptom combinations in the community in predicting COVID-19 is lacking.

**Added value of this study** We present data from the largest prospective, community-based cohort study to date and quantify the contribution of various COVID-19 symptoms and symptom combinations to RT-PCR positive COVID-19 case-finding. Our study is unique in that it simulates PCR testing in a clinical trial situation. Newly-symptomatic individuals were investigated within three days of symptom onset, and a second analysis of symptoms that occurred within seven days of symptom onset was also included to capture delayed symptom triggers. We confirm the significance of symptoms (e.g. fever, cough, anosmia/ageusia) widely used for triggering an RT-PCR test and identified additional symptoms such as fatigue, headache, sore throat, and diarrhoea to define the most efficient symptom combinations.

**Implications of all the available evidence** The applied methodology enables the selection of symptom subsets to maximise the capture of cases while taking account of specific laboratory capacity. Our findings have important implications for COVID-19 vaccine developers seeking to optimise the choice of triggering symptoms for diagnostic work-up in COVID-19 vaccine efficacy trials, and also for public health, given the community-based nature of the data source used.

## Introduction

Safe and effective vaccines represent the most promising intervention to prevent morbidity and mortality during the coronavirus disease (COVID)-19 pandemic.1,2 Several COVID-19 Phase 3 vaccine efficacy trials are ongoing with others predicted to start soon. In a clinical trial, diagnostic testing of suspected cases (e.g. reverse transcription polymerase chain reaction [RT-PCR] for severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]) could be triggered by the presence of any COVID-19 associated symptom. However, the signs and symptoms associated with COVID-19 are extensive and overlap with those of other common viral infections.3,4 Thus, it is possible that diagnostic work-up following any COVID-19 associated symptom would lead to indiscriminate testing and overwhelm laboratory capacity whilst primarily yielding negative results.

Cough, dyspnoea (shortness of breath), tachypnoea (fast breathing), and fever characterise COVID-19 pneumonia according to the World Health Organization (WHO).5 Additionally, anosmia and ageusia had the highest positive predictive value (PPV) of all reported COVID-19 symptoms.6,7 A household survey in the United Kingdom (UK), however, showed that fever, cough, anosmia, and ageusia were present on the day of testing in only 60% of symptomatic, RT-PCR positive individuals.8 Thus, other less specific signs/symptoms associated with COVID-19 occur in a substantial number of patients.

Identification of an efficient symptom combination to trigger diagnostic work-up that will capture the majority of COVID-19 cases using the lowest possible number of tests would enable optimum use of laboratory and financial resources in future vaccine efficacy trials. Such data are scant, and triggering symptoms vary between publicly available vaccine efficacy trial protocols.9-13 Identification of an efficient combination of symptoms may be especially useful for vaccine developers in resource- or capacity-constrained settings.

We aimed to simulate COVID-19 case finding in a trial population using a community-based, prospective, observational cohort study. Data from UK COVID Symptom Study App14 users were used to quantify how much individual COVID-19 symptoms contribute to COVID-19 case finding and to compute the sensitivity and specificity of specific combinations of symptoms if used to trigger a RT-PCR test. The findings were replicated in a dataset of COVID Symptom Study App users in the United States (US).

## Methods

### Study design and data source

A community-based cohort study was carried out using data from the COVID Symptom Study App, a free smartphone app developed by Zoe Global (London, UK) in collaboration with King’s College London (London, UK) and Massachusetts General Hospital (Boston, MA, USA).14 The app was launched on March 24th and March 29th, 2020 in the UK and US, respectively. Participants report baseline demographic information, data on comorbidities and COVID-19 testing results, and are encouraged to self-report a set of pre-specified symptoms on a daily basis to enable collection of longitudinal information on incident symptoms. This study was approved by the Partners Human Research Committee (Protocol 2020P000909) and King’s College London ethics committee (REMAS ID 18210, LRS-19/20-18210).

Data used in this study are available to bona fide researchers through UK Health Data Research ([https://web.www.healthdatagateway.org/dataset/fddcb382-3051-4394-8436-b92295f14259](https://web.www.healthdatagateway.org/dataset/fddcb382-3051-4394-8436-b92295f14259)).

### Study population

Individuals were included in the study if they met the following criteria: 1) aged ≥18 years, 2) reported developing any symptom between March 24th and September 15th, 2020, and 3) entered a valid RT-PCR test result within the first seven days of symptom onset. App users who recorded a history of COVID-19 were excluded. Data were frozen and extracted on October 21st. Users could update RT-PCR results retrospectively (i.e. from ‘waiting for result’ to ‘result was negative’); however, only updates implemented before the extraction date were considered. Two cohorts were used for analyses: the discovery data cohort and replication data cohort from UK and US participants who fulfilled the inclusion criteria, respectively. The latter served to confirm the generalisability of the results.

### Data analyses

Symptoms recorded within three and seven days of symptom onset were included in the analyses (see supplementary **Table e1** for the complete list of symptoms and the corresponding questions participants were asked). Analysis of symptoms within the first three days is key to enable testing for SARS-CoV-2 soon after symptom onset while viral load is highest. An additional buffer for inclusion of symptoms within seven days was also used, which may be important to detect development of lower respiratory tract signs indicative of pneumonia. Anosmia and ageusia were considered one symptom in the reporting app. Tachypnoea was not captured as it is a sign measured by a healthcare professional rather than a self-reported symptom; however, it may be captured in part by the symptom dyspnoea.

Individuals were classified as symptom-screening positive when they recorded at least one of the symptoms in the subset concerned. This was compared with self-reported RT-PCR results considered the gold standard for COVID-19 positive case detection. If multiple positive RT-PCR test results were recorded for an individual, only the first was included for the purpose of the analyses.

For different symptoms or combinations of symptoms, three evaluation parameters were considered: 1) sensitivity, computed as the percentage of COVID-19 positive individuals correctly identified, 2) specificity, calculated as the percentage of individuals correctly classified as COVID-19 negative, and 3) the reciprocal of precision, that is the number of RT-PCR tests needed to identify one RT-PCR positive COVID-19 case (i.e. Tests Per Case [TPC]).
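The screening rule and the three evaluation parameters above can be sketched in a few lines of Python. This is a minimal illustration only, not the study code: the function name `evaluate_subset`, the record layout, and the toy records are assumptions made for the example.

```python
from typing import Iterable, Set, Tuple


def evaluate_subset(
    records: Iterable[Tuple[Set[str], bool]], subset: Set[str]
) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, TPC) for one symptom subset.

    Each record is (symptoms reported, RT-PCR positive?). An individual is
    symptom-screening positive if they reported at least one symptom in `subset`.
    """
    tp = fp = fn = tn = 0
    for reported, pcr_positive in records:
        flagged = bool(reported & subset)  # at least one symptom of the subset
        if flagged and pcr_positive:
            tp += 1
        elif flagged:
            fp += 1
        elif pcr_positive:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    # TPC is the reciprocal of precision: tests triggered per positive found.
    tpc = (tp + fp) / tp if tp else float("inf")
    return sensitivity, specificity, tpc


# Hypothetical toy records, for illustration only (not study data).
records = [
    ({"cough", "fever"}, True),
    ({"headache"}, True),
    ({"cough"}, False),
    ({"fatigue"}, False),
    ({"sore throat"}, False),
    ({"anosmia"}, True),
]
sens, spec, tpc = evaluate_subset(records, {"cough", "anosmia"})
print(f"sensitivity={sens:.2f} specificity={spec:.2f} TPC={tpc:.1f}")
```

In this toy example the subset flags three individuals, two of whom are RT-PCR positive, so three tests are needed to find two cases (TPC of 1·5).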

All analyses were conducted using the discovery data cohort (UK) and replication data cohort (US) and stratified by age (18-54 and ≥55 years). These age groups were selected to align with those selected in ongoing COVID-19 vaccine efficacy trials.

### Evaluation of individual symptoms and clinically-inferred subsets of symptoms

Sensitivity, specificity, and TPC were evaluated for each individual symptom, and for the following four combinations of symptoms derived a priori from clinical experience and guidance (i.e. clinically-inferred subsets): 1) respiratory symptoms (cough, dyspnoea), 2) WHO-defined pneumonia symptoms (cough, dyspnoea, fever), 3) COVID-19 specific symptoms as defined by Public Health England (PHE; fever, cough, dyspnoea, anosmia/ageusia), and 4) extended symptoms (fever, cough, dyspnoea, anosmia/ageusia, fatigue, headache). This latter category was added post-hoc after exploration of the App data indicated high sensitivity of headache and fatigue in other contexts.15

### Evaluation of data-inferred subsets

An optimisation technique was subsequently used to generate optimal subsets of symptoms from the data. Optimisation problems with multiple objectives have a set of optimal solutions (known as Pareto-optimal solutions) rather than one single optimal solution. No Pareto-optimal solution is better than another without further information on the specific objective to be addressed. As sensitivity and specificity represent conflicting objectives, a multi-objective evolutionary algorithm (MOEA) was applied to generate efficient combinations of symptoms, each characterised by a good trade-off between specificity and sensitivity. The Python package pymoo v0.4.2.1 was used for MOEA optimisation. More specifically, we employed the well-known NSGA-II algorithm16 (see supplementary **Table e2** for parameter information).
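To make the notion of Pareto optimality concrete, the sketch below filters a handful of candidate subsets down to the non-dominated ones. It is not NSGA-II and does not use pymoo; it only illustrates the dominance relation on which non-dominated sorting relies. The candidate names and scores are hypothetical.

```python
from typing import Dict, Tuple

Score = Tuple[float, float]  # (sensitivity, specificity), both to be maximised


def dominates(a: Score, b: Score) -> bool:
    """True if `a` is at least as good as `b` on both objectives and better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(candidates: Dict[str, Score]) -> Dict[str, Score]:
    """Keep only candidates that no other candidate dominates."""
    return {
        name: score
        for name, score in candidates.items()
        if not any(dominates(other, score) for other in candidates.values())
    }


# Hypothetical subsets and scores, for illustration only.
candidates = {
    "A": (0.95, 0.20),
    "B": (0.70, 0.60),
    "C": (0.65, 0.55),  # worse than B on both objectives
    "D": (0.45, 0.80),
}
print(sorted(pareto_front(candidates)))  # ['A', 'B', 'D']
```

Subsets A, B, and D each represent a different trade-off, and none can be preferred over the others without knowing, for example, the available laboratory capacity.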

To generate the Pareto set of optimal combinations of symptoms (referred to as data-inferred subsets hereafter), the discovery (UK) cohort was randomly split into a training and a validation set. The training set (601 COVID-19 positive, 60,552 negative cases) was used to train the MOEA and generate the Pareto set of optimal symptom subsets. The validation set (601 COVID-19 positive, 60,551 negative cases) was used to evaluate each optimal combination by computing the sensitivity, specificity, and TPC of each generated subset. For the validation set, the sensitivity and specificity of the Pareto-optimal subsets were also computed for the two age groups to show the generalisability of the generated optimal subsets. All optimal combinations were also validated on the replication (US) cohort.
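The set sizes reported above are what a half split stratified by RT-PCR result would produce. The text states only that the split was random, so stratification is an assumption of this sketch, as are the function name and the seed.

```python
import random
from typing import List, Sequence, Tuple


def stratified_half_split(
    labels: Sequence[bool], seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split indices into two halves, separately within positives and negatives.

    Assumption: stratification by RT-PCR result (the source only says "randomly split").
    """
    rng = random.Random(seed)
    train: List[int] = []
    valid: List[int] = []
    for value in (True, False):
        idx = [i for i, y in enumerate(labels) if y == value]
        rng.shuffle(idx)
        half = (len(idx) + 1) // 2  # an odd remainder goes to the training set
        train.extend(idx[:half])
        valid.extend(idx[half:])
    return train, valid


# Cohort sizes from the discovery (UK) cohort: 1,202 positive of 122,305 individuals.
labels = [True] * 1202 + [False] * (122305 - 1202)
train, valid = stratified_half_split(labels)
print(len(train), len(valid))  # 61153 61152
```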

## Results

A total of 122,305 individuals were included in the discovery (UK) data cohort, of whom 1,202 recorded a positive RT-PCR test result for COVID-19. In the replication (US) data cohort, 3,162 individuals were included, of whom 79 recorded a positive result. The patient selection flow charts for both cohorts are displayed in supplementary **Figures e1** and **e2**. The age and sex distributions were similar between RT-PCR positive and negative participants within the cohorts; however, slight differences between cohorts were observed (**Table 1**). There was a lower proportion of male participants in the replication (US) cohort (17%) compared to the discovery (UK) cohort (25%), and the mean age was slightly higher (54 compared to 48 years) (**Table 1**).

[Table 1.](http://medrxiv.org/content/early/2020/11/25/2020.11.23.20237313/T1)

Table 1. Demographics of study population

### Evaluation of individual symptoms and clinically-inferred subsets of symptoms

The sensitivity, specificity, and TPC for each individual symptom reported within three and seven days of symptom onset are displayed in **Table 2**. Using data from the discovery (UK) cohort for all ages, the individual symptoms with the highest sensitivity in both three- and seven-day analyses were headache (66·8% and 75·6% for three- and seven-day analyses, respectively) and fatigue (64·9% and 77·8% for three- and seven-day analyses, respectively). Similar results were obtained with data from the replication cohort (US) and when data were stratified by age. The sensitivity of anosmia in the discovery (UK) cohort was only 21·8% within the first three days of symptom onset and 48·7% in the seven-day analyses. Anosmia, however, had the lowest TPC; when compared to headache, the TPC decreased from 76 to 20 and 70 to 10, for three- and seven-day analyses, respectively. These results are confirmed by **Figure 1**, which displays the frequency of the symptoms for the discovery (UK) cohort for both COVID-19 positive and negative cases.

[Table 2.](http://medrxiv.org/content/early/2020/11/25/2020.11.23.20237313/T2)

Table 2. Sensitivity, specificity, and Tests Per Case (TPC) for each individual symptom computed on the discovery data (UK) cohort

![Figure 1.](https://www.medrxiv.org/content/medrxiv/early/2020/11/25/2020.11.23.20237313/F1.medium.gif)

[Figure 1.](http://medrxiv.org/content/early/2020/11/25/2020.11.23.20237313/F1)

Figure 1. Symptom frequency for COVID-19 negative (left) and COVID-19 positive (right) cases

The sensitivity, specificity, and TPC for the four clinically-inferred subsets of symptoms reported within three and seven days of symptom onset are displayed in supplementary **Table e3**. For the discovery (UK) data cohort, only 45% of individuals positive for COVID-19 reported cough or dyspnoea within the first three days of symptom onset. The addition of fever (i.e. WHO-defined pneumonia symptom subset) increased sensitivity to 59%, while the further addition of anosmia/ageusia (i.e. PHE COVID-19 specific symptom subset) increased sensitivity to 67%. The extended symptoms set (i.e. cough, dyspnoea, fever, anosmia/ageusia, headache, and fatigue) increased the proportion of COVID-19 cases identified to 91% but required twice the TPC of the respiratory symptom combination (86 versus 43). Similarly, within seven days of symptom onset, COVID-19 specific and extended symptoms were reported in 81% and 96% of RT-PCR positive cases, at the cost of 43 and 82 TPC, respectively. Similar results were obtained when data were stratified by age. The sensitivity estimates from the replication data cohort (US) were higher for all four combinations; extended symptoms subset estimates reached 96% and 98% for the three- and seven-day analyses, respectively. Conversely, the specificity decreased to 21% and 17%, although TPC values were lower for the replication (US) data cohort.
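One factor that may contribute to a lower TPC alongside a lower specificity is the proportion of RT-PCR positive individuals in a cohort. As a sketch using the definition of TPC as the reciprocal of precision, and writing Se for sensitivity, Sp for specificity, and π for the proportion of RT-PCR positive individuals (notation introduced here for illustration, not used elsewhere in the paper):

$$
\mathrm{TPC} \;=\; \frac{TP + FP}{TP} \;=\; 1 + \frac{(1-\mathrm{Sp})\,(1-\pi)}{\mathrm{Se}\,\pi}
$$

For fixed sensitivity and specificity, TPC therefore falls as π rises. From the cohort counts reported above, π is approximately 1·0% in the discovery (UK) cohort (1,202 of 122,305) and 2·5% in the replication (US) cohort (79 of 3,162).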

### Evaluation of data-inferred subsets

The Pareto-optimal combinations of symptoms generated by the MOEA are displayed in **Figure 2** (see supplementary **Tables e4** and **e5** for corresponding list of symptom combinations and related sensitivity, specificity, and TPC for three- and seven-day analyses, respectively). These generated symptom combinations achieved similar values of sensitivity and specificity for the discovery (UK) training, discovery (UK) validation, and replication (US) cohorts, thus confirming the validity of this methodology. Moreover, results were also confirmed for the two age groups.

![Figure 2.](https://www.medrxiv.org/content/medrxiv/early/2020/11/25/2020.11.23.20237313/F2.medium.gif)

[Figure 2.](http://medrxiv.org/content/early/2020/11/25/2020.11.23.20237313/F2)

Figure 2. Pareto set of optimal subsets generated by the multi-objective evolutionary algorithm for three- and seven-day analyses
Each point represents a subset of symptoms characterised by a different trade-off between sensitivity and specificity.

**Figure 3** displays three symptom combinations generated by the MOEA for both three- and seven-day analyses; namely, the one with highest sensitivity, the one with a sensitivity of ∼90%, and the one characterised by a specificity of 50%, which is of interest from a clinical standpoint. Fatigue, anosmia, cough, diarrhoea, headache, and sore throat constituted the set of symptoms with the highest sensitivity in both the three- and seven-day analyses. Anosmia/ageusia were included in all three symptom combinations at both time points, fatigue was included in all symptom combinations for the three-day analyses, and cough for the seven-day analyses (**Figure 3**). Headache was slightly more important when symptoms were recorded within three days of onset. Diarrhoea as an individual symptom was not predictive of a positive COVID-19 RT-PCR result but became predictive when associated with other symptoms.

![Figure 3.](https://www.medrxiv.org/content/medrxiv/early/2020/11/25/2020.11.23.20237313/F3.medium.gif)

[Figure 3.](http://medrxiv.org/content/early/2020/11/25/2020.11.23.20237313/F3)

Figure 3. Combination of symptoms with highest sensitivity, sensitivity ∼ 90%, and specificity ∼50%

**Figure 4** displays the frequency of symptoms selected in symptom combinations with a sensitivity ≥90%. Fatigue, cough, and anosmia were present in most subsets with high sensitivity. Delirium, skipped meals, abdominal pain, and chest pain were never selected in three-day analyses and very rarely selected in seven-day analyses. Diarrhoea was selected ∼60% of the time for the three-day analyses. There were nine and 21 combinations with a sensitivity ≥90% for the three- and seven-day analyses, respectively. Symptom combinations with a high sensitivity tended to include most of the extended symptoms, although headache was more likely to be selected in the three-day scenario and fever in the seven-day scenario.

![Figure 4.](https://www.medrxiv.org/content/medrxiv/early/2020/11/25/2020.11.23.20237313/F4.medium.gif)

[Figure 4.](http://medrxiv.org/content/early/2020/11/25/2020.11.23.20237313/F4)

Figure 4. Percentage of a symptom’s appearance in symptom combinations with sensitivity ≥ 90 %

The sensitivity, specificity, and TPC for three data-inferred subsets (i.e. highest sensitivity, sensitivity ∼90%, and specificity ∼50%) compared to the four clinically-inferred subsets of symptoms reported within three and seven days of symptom onset are displayed in **Table 3**. To allow comparison of data-inferred and clinically-inferred subsets, this table also shows the latter subsets re-evaluated on the held-out validation dataset. Fatigue, anosmia, cough, diarrhoea, headache, and sore throat identified 95% and 97% of RT-PCR positive COVID-19 cases and required 82 and 92 TPC in the three- and seven-day analyses, respectively. The inclusion of the subset with the highest sensitivity increased the proportion of COVID-19 cases correctly identified to 96% and 99% for the three- and seven-day analyses, respectively. The sensitivity results were similar for the replication (US) data cohort and by age. However, the number of tests needed for those aged ≥55 years increased by 30% for both the three-day and seven-day analyses.

[Table 3.](http://medrxiv.org/content/early/2020/11/25/2020.11.23.20237313/T3)

Table 3. Sensitivity, specificity, and Tests Per Case (TPC) for the clinically- and data-inferred combinations of symptoms, computed on the held-out validation dataset

## Discussion

We present data from what is, to our knowledge, the largest community-based COVID-19 symptom cohort study, and aimed to quantify the contribution of various symptoms and combinations of symptoms associated with COVID-19 to RT-PCR positive case-finding. COVID-19 symptoms and RT-PCR tests were collected prospectively, which allowed us to select newly-symptomatic individuals and simulate a clinical trial situation in which RT-PCR tests are typically conducted within three days of symptom onset. Our results have implications not only for vaccine efficacy trials but also for public health in the detection of symptomatic SARS-CoV-2 infection. We confirm the significance of symptoms (fever, cough, anosmia/ageusia) widely considered important for triggering an RT-PCR test and extend this to include additional symptoms (fatigue, sore throat, headache, diarrhoea). The applied methodology enables the selection of symptom subsets that maximise the capture of cases without overwhelming laboratory capacity. Our findings may help to optimise the choice of triggering symptoms for diagnostic work-up in COVID-19 vaccine efficacy trials.

It is important in an efficacy trial to capture all COVID-19 cases with pulmonary involvement as prevention of pneumonia and severe COVID-19 would be an important outcome for COVID-19 vaccines. Therefore, the signs and symptoms that characterise WHO-defined COVID-19 pneumonia (fever, cough, dyspnoea, tachypnoea) should always trigger diagnostic work-up in a trial participant. Our findings support the inclusion of these symptoms as well as anosmia and ageusia, which had the highest precision of all reported COVID-19 symptoms.17,18 However, these COVID-specific symptoms (fever, cough, dyspnoea, anosmia/ageusia) correctly identified only 69% of COVID-19 cases in this study when RT-PCR was conducted within three days of symptom onset. This has important implications in terms of cases missed as the COVID-specific symptoms align with the current PHE definition of a possible COVID-19 case.19 We found that the addition of headache and fatigue (i.e. extended symptoms) increased the proportion of COVID-19 cases correctly identified to 92% but also almost doubled the TPC (from 47 to 85). Thus, an increase in sensitivity comes at a cost in the context of vaccine efficacy trials.

Application of the MOEA identified fatigue, anosmia, cough, diarrhoea, headache, and sore throat as the symptom set with the highest sensitivity in three- and seven-day analyses. This symptom set identified 96% and 99% of RT-PCR positive cases and required 96 and 92 TPC in the three- and seven-day analyses, respectively. Diarrhoea and sore throat were identified as symptoms that may increase case finding in an efficient way, in addition to those symptoms already considered important for triggering an RT-PCR test. The outcome of this work is that in a vaccine trial with limits on testing capacity, the optimum symptom set which could be applied to identify and confirm COVID-19 cases includes fever, cough, dyspnoea, ageusia/anosmia, fatigue, headache, diarrhoea, and sore throat. This finding may prove useful for COVID-19 vaccine developers when deciding which symptoms should trigger testing to optimise financial and logistical resource utilisation. Importantly, all the symptoms that constitute the subset with the highest sensitivity have been included as triggering symptoms in publicly available clinical trial protocols of ongoing vaccine efficacy trials.9-13

Few studies have been published that assess COVID-19 symptoms in community-based cohorts. A UK household survey showed the presence of fever, cough, anosmia, and ageusia in only 60% of symptomatic, RT-PCR positive individuals.8 However, in this study the symptoms were assessed only on the day of testing. As trial protocols require symptoms to be present for 24-48 hours prior to testing and our study aimed to simulate case finding in a trial situation, we assessed the presence of symptoms at three days from onset with a seven-day safety net in case of delayed testing following symptom onset.

Menni et al. presented results using data generated from this COVID-19 Symptom Study app; however, the aim was different and only data from March to April 2020 were included.17 We extend these data to September 2020 and importantly consider the results from the perspective of a potential COVID-19 vaccine developer. Menni et al. suggest anosmia, fatigue, persistent cough, and loss of appetite might together identify individuals with COVID-19.17 A separate COVID-19 symptom app from Germany suggests nausea and vomiting have a stronger predictive value for COVID-19 infection than symptoms such as sore throat or persistent cough.18 Thus, both studies identify gastrointestinal symptoms as important in identifying cases of COVID-19. Our study reports similar findings with diarrhoea found to be important to case finding. More recently, in another community-based observational study, sensitivity, specificity, PPV, and negative predictive value were reported for retrospectively-collected symptoms and symptom combinations that occurred during the 14-day period prior to screening for SARS-CoV-2 infection in a US seroprevalence study.20 The two symptom clusters most associated with SARS-CoV-2 infection were: 1) ageusia, anosmia, and fever, and 2) shortness of breath, cough, and chest pain. In our study, dyspnoea was rarely selected and chest pain never selected as part of an efficient symptom combination, likely due to dyspnoea often occurring later in the disease course.21 The sensitivity of dyspnoea increased in the seven-day compared to three-day analyses. However, the importance of dyspnoea as a symptom of pulmonary involvement makes it a critical triggering symptom in vaccine efficacy trials. Tachypnoea, which is included in the WHO definition of pneumonia, was not captured as a symptom in the app per se; however, it likely co-occurs with dyspnoea. Headache was more likely to be selected in the three-day scenario and fever in the seven-day scenario, again reflecting different timings of symptoms in the disease course.

The sensitivity of symptoms and various clinically-inferred combinations were similar for the age groups (18-54 and ≥55 years); however, the TPC was higher in the ≥55 years age group. This suggests self-reporting may work better for younger than older individuals.

The sensitivity, specificity, and TPC computed on the replication (US) data cohort were higher than for the UK cohort possibly due to different testing practices and public health measures adopted in each country. The data presented here are primarily from the UK, while a replication cohort from the US was analysed for external validation. It will be important for these findings to be validated in low- and middle-income country (LMIC) settings as COVID-19 vaccine efficacy trials are likely to be conducted in high income countries as well as LMICs. Vaccine developers should take into account regional considerations such as background incidence of co-infection and other trial-related aspects when interpreting these results.

This study has many strengths, including the large sample size and cost-effectiveness of the data source. Also, our study is community-based and adds important data, as most studies that have assessed symptoms in COVID-19 have involved hospital-based populations. Some limitations, however, also need consideration. First, the results are based on data self-reported through a mobile app and are therefore biased towards people with smartphone access. However, the app included a feature to enable reporting on behalf of someone else given their consent. Second, reported test results were not externally verified; however, antigen tests were not available during the study period, thus minimising the risk of participant confusion regarding the type of swab test. As the precise PCR used was not recorded and likely varied between participants, false positive rates were unknown and results were taken at face value. A further limitation is that app users may not be representative of the wider population. Finally, these data were generated in the spring and summer months when the incidence of concurrent respiratory infections (e.g. influenza) is low. The latter may have implications for trials conducted in winter.

In summary, we confirm the significance of symptoms widely recommended for triggering RT-PCR and identified additional symptoms to enable efficient trade-off between the number of positive cases detected and tests needed. Our findings may help optimise the choice of triggering symptoms for diagnostic work-up in COVID-19 vaccine efficacy trials and also have wider public health implications.

## Data Availability

Data collected in the COVID-19 Symptom Study smartphone application are being shared with other health researchers through the UK National Health Service-funded Health Data Research UK (HDRUK) and Secure Anonymised Information Linkage consortium housed in the UK Secure Research Platform (Swansea, UK). Anonymised data are available to be shared with HDRUK researchers according to their protocols in the public interest (https://web.www.healthdatagateway.org/dataset/fddcb382-3051-4394-8436-b92295f14259). US investigators are encouraged to coordinate data requests through the Coronavirus Pandemic Epidemiology Consortium (https://www.monganinstitute.org/cope-consortium).

## Ethics

Ethical approval was granted by the KCL Ethics Committee (REMAS ID 18210, review reference LRS-19/20-18210), and all participants provided consent.

## Author Contributions

MA, JC, AC, CS, AEL contributed to study concept and design. LSC, MSG, KK, MM, EM, BM, CHS, RD, AM, LHN, AJ, ATC contributed to acquisition of data. MA and JC contributed to data analysis. MA, JG contributed to initial drafting of the manuscript. All authors contributed to interpretation of data and critical revision of the manuscript. JPC, TS, JW, SO contributed to study supervision.

## Declaration of interests

JW, RD, JCP, and AM are employees of Zoe Global Ltd. ATC reports grants from the Massachusetts Consortium on Pathogen Readiness during the conduct of the study, personal fees from Pfizer Inc., and grants and personal fees from Bayer Pharma. CEPI (authors AC, JG, JPC, AEL) funds clinical trials of COVID-19 vaccines. All other authors declare no competing interests.

## Supplementary tables

[Supplementary Table e1](http://medrxiv.org/content/early/2020/11/25/2020.11.23.20237313/T4)

Supplementary Table e1 List of self-reported symptoms and corresponding question used in the reporting app

[Supplementary Table e2](http://medrxiv.org/content/early/2020/11/25/2020.11.23.20237313/T5)

Supplementary Table e2 NSGAII parameters

[Supplementary Table e3](http://medrxiv.org/content/early/2020/11/25/2020.11.23.20237313/T6)

Supplementary Table e3 Sensitivity, specificity, and Tests Per Case (TPC) for the four clinically-inferred subsets of symptoms computed on the discovery data (UK) cohort

[Supplementary Table e4](http://medrxiv.org/content/early/2020/11/25/2020.11.23.20237313/T7)

Supplementary Table e4 Pareto of optimal combination of symptoms for three-day analysis computed on the discovery (UK) training data, ordered by decreasing sensitivity (TPC: Tests per Case)

[Supplementary Table e5](http://medrxiv.org/content/early/2020/11/25/2020.11.23.20237313/T8)

Supplementary Table e5 Pareto of optimal combination of symptoms for seven-day analysis computed on the discovery (UK) training data, ordered by decreasing sensitivity (TPC: Tests per Case)
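As an illustration of what a Pareto set of symptom combinations means, the sketch below filters a list of candidates down to those not dominated on sensitivity and specificity and orders them by decreasing sensitivity. It is a brute-force filter, not the NSGA-II algorithm used in the study, and the function name `pareto_front`, the candidate labels, and the values are hypothetical.

```python
from typing import List, Tuple

# Hypothetical candidate layout: (label, sensitivity, specificity)
Candidate = Tuple[str, float, float]


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return candidates not dominated on (sensitivity, specificity).

    A candidate is dominated if another one is at least as good on both
    objectives and strictly better on at least one.
    """
    front = []
    for label, sens, spec in candidates:
        dominated = any(
            other_sens >= sens and other_spec >= spec
            and (other_sens > sens or other_spec > spec)
            for _, other_sens, other_spec in candidates
        )
        if not dominated:
            front.append((label, sens, spec))
    # Order by decreasing sensitivity, as in the table captions above
    return sorted(front, key=lambda c: c[1], reverse=True)


# Illustrative values only (not results from the study)
toy_candidates = [
    ("A", 0.96, 0.30),
    ("B", 0.69, 0.70),
    ("C", 0.60, 0.65),  # dominated by B
    ("D", 0.40, 0.90),
    ("E", 0.90, 0.25),  # dominated by A
]

front = pareto_front(toy_candidates)
for label, sens, spec in front:
    print(label, sens, spec)
```

An exhaustive filter like this is only practical for a small candidate list. An evolutionary search such as NSGA-II is typically preferred when the number of possible symptom subsets is too large to enumerate.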

## Supplementary figures

![Supplementary Figure e1](https://www.medrxiv.org/content/medrxiv/early/2020/11/25/2020.11.23.20237313/F5.medium.gif)

[Supplementary Figure e1](http://medrxiv.org/content/early/2020/11/25/2020.11.23.20237313/F5)

Supplementary Figure e1 Flow diagram of user selection for the discovery data cohort

![Supplementary Figure e2](https://www.medrxiv.org/content/medrxiv/early/2020/11/25/2020.11.23.20237313/F6.medium.gif)

[Supplementary Figure e2](http://medrxiv.org/content/early/2020/11/25/2020.11.23.20237313/F6)

Supplementary Figure e2 Flow diagram of user selection for the replication data cohort

## Acknowledgements

Zoe provided in-kind support for all aspects of building, running, and supporting the app and service to all users worldwide. CEPI provided funding for the analysis of the data. Support for this study was provided by the NIHR-funded Biomedical Research Centre based at GSTT NHS Foundation Trust. Investigators also received support from the Wellcome Trust, the MRC/BHF, Alzheimer’s Society, EU, NIHR, CDRF, and the NIHR-funded BioResource, Clinical Research Facility and BRC based at GSTT NHS Foundation Trust in partnership with KCL, the UK Research and Innovation London Medical Imaging & Artificial Intelligence Centre for Value Based Healthcare, the Wellcome Flagship Programme (WT213038/Z/18/Z), the Chronic Disease Research Foundation, and DHSC.

DAD is supported by the National Institute of Diabetes and Digestive and Kidney Diseases K01DK120742 and by the American Gastroenterological Association AGA-Takeda COVID-19 Rapid Response Research Award (AGA2021-5102). ATC was supported in this work through a Stuart and Suzanne Steele MGH Research Scholar Award. The Massachusetts Consortium on Pathogen Readiness (MassCPR) and Mark and Lisa Schwartz supported MGH investigators (LHN, DAD, ADJ, ATC).

*   Received November 23, 2020.
*   Revision received November 25, 2020.
*   Accepted November 25, 2020.


*   © 2020, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NoDerivs 4.0 International), CC BY-ND 4.0, as described at [http://creativecommons.org/licenses/by-nd/4.0/](http://creativecommons.org/licenses/by-nd/4.0/)

## References

1.  Hodgson SH, Mansatta K, Mallett G, Harris V, Emary KRW, Pollard AJ. What defines an efficacious COVID-19 vaccine? A review of the challenges assessing the clinical efficacy of vaccines against SARS-CoV-2. Lancet Infect Dis 2020; S1473–3099(20):30773–8.

2.  Corey L, Mascola JR, Fauci AS, Collins FS. A strategic approach to COVID-19 vaccine R&D. Science 2020; 368(6494):948–50.

3.  Pormohammad A, Ghorbani S, Khatami A, Razizadeh MH, Alborzi A, Zarei M, et al. Comparison of influenza type A and B with COVID-19: A global systematic review and meta-analysis on clinical, laboratory and radiographic findings. Rev Med Virol. 2020:e2179.

4.  Wiersinga JW, Rhodes A, Cheng AC, et al. Pathophysiology, transmission, diagnosis, and treatment of coronavirus disease 2019 (COVID-19): a review. JAMA. 2020;324(8):782–93.

5.  Clinical Management of COVID-19 (WHO interim guidance): [https://www.who.int/publications/i/item/clinical-management-of-covid-19](https://www.who.int/publications/i/item/clinical-management-of-covid-19) (Accessed November 22, 2020).

6.  Agyeman AA, Chin KL, Landersdorfer CB. Smell and Taste Dysfunction in Patients With COVID-19: A systematic review and meta-analysis. Mayo Clin Proc. 2020;95(8):1621–31.

7.  Haehner A, Draf J, Drager S, de With K, Hummel T. Predictive value of sudden olfactory loss in the diagnosis of COVID-19. ORL J Otorhinolaryngol Relat Spec 2020;82(4):175–80.

8.  Petersen I, Phillips A. Three Quarters of People with SARS-CoV-2 Infection are Asymptomatic: Analysis of English Household Survey Data. Clin Epidemiol. 2020;12:1039–43.

9.  A Phase 3, Randomised, Stratified, Observer-Blind, Placebo-Controlled Study to Evaluate the Efficacy, Safety, and Immunogenicity of mRNA-1273 SARS-CoV-2 Vaccine in Adults Aged 18 Years and Older. [https://www.modernatx.com/sites/default/files/mRNA-1273-P301-Protocol.pdf](https://www.modernatx.com/sites/default/files/mRNA-1273-P301-Protocol.pdf)

10. A phase 1/2/3, placebo-controlled, randomised, observer-blind, dose-finding study to evaluate the safety, tolerability, immunogenicity, and efficacy of SARS-CoV-2 RNA vaccine candidates against COVID-19 in healthy individuals. [https://pfe-pfizercom-d8-prod.s3.amazonaws.com/2020-09/C4591001\_Clinical\_Protocol\_0.pdf](https://pfe-pfizercom-d8-prod.s3.amazonaws.com/2020-09/C4591001_Clinical_Protocol_0.pdf)

11. A Randomised, Double-blind, Placebo-controlled Phase 3 Study to Assess the Efficacy and Safety of Ad26.COV2.S for the Prevention of SARS-CoV-2-mediated COVID-19 in Adults Aged 18 Years and Older. [https://www.jnj.com/coronavirus/covid-19-phase-3-study-clinical-protocol](https://www.jnj.com/coronavirus/covid-19-phase-3-study-clinical-protocol)

12. A Phase III Randomized, Double-blind, Placebo-controlled Multicenter Study in Adults to Determine the Safety, Efficacy, and Immunogenicity of AZD1222, a Non-replicating ChAdOx1 Vector Vaccine, for the Prevention of COVID-19. [https://s3.amazonaws.com/ctr-med-7111/D8110C00001/52bec400-80f6-4c1b-8791-0483923d0867/c8070a4e-6a9d-46f9-8c32-cece903592b9/D8110C00001\_CSP-v2.pdf](https://s3.amazonaws.com/ctr-med-7111/D8110C00001/52bec400-80f6-4c1b-8791-0483923d0867/c8070a4e-6a9d-46f9-8c32-cece903592b9/D8110C00001_CSP-v2.pdf)

13. A Phase 3, Randomised, Observer-blinded, Placebo-Controlled Trial to evaluate the Efficacy and Safety of a SARS-CoV-2 Recombinant Spike Protein Nanoparticle Vaccine (SARS-CoV-RS) with Matrix-M1 Adjuvant in Adult participants 18-84 years of Age in the United Kingdom. [https://www.novavax.com/download/files/protocols/2019nCoV302Phase3UKVersion2FinalCleanRedacted.pdf](https://www.novavax.com/download/files/protocols/2019nCoV302Phase3UKVersion2FinalCleanRedacted.pdf)

14. Drew DA, Nguyen LH, Steves CJ, et al. Rapid implementation of mobile technology for real-time epidemiology of COVID-19. Science 2020;368(6497):1362–67.

15. Sudre CH, Lee K, Lochlainn MN. Symptom clusters in Covid19: A potential clinical prediction tool from the COVID Symptom study app. medRxiv Jun 16; 2020.06.12.20129056. doi: 10.1101/2020.06.12.20129056. Preprint

16. Deb K, Pratap A, Agarwal S, Meyarivan T. A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 2002;6:182–97.

17. Menni C, Valdes AM, Freidin MB, et al. Real-time tracking of self-reported symptoms to predict potential COVID-19. Nat Med. 2020;26(7):1037–40.

18. Zens M, Brammertz A, Herpich J, et al. App-based tracking of self-reported COVID-19 symptoms: analysis of questionnaire data. J Med Internet Res. 2020;22(9):e21956.

19. Public Health England. [https://www.gov.uk/government/publications/wuhan-novel-coronavirus-initial-investigation-of-possible-cases/investigation-and-initial-clinical-management-of-possible-cases-of-wuhan-novel-coronavirus-wn-cov-infection#criteria](https://www.gov.uk/government/publications/wuhan-novel-coronavirus-initial-investigation-of-possible-cases/investigation-and-initial-clinical-management-of-possible-cases-of-wuhan-novel-coronavirus-wn-cov-infection#criteria) (Accessed November 10th 2020).

20. Dixon BE, Wools-Kaloustian K, Fadel WF, et al. Symptoms and symptom clusters associated with SARS-CoV-2 infection in community-based populations: Results from a statewide epidemiological study. medRxiv Oct 22;2020.10.11.20210922. doi: 10.1101/2020.10.11.20210922. Preprint

21. Tang D, Comish P, Kang R. The hallmarks of COVID-19 disease. PLoS Pathog 2020;16(5):e1008536.