Identification of optimal symptom combinations to trigger diagnostic work-up of suspected COVID-19 cases: analysis from a community-based, prospective, observational cohort
============================================================================================================================================================================

* M Antonelli
* J Capdevila
* A Chaudhari
* J Granerod
* LS Canas
* MS Graham
* K Klaser
* M Modat
* E Molteni
* B Murray
* C Sudre
* R Davies
* A May
* LH Nguyen
* DA Drew
* A Joshi
* AT Chan
* JP Cramer
* T Spector
* J Wolf
* S Ourselin
* C Steves
* AE Loeliger

## Abstract

**Background** Diagnostic work-up of participants following any COVID-19 associated symptom, for example in vaccine efficacy trials, will lead to extensive testing, potentially overwhelming laboratory capacity whilst primarily yielding negative results. We aimed to identify an efficient symptom combination to capture most cases using the lowest possible number of tests.

**Methods** UK and US users of the COVID-19 Symptom Study app who reported new-onset symptoms between March and September 2020 and an RT-PCR test within seven days of symptom onset were included. Sensitivity, specificity, and number of RT-PCR tests needed to identify one RT-PCR positive case were calculated for individual symptoms and symptom combinations. A multi-objective evolutionary algorithm was applied to generate symptom combinations with good trade-offs between sensitivity and specificity.

**Findings** The UK cohort included 122,305 individuals (1,202 RT-PCR positive). Findings were replicated in the US cohort which included 3,162 individuals (79 RT-PCR positive). Within three days of symptom onset, the COVID-19 specific symptom combination (cough, dyspnoea, fever, anosmia/ageusia) identified 69% of cases requiring 47 RT-PCR tests per case (TPC). The symptom combination with highest sensitivity at three days was fatigue, anosmia, cough, diarrhoea, headache, sore throat, identifying 96% of cases and requiring 96 TPC.

**Interpretation** We confirm the significance of COVID-19 specific symptoms widely recommended for triggering RT-PCR. Using a data-driven optimisation technique, we identified additional symptoms (fatigue, sore throat, headache, diarrhoea) that enabled many more positive cases to be captured efficiently. By providing a set of solutions with optimal trade-offs between sensitivity and specificity, we produced a selection of symptom combinations that maximise the capture of cases given different laboratory capacities. This may be of use for COVID-19 vaccine developers across a range of resource settings and have more far-reaching public health implications for detection of symptomatic SARS-CoV-2 infection.

**Funding** Zoe Global Limited, Department of Health, Wellcome Trust, Engineering and Physical Sciences Research Council (EPSRC), National Institute for Health Research (NIHR), Medical Research Council (MRC), Alzheimer’s Society, Massachusetts Consortium for Pathogen Readiness (MassCPR), Coalition for Epidemic Preparedness Innovations (CEPI)

**Evidence before this study** We searched PubMed up to November 16, 2020, with the terms “COVID-19” OR “SARS-CoV-2” AND “symptom” AND “community-based”, with no date or language restrictions, to find information about symptoms associated with COVID-19 from the community setting. The search retrieved 68 articles; however, most were not relevant as they related to specific subgroups (e.g. pregnant women, cancer patients) or aspects (e.g. mental health, diagnostic testing). Fever, cough, dyspnoea, tachypnoea, anosmia, and ageusia are the symptoms most commonly identified in COVID-19 patients and typically included in guidelines from the WHO and similar bodies. These data, however, come primarily from hospital-based studies. An assessment of the value of symptom combinations for predicting COVID-19 in the community is lacking.

**Added value of this study** We present data from the largest, prospective community-based cohort study to date and quantify the contribution of COVID-19 symptoms and symptom combinations to COVID-19 case-finding. Our study is unique in that it simulates PCR testing in a clinical trial. Using RT-PCR as the gold standard for detecting COVID-19, we assessed the sensitivity and specificity of symptoms occurring within three days of symptom onset. An analysis of symptoms occurring within seven days of symptom onset aimed to capture delayed symptom triggers. We confirm the significance of fever, cough, and anosmia/ageusia, widely used to trigger RT-PCR testing and identified fatigue, headache, sore throat, and diarrhoea as additional symptoms for efficient COVID-19 case finding.

**Implications of all the available evidence** The applied methodology enables the selection of symptom combinations to maximise the capture of cases while taking account of specific laboratory capacity. Our findings not only have important implications for COVID-19 vaccine developers to optimise the choice of triggering symptoms for diagnostic work-up in COVID-19 vaccine efficacy trials, but also have wider public health implications for early detection of symptomatic SARS-CoV-2 infection.

Keywords
*   COVID-19
*   optimal symptom combinations
*   community-based cohort
*   vaccine trials
*   SARS-CoV-2

## Introduction

Safe and effective vaccines represent the most promising intervention to prevent morbidity and mortality during the coronavirus disease 2019 (COVID-19) pandemic.1,2 Positive results have recently emerged from three ongoing vaccine efficacy trials of COVID-19 vaccines.3,5 However, further vaccines are required to meet global demand, and vaccines currently in early development may offer better tolerability profiles, scalability, or impact on viral shedding, and may be suitable for specific population subgroups. Thus, further important COVID-19 vaccine efficacy trials are predicted to start soon. In a clinical trial, diagnostic testing of suspected cases (e.g., reverse transcription polymerase chain reaction [RT-PCR] for severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]) could be triggered by the presence of any COVID-19 associated symptom. A household survey in the United Kingdom (UK) showed that fever, cough, anosmia, and ageusia were present on the day of testing in only 60% of symptomatic, RT-PCR positive individuals, implying that other less specific signs/symptoms associated with COVID-19 occur in a substantial number of patients.6 The signs and symptoms associated with COVID-19 are extensive and overlap with those of other common viral infections.7,8 Thus, it is possible that diagnostic work-up following any COVID-19 associated symptom would lead to indiscriminate testing and overwhelm laboratory capacity whilst primarily yielding negative results.

Identification of an efficient symptom combination to trigger diagnostic work-up that will capture the majority of COVID-19 cases using the lowest possible number of tests would enable optimum use of laboratory and financial resources in future vaccine efficacy trials. Such data are scant, and triggering symptoms vary between publicly available vaccine efficacy trial protocols.9-14 Identification of an efficient combination of symptoms may be useful for vaccine developers in resource- or capacity-constrained settings. This would also be of wider benefit in public health settings for the early detection of symptomatic SARS-CoV-2 infection.

We aimed to simulate COVID-19 case finding in a trial population using a community-based, prospective, observational cohort study. Data from UK COVID Symptom Study app15 users were used to quantify how much individual COVID-19 symptoms contribute to COVID-19 case finding and to compute the sensitivity and specificity of specific symptom combinations if used to trigger an RT-PCR test. The findings were replicated in a dataset of COVID Symptom Study app users in the United States (US).

## Methods

### Study design and data source

A community-based cohort study was carried out using data from the COVID Symptom Study app, a free smartphone app developed by Zoe Global (London, UK) in collaboration with King’s College London (London, UK) and Massachusetts General Hospital (Boston, MA, USA).15 The app was launched on March 24th and March 29th, 2020 in the UK and US, respectively. Users report baseline demographic information, data on comorbidities and COVID-19 testing results, and are encouraged to self-report a set of pre-specified symptoms on a daily basis to enable collection of longitudinal information on incident symptoms. This study was approved by the Partners Human Research Committee (Protocol 2020P000909) and King’s College London ethics committee (REMAS ID 18210, LRS-19/20-18210).

Data used in this study are available to bona fide researchers through UK Health Data Research ([https://web.www.healthdatagateway.org/dataset/fddcb382-3051-4394-8436-b92295f14259](https://web.www.healthdatagateway.org/dataset/fddcb382-3051-4394-8436-b92295f14259)).

### Study population

Individuals were included in the study if they met the following criteria: 1) aged ≥18 years, 2) reported developing any symptom between March 24th and September 15th, 2020, and 3) entered a valid RT-PCR test result within the first seven days of symptom onset. App users who recorded a history of COVID-19 were excluded. Data were frozen and extracted on October 21st 2020. Users could update RT-PCR results retrospectively (i.e. from ‘waiting for result’ to ‘result was negative’); however, only updates implemented before the extraction date were considered. UK participants served as a discovery cohort, and US participants served as a replication cohort to confirm the generalisability of the results. Both cohorts were stratified by age (18-54 and ≥55 years) to align with age strata in ongoing COVID-19 vaccine efficacy trials.

### Data analyses

Symptoms recorded within three and seven days of symptom onset were included in the analyses (see supplementary Table e1 for the complete list of symptoms and the corresponding questions participants were asked). Analysis of symptoms within the first three days is key to enable testing for SARS-CoV-2 soon after symptom onset while viral load is highest. An additional buffer for inclusion of symptoms within seven days was also used, which may be important to detect development of lower respiratory tract signs indicative of pneumonia. Anosmia and ageusia were considered one symptom in the reporting app. Tachypnoea was not captured as it is a sign measured by a healthcare professional rather than a self-reported symptom; however, it may be captured in part by the symptom dyspnoea.

Participants were classified as symptom-screening positive when they recorded at least one of the symptoms in the symptom combination concerned. This was compared with self-reported RT-PCR results, which were considered the gold standard for COVID-19 case detection. If multiple positive RT-PCR test results were recorded for an individual, only the first was included for the purpose of the analyses.
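The screening rule described above (positive if any symptom in the combination is reported) can be sketched as follows. This is a minimal illustration rather than the study code: the function name `screens_positive` and the symptom labels are assumptions chosen for readability, and the example combination is one of the clinically-inferred combinations evaluated in this study.

```python
from typing import Iterable, Set


def screens_positive(reported: Iterable[str], combination: Set[str]) -> bool:
    """Return True if at least one reported symptom belongs to the combination."""
    return any(symptom in combination for symptom in reported)


# Symptom labels are illustrative; the app's internal field names are not given here.
COVID_SPECIFIC = {"fever", "cough", "dyspnoea", "anosmia/ageusia"}

print(screens_positive({"headache", "cough"}, COVID_SPECIFIC))    # True
print(screens_positive({"headache", "fatigue"}, COVID_SPECIFIC))  # False
```

Because the rule is a logical OR, adding a symptom to a combination can only keep or increase the number of participants who trigger a test.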

A COVID-19 case was defined as a newly symptomatic individual with a first ever positive SARS-CoV-2 RT-PCR test result taken as ground truth. For different symptoms or combinations of symptoms, three evaluation parameters were considered taking disease status to be a positive RT-PCR test: 1) sensitivity, computed as the percentage of COVID-19 positive individuals correctly identified, 2) specificity, calculated as the percentage of individuals correctly classified as COVID-19 negative, and 3) the reciprocal of precision, that is the number of RT-PCR tests needed to identify one RT-PCR positive COVID-19 case (i.e. Tests Per Case [TPC]).
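The three evaluation parameters can be computed from paired screening and RT-PCR outcomes as in the sketch below. The function name `evaluate_trigger` and the ten-person toy cohort are hypothetical and serve only to make the definitions concrete.

```python
from typing import Sequence, Tuple


def evaluate_trigger(screen: Sequence[bool], pcr: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, TPC) for one symptom trigger.

    Assumes the cohort contains at least one RT-PCR positive and one
    RT-PCR negative individual.
    """
    tp = sum(1 for s, p in zip(screen, pcr) if s and p)
    fp = sum(1 for s, p in zip(screen, pcr) if s and not p)
    fn = sum(1 for s, p in zip(screen, pcr) if not s and p)
    tn = sum(1 for s, p in zip(screen, pcr) if not s and not p)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # TPC is the reciprocal of precision: tests triggered per true case found.
    tpc = (tp + fp) / tp if tp else float("inf")
    return sensitivity, specificity, tpc


# Toy cohort of ten individuals (hypothetical values, not study data).
screen = [True, True, True, True, False, False, False, False, False, False]
pcr = [True, False, False, False, True, False, False, False, False, False]
sens, spec, tpc = evaluate_trigger(screen, pcr)
print(sens, spec, tpc)  # 0.5 0.625 4.0
```

In this toy cohort, four tests are triggered and one case is found, so TPC is 4.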

### Evaluation of individual symptoms and clinically-inferred symptom combinations

Sensitivity, specificity, and TPC were evaluated for each individual symptom, and for the following four symptom combinations derived a priori from clinical experience and guidance (i.e. clinically-inferred symptom combinations): 1) respiratory symptoms (cough, dyspnoea), 2) WHO-defined pneumonia symptoms (cough, dyspnoea, fever), 3) COVID-19 specific symptoms as defined by Public Health England (PHE; fever, cough, dyspnoea, anosmia/ageusia), and 4) extended symptoms (fever, cough, dyspnoea, anosmia/ageusia, fatigue, headache). This latter category was added post-hoc after exploration of the app data indicated high sensitivity of headache and fatigue in other contexts.16

### Evaluation of data-inferred symptom combinations

An optimisation technique was subsequently used to generate optimal symptom combinations from the data. Optimisation problems with multiple objectives have a set of optimal solutions (known as Pareto-optimal solutions) rather than one single optimal solution. No Pareto-optimal solution is better than another without further information on the specific objective to be addressed. As sensitivity and specificity represent conflicting objectives, a multi-objective evolutionary algorithm (MOEA) was applied to generate efficient symptom combinations each characterised by a good trade-off between specificity and sensitivity. The Python package pymoo v0.4.2.1 was used for MOEA optimisation. More specifically, we employed the well-known NSGA-II algorithm17 (see supplementary Table e2 for parameter information).
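To illustrate what Pareto-optimal means for two objectives, the sketch below filters a handful of candidate combinations down to the non-dominated ones. It is a plain dominance filter, not NSGA-II and not the pymoo implementation used in the study; the candidate names and their (sensitivity, specificity) values are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (sensitivity, specificity), both to be maximised


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse on both objectives and better on at least one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(candidates: Dict[str, Point]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return sorted(
        name
        for name, point in candidates.items()
        if not any(dominates(other, point) for other in candidates.values())
    )


# Hypothetical candidates, not results from the study.
candidates = {
    "A": (0.96, 0.10),
    "B": (0.90, 0.30),
    "C": (0.69, 0.55),
    "D": (0.60, 0.50),  # dominated by C
    "E": (0.85, 0.30),  # dominated by B
}
print(pareto_front(candidates))  # ['A', 'B', 'C']
```

A, B, and C each win on one objective and lose on the other, so none can be preferred without knowing the available testing capacity. An evolutionary algorithm such as NSGA-II searches for this kind of front when the number of possible combinations is too large to enumerate.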

To generate the Pareto of optimal combinations of symptoms (referred to as data-inferred symptom combinations hereafter), the UK-discovery cohort was randomly split into a training and validation dataset. The training dataset (601 COVID-19 positive, 60,552 negative cases) was used to train the MOEA and generate the Pareto of the optimal symptom combinations. The validation dataset (601 COVID-19 positive, 60,551 negative cases) was used to evaluate each optimal combination by computing the sensitivity, specificity, and TPC of each generated symptom combination. For the validation dataset, the sensitivity and specificity of the Pareto of optimal symptom combinations were also computed on the two age groups to show the generalisability of the generated optimal symptom combinations. All optimal symptom combinations were also validated on the US-replication cohort.
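The dataset sizes quoted above are what a half split within each RT-PCR class would produce. The sketch below shows such a split; whether the study stratified by test result is an assumption here (the text states only that the split was random), and the function name `stratified_half_split` and the seed are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def stratified_half_split(labels: Sequence[bool], seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split indices in half within each class (positive, negative)."""
    rng = random.Random(seed)
    train: List[int] = []
    valid: List[int] = []
    for value in (True, False):
        idx = [i for i, y in enumerate(labels) if y == value]
        rng.shuffle(idx)
        half = (len(idx) + 1) // 2  # an odd remainder goes to the training set
        train.extend(idx[:half])
        valid.extend(idx[half:])
    return sorted(train), sorted(valid)


# Class sizes from the UK-discovery cohort: 1,202 positive of 122,305 individuals.
labels = [True] * 1_202 + [False] * (122_305 - 1_202)
train, valid = stratified_half_split(labels)
train_pos = sum(labels[i] for i in train)
valid_pos = sum(labels[i] for i in valid)
print(train_pos, len(train) - train_pos)  # 601 60552
print(valid_pos, len(valid) - valid_pos)  # 601 60551
```

Splitting within each class keeps the proportion of positives the same in both halves, which matters when positives make up about 1% of the cohort.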

## Results

A total of 122,305 individuals were included in the UK-discovery cohort, of which 1,202 recorded a positive RT-PCR test result for COVID-19. In the US-replication cohort, 3,162 individuals were included, of which 79 recorded a positive result. The patient selection flow charts for both cohorts are displayed in supplementary Figures e1 and e2. The age and sex distribution were similar between RT-PCR positive and negative participants within the cohorts; however, slight differences between cohorts were observed (**Table 1**). There was a lower proportion of male participants in the US-replication cohort (17%) compared to the UK-discovery cohort (25%), and the mean age was slightly higher (54 compared to 48 years) (**Table 1**).


Table 1. Demographics of study population
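As a point of reference for the TPC values reported in this section, the cohort sizes above give the number of tests per case if every included individual is tested, which corresponds to triggering on any symptom since all included individuals reported at least one. The helper name `baseline_tpc` is hypothetical; the arithmetic uses only the counts stated in the text.

```python
def baseline_tpc(n_total: int, n_positive: int) -> float:
    """Tests per case when every individual in the cohort is tested."""
    return n_total / n_positive


uk = baseline_tpc(122_305, 1_202)
us = baseline_tpc(3_162, 79)
print(f"UK: {uk:.1f}, US: {us:.1f}")  # UK: 101.8, US: 40.0
```

Any symptom combination that restricts testing should be judged against these baselines: a useful combination keeps sensitivity high while bringing TPC below them.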

### Evaluation of individual symptoms and clinically-inferred symptom combinations

The sensitivity, specificity, and TPC for each individual symptom reported within three and seven days of symptom onset are displayed in **Table 2**. Using data from the UK-discovery cohort for all ages, the individual symptoms with the highest sensitivity in both three- and seven-day analyses were headache (66·8% and 75·6% for three- and seven-day analyses, respectively) and fatigue (64·9% and 77·8% for three- and seven-day analyses, respectively). Similar results were obtained with data from the US-replication cohort and when data were stratified by age. The sensitivity of anosmia in the UK-discovery cohort was only 21·8% within the first three days of symptom onset and 48·7% in the seven-day analyses. Anosmia, however, had the lowest TPC; when compared to headache, the TPC decreased from 76 to 20 and 70 to 10, for three- and seven-day analyses, respectively. These results are confirmed by **Figure 1**, which displays the frequency of the symptoms for the UK-discovery cohort for both COVID-19 positive and negative cases.


Table 2. Sensitivity, specificity, and Tests Per Case (TPC) for each individual symptom computed on the discovery data (UK) cohort

![Figure 1.](https://www.medrxiv.org/content/medrxiv/early/2020/12/01/2020.11.23.20237313/F1.medium.gif)

[Figure 1.](http://medrxiv.org/content/early/2020/12/01/2020.11.23.20237313/F1)

Figure 1. Symptom frequency for COVID-19 negative (left) and COVID-19 positive (right) cases

The sensitivity, specificity, and TPC for the four clinically-inferred symptom combinations from the UK-discovery cohort reported within three and seven days of symptom onset are displayed in supplementary Table e3. In this cohort, 45% of individuals positive for COVID-19 reported cough or dyspnoea within the first three days of symptom onset. The addition of fever (i.e., WHO-defined pneumonia symptom combination) increased sensitivity to 59%, while the further addition of anosmia/ageusia (i.e., PHE COVID-19 specific symptom combination) increased sensitivity to 67%. The extended symptom combination (i.e., cough, dyspnoea, fever, anosmia/ageusia, headache, and fatigue) increased the proportion of COVID-19 cases identified to 91% but required twice the number of TPC compared to the respiratory symptom combination (86 versus 43). Similarly, within seven days of symptom onset, COVID-19 specific and extended symptoms were reported in 81% and 96% of RT-PCR positive cases, at the cost of 43 and 82 TPC, respectively. Similar results were obtained when data were stratified by age. The sensitivity estimates from the US-replication cohort were higher for all four combinations; extended symptom combination estimates reached 96% and 98% for the three- and seven-day analyses, respectively. In contrast, the specificity decreased to 21% and 17%, although TPC values were lower for the US-replication cohort.

### Evaluation of data-inferred symptom combinations

The Pareto-optimal symptom combinations generated by the MOEA are displayed in **Figure 2** (see supplementary Tables e4 and e5 for corresponding list of symptom combinations and related sensitivity, specificity, and TPC for three- and seven-day analyses, respectively). These generated symptom combinations achieved similar values of sensitivity and specificity for the UK-training, UK-validation, and US-replication cohorts, thus confirming the validity of this methodology. Moreover, results were also confirmed for the two age groups.

![Figure 2.](https://www.medrxiv.org/content/medrxiv/early/2020/12/01/2020.11.23.20237313/F2.medium.gif)

[Figure 2.](http://medrxiv.org/content/early/2020/12/01/2020.11.23.20237313/F2)

Figure 2. Pareto of optimal subset generated by the multi-objective evolutionary algorithm for three- and seven-day analyses
Each point represents a subset of symptoms characterised by a different trade-off between sensitivity and specificity.

**Figure 3** displays three symptom combinations generated by the MOEA for both three- and seven-day analyses; namely, the one with highest sensitivity, the one with a sensitivity of ∼90%, and the one characterised by a specificity of 50%, which is of interest from a clinical standpoint. Fatigue, anosmia, cough, diarrhoea, headache, and sore throat constituted the symptom combination with the highest sensitivity in both the three- and seven-day analyses. Anosmia/ageusia were included in all three symptom combinations at both time points, fatigue was included in all symptom combinations for the three-day analyses, and cough for the seven-day analyses (**Figure 3**). Headache was slightly more important when symptoms were recorded within three days of onset. Diarrhoea as an individual symptom was not predictive of a positive COVID-19 RT-PCR result but became predictive when associated with other symptoms.

![Figure 3.](https://www.medrxiv.org/content/medrxiv/early/2020/12/01/2020.11.23.20237313/F3.medium.gif)

[Figure 3.](http://medrxiv.org/content/early/2020/12/01/2020.11.23.20237313/F3)

Figure 3. Combination of symptoms with highest sensitivity, sensitivity ∼ 90%, and specificity ∼50%

**Figure 4** displays the frequency of symptoms selected in symptom combinations with a sensitivity ≥90%. Fatigue, cough, and anosmia were present in most symptom combinations with high sensitivity. Delirium, skipped meals, abdominal pain, and chest pain were never selected in three-day analyses and very rarely selected in seven-day analyses. Diarrhoea was selected ∼60% of the time for the three-day analyses. There were nine and 21 combinations with a sensitivity ≥90% for the three- and seven-day analyses, respectively. Symptom combinations with a high sensitivity tended to include most of the extended symptoms, although headache was more likely to be selected in the three-day scenario and fever during the seven-day scenario.
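The selection frequencies summarised above are simple counts over the high-sensitivity combinations, as the sketch below shows. The function name `selection_frequency` is an assumption, and apart from the first combination (the highest-sensitivity set named in the text) the example combinations are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def selection_frequency(combinations: List[Set[str]]) -> Dict[str, float]:
    """Percentage of combinations in which each symptom appears."""
    if not combinations:
        return {}
    counts = Counter(symptom for combo in combinations for symptom in combo)
    return {symptom: 100 * n / len(combinations) for symptom, n in counts.items()}


high_sensitivity = [
    # Highest-sensitivity combination as named in the text.
    {"fatigue", "anosmia", "cough", "diarrhoea", "headache", "sore throat"},
    # The remaining combinations are hypothetical, for illustration only.
    {"fatigue", "anosmia", "cough", "headache"},
    {"fatigue", "anosmia", "cough", "fever"},
    {"fatigue", "anosmia", "headache", "diarrhoea"},
]
freq = selection_frequency(high_sensitivity)
print(freq["fatigue"], freq["cough"], freq["diarrhoea"])  # 100.0 75.0 50.0
```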

![Figure 4.](https://www.medrxiv.org/content/medrxiv/early/2020/12/01/2020.11.23.20237313/F4.medium.gif)

[Figure 4.](http://medrxiv.org/content/early/2020/12/01/2020.11.23.20237313/F4)

Figure 4. Percentage of a symptom’s appearance in symptom combinations with sensitivity ≥90%

The sensitivity, specificity, and TPC for three data-inferred symptom combinations (i.e., highest sensitivity, sensitivity ∼90%, and specificity ∼50%) compared to the four clinically-inferred symptom combinations reported within three and seven days of symptom onset are displayed in **Table 3**. To allow comparison of data-inferred and clinically-inferred symptom combinations, the table also shows the latter re-evaluated on the UK-validation dataset. The symptom combination with highest sensitivity (fatigue, anosmia, cough, diarrhoea, headache, and sore throat) identified 96% and 99% of RT-PCR positive COVID-19 cases and required 96 and 92 TPC in the three- and seven-day analyses, respectively. The sensitivity results were similar for the US-replication cohort and by age. However, the number of tests needed for those aged ≥55 years increased by 30% for both the three-day and seven-day analyses.


Table 3. Sensitivity, specificity, and Tests Per Case (TPC) for the clinically- and data-inferred combinations of symptoms, computed on the held-out validation dataset

## Discussion

We present data from what is, to our knowledge, the largest community-based COVID-19 symptom cohort study, with the aim of quantifying the contribution of various symptoms and symptom combinations associated with COVID-19 to RT-PCR positive case-finding. COVID-19 symptoms and RT-PCR test results were collected prospectively, which allowed us to select newly symptomatic individuals and simulate a clinical trial situation in which RT-PCR tests are typically conducted within three days after symptom onset. We confirm the significance of symptoms (fever, cough, anosmia/ageusia) widely considered important for triggering an RT-PCR test and extend this to include additional symptoms (fatigue, sore throat, headache, diarrhoea). The applied methodology enables the selection of symptom combinations to maximise the capture of cases but not overwhelm laboratory capacity. Our findings may help to optimise the choice of triggering symptoms for diagnostic work-up in COVID-19 vaccine efficacy trials, and also have wider public health implications for early detection of symptomatic SARS-CoV-2 infection.

In an efficacy trial, it is important to capture all COVID-19 cases with pulmonary involvement as signs and symptoms of pneumonia define moderate or severe COVID-19. Prevention of pneumonia and severe COVID-19 would be an important outcome for COVID-19 vaccines. Therefore, the signs and symptoms that characterise WHO-defined COVID-19 pneumonia (fever, cough, dyspnoea, tachypnoea) should always trigger diagnostic work-up in a trial participant.18 Additionally, anosmia and ageusia had the highest positive predictive value (PPV) of all reported COVID-19 symptoms.19,9 Our findings support the inclusion of these symptoms. However, these COVID-specific symptoms (fever, cough, dyspnoea, anosmia/ageusia) correctly identified only 69% of COVID-19 cases in this study when RT-PCR was conducted within three days of symptom onset. This has important implications in terms of cases missed as the COVID-specific symptoms align with the current PHE definition of a possible COVID-19 case.20 We found that the addition of headache and fatigue (i.e., extended symptoms) increased the proportion of COVID-19 cases correctly identified to 92% but also almost doubled the TPC (from 47 to 85). Thus, an increase in sensitivity comes at a cost in the context of vaccine efficacy trials.
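The cost of added sensitivity can be made explicit by writing TPC in terms of sensitivity, specificity, and the proportion of positives among those eligible for testing. The sketch below follows directly from the definition of TPC as the reciprocal of precision; the function name `tpc_from_rates` and the example rates are hypothetical and are not values from the study tables.

```python
def tpc_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Tests per case implied by sensitivity, specificity, and prevalence.

    Fractions are expressed per symptomatic individual:
    tests triggered = true positives + false positives.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return (true_pos + false_pos) / true_pos


# Hypothetical rates: 20% of symptomatic individuals are RT-PCR positive.
print(round(tpc_from_rates(0.5, 0.625, 0.2), 6))  # 4.0

# With sensitivity 1 and specificity 0, everyone is tested and TPC = 1 / prevalence.
print(round(tpc_from_rates(1.0, 0.0, 0.2), 6))  # 5.0
```

At low prevalence the false-positive term dominates, so a modest drop in specificity increases TPC far more than the accompanying gain in sensitivity reduces it.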

Application of MOEA identified fatigue, anosmia, cough, diarrhoea, headache, and sore throat as the symptom set with the highest sensitivity in three- and seven-day analyses. Diarrhoea and sore throat were identified as symptoms that may increase case finding in an efficient way, in addition to those symptoms already considered important for triggering an RT-PCR test. In situations where there is a limited testing capacity, we provide a range of optimal symptom combinations that could be used, given different target numbers of tests per case identified (see Tables e4 and e5). This finding may prove useful for COVID-19 vaccine developers or in public health settings when deciding which symptoms should trigger testing to optimise financial and logistical resource utilisation. Importantly, all the symptoms that constitute the combination with the highest sensitivity have been included as triggering symptoms in publicly available clinical trial protocols of ongoing vaccine efficacy trials.9-14

Few studies have been published that assess COVID-19 symptoms in community-based cohorts. A UK household survey which reported fever, cough, anosmia, and ageusia in 60% of symptomatic, RT-PCR positive individuals only assessed symptoms on the day of testing.6 As trial protocols require symptoms to be present for 24-48 hours prior to testing and our study aimed to simulate case finding in a trial situation, we assessed the presence of symptoms within three days from onset with a seven-day safety net in case of delayed testing following symptom onset. Menni et al. presented results using data generated from this COVID-19 Symptom Study app; however, the aim was different and only data from March to April 2020 were included.21 We extend these data to September 2020 and importantly consider the results from the perspective of a potential COVID-19 vaccine developer. Menni et al. suggest anosmia, fatigue, persistent cough, and loss of appetite might together identify individuals with COVID-19.21 A separate COVID-19 symptom app from Germany suggests nausea and vomiting have a stronger predictive value for COVID-19 infection than symptoms such as sore throat or persistent cough.22 Thus, both studies identify gastrointestinal symptoms as important in identifying cases of COVID-19. Our study reports similar findings with diarrhoea found to be important to case finding. More recently, in another community-based observational study, sensitivity, specificity, PPV, and negative predictive value were reported for retrospectively-collected symptoms and symptom combinations that occurred during the 14-day period prior to screening for SARS-CoV-2 infection in a US seroprevalence study.23 The two symptom clusters most associated with SARS-CoV-2 infection were: 1) ageusia, anosmia, and fever, and 2) shortness of breath, cough, and chest pain. 
In our study, dyspnoea was rarely selected and chest pain never selected as part of an efficient symptom combination, likely due to dyspnoea often occurring later in the disease course.24 The sensitivity of dyspnoea increased in the seven-day compared to three-day analyses. However, the importance of dyspnoea as a symptom of pulmonary involvement makes it a critical triggering symptom in vaccine efficacy trials. Tachypnoea, which is included in the WHO definition of pneumonia, was not captured as a symptom in the app per se; however, it likely co-occurs with dyspnoea. Headache and diarrhoea were more likely to be selected in the three-day scenario and fever in the seven-day scenario, again reflecting different timings of symptoms in the disease course.

The sensitivity of symptoms and various clinically-inferred symptom combinations was similar for the two age groups (18-54 and ≥55 years); however, the TPC was higher in the ≥55 years age group. This suggests self-reporting may work better for younger than older individuals.

In the US-replication cohort, sensitivity estimates were higher, and specificity and TPC lower, than in the UK-discovery cohort, possibly due to different testing practices and public health measures adopted in each country. It will be important for these findings to be validated in low- and middle-income country (LMIC) settings as COVID-19 vaccine efficacy trials are likely to be conducted in high income countries as well as LMICs. Vaccine developers should take into account regional considerations such as background incidence of co-infection and other trial-related aspects when interpreting these results.

This study has many strengths, including the large sample size and cost-effectiveness of the data source. Also, our study is community-based and adds important data, as most studies that have assessed symptoms in COVID-19 have involved hospital-based populations. Some limitations, however, also need consideration. First, the results are based on data self-reported through a mobile app and are therefore biased towards people with smartphone access. However, the app included a feature to enable reporting on behalf of someone else given their consent. Second, reported test results were not externally verified; however, antigen tests were not available during the study period, thus minimising the risk of participant confusion regarding the type of swab test. As the precise PCR assay used was not recorded and likely varied between participants, false positive rates were unknown and results were taken at face value. A further limitation is that app users may not be representative of the wider population. Finally, these data were generated in the spring and summer months when the incidence of concurrent respiratory infections (e.g. influenza) is low. The latter may have implications for trials conducted in winter.

In summary, we confirm the significance of symptoms widely recommended for triggering RT-PCR and identified additional symptoms to enable efficient trade-off between the number of positive cases detected and tests needed. Our findings may help optimise the choice of triggering symptoms for diagnostic work-up in COVID-19 vaccine efficacy trials and also have wider public health implications.

## Data Availability

Data collected in the COVID-19 Symptom Study smartphone application are being shared with other health researchers through the UK National Health Service-funded Health Data Research UK (HDRUK) and Secure Anonymised Information Linkage consortium housed in the UK Secure Research Platform (Swansea, UK). Anonymised data are available to be shared with HDRUK researchers according to their protocols in the public interest ([https://web.www.healthdatagateway.org/dataset/fddcb382-3051-4394-8436-b92295f14259](https://web.www.healthdatagateway.org/dataset/fddcb382-3051-4394-8436-b92295f14259)). US investigators are encouraged to coordinate data requests through the Coronavirus Pandemic Epidemiology Consortium ([https://www.monganinstitute.org/cope-consortium](https://www.monganinstitute.org/cope-consortium)).

## Ethics

Ethical approval was granted by the KCL Ethics Committee (REMAS ID 18210, review reference LRS-19/20-18210), and all participants provided consent.

## Author Contributions

MA, JC, AC, CS and AEL contributed to study concept and design. LSC, MSG, KK, MM, EM, BM, CHS, RD, AM, LHN, AJ and ATC contributed to acquisition of data. MA and JC contributed to data analysis. MA and JG contributed to initial drafting of the manuscript. All authors contributed to interpretation of data and critical revision of the manuscript. JPC, TS, JW and SO contributed to study supervision. SO, TS and CS contributed to the funding of the study.

## Declaration of interests

JW, RD, JCP and AM are employees of Zoe Global Ltd. ATC reports grants from the Massachusetts Consortium on Pathogen Readiness during the conduct of the study, personal fees from Pfizer Inc., and grants and personal fees from Bayer Pharma. CEPI (authors AC, JG, JPC, AEL) funds clinical trials of COVID-19 vaccines. All other authors declare no competing interests.

## Supplementary tables and figures

[Supplementary Table e1](http://medrxiv.org/content/early/2020/12/01/2020.11.23.20237313/T4) – List of self-reported symptoms and corresponding question used in the reporting app

[Supplementary Table e2](http://medrxiv.org/content/early/2020/12/01/2020.11.23.20237313/T5) – NSGA-II parameters

[Supplementary Table e3](http://medrxiv.org/content/early/2020/12/01/2020.11.23.20237313/T6) – Sensitivity, specificity, and Tests Per Case (TPC) for the four clinically-inferred subsets of symptoms computed on the discovery data (UK) cohort

[Supplementary Table e4](http://medrxiv.org/content/early/2020/12/01/2020.11.23.20237313/T7) – Pareto-optimal combinations of symptoms for the three-day analysis computed on the discovery (UK) training data, ordered by decreasing sensitivity (TPC: Tests Per Case)

[Supplementary Table e5](http://medrxiv.org/content/early/2020/12/01/2020.11.23.20237313/T8) – Pareto-optimal combinations of symptoms for the seven-day analysis computed on the discovery (UK) training data, ordered by decreasing sensitivity (TPC: Tests Per Case)

![Supplementary Figure e1](https://www.medrxiv.org/content/medrxiv/early/2020/12/01/2020.11.23.20237313/F5.medium.gif)

[Supplementary Figure e1](http://medrxiv.org/content/early/2020/12/01/2020.11.23.20237313/F5) – Flow diagram of user selection for the discovery data cohort

![Supplementary Figure e2](https://www.medrxiv.org/content/medrxiv/early/2020/12/01/2020.11.23.20237313/F6.medium.gif)

[Supplementary Figure e2](http://medrxiv.org/content/early/2020/12/01/2020.11.23.20237313/F6) – Flow diagram of user selection for the replication data cohort

## Acknowledgements

Zoe provided in-kind support for all aspects of building, running and supporting the app and service to all users worldwide. CEPI provided funding for the analysis of the data. Support for this study was provided by the NIHR-funded Biomedical Research Centre based at GSTT NHS Foundation Trust. Investigators also received support from the Wellcome Trust, the MRC/BHF, Alzheimer’s Society, EU, NIHR, CDRF, and the NIHR-funded BioResource, Clinical Research Facility and BRC based at GSTT NHS Foundation Trust in partnership with KCL, the UK Research and Innovation London Medical Imaging & Artificial Intelligence Centre for Value Based Healthcare, the Wellcome Flagship Programme (WT213038/Z/18/Z), the Chronic Disease Research Foundation, and DHSC.

DAD is supported by the National Institute of Diabetes and Digestive and Kidney Diseases K01DK120742 and by the American Gastroenterological Association AGA-Takeda COVID-19 Rapid Response Research Award (AGA2021-5102). ATC was supported in this work through a Stuart and Suzanne Steele MGH Research Scholar Award. The Massachusetts Consortium on Pathogen Readiness (MassCPR) and Mark and Lisa Schwartz supported MGH investigators (LHN, DAD, ADJ, ATC).

*   Received November 23, 2020.
*   Revision received December 1, 2020.
*   Accepted December 1, 2020.


*   © 2020, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NoDerivs 4.0 International), CC BY-ND 4.0, as described at [http://creativecommons.org/licenses/by-nd/4.0/](http://creativecommons.org/licenses/by-nd/4.0/)

## References

1.  Hodgson SH, Mansatta K, Mallett G, Harris V, Emary KRW, Pollard AJ. What defines an efficacious COVID-19 vaccine? A review of the challenges assessing the clinical efficacy of vaccines against SARS-CoV-2. Lancet Infect Dis 2020; S1473–3099(20):30773–8.

2.  Corey L, Mascola JR, Fauci AS, Collins FS. A strategic approach to COVID-19 vaccine R&D. Science 2020; 368(6494):948–50.

3.  Pfizer and BioNTech conclude phase 3 study of COVID-19 vaccine candidate, meeting all primary efficacy endpoints: [https://www.pfizer.com/news/press-release/press-release-detail/pfizer-and-biontech-conclude-phase-3-study-covid-19-vaccine](https://www.pfizer.com/news/press-release/press-release-detail/pfizer-and-biontech-conclude-phase-3-study-covid-19-vaccine)

4.  Moderna Announces Primary Efficacy Analysis in Phase 3 COVE Study for Its COVID-19 Vaccine: [https://investors.modernatx.com/node/10421/pdf](https://investors.modernatx.com/node/10421/pdf)

5.  AZD1222 vaccine met primary efficacy endpoint in preventing COVID-19: [https://www.astrazeneca.com/media-centre/press-releases/2020/azd1222hlr.html](https://www.astrazeneca.com/media-centre/press-releases/2020/azd1222hlr.html)

6.  Petersen I, Phillips A. Three Quarters of People with SARS-CoV-2 Infection are Asymptomatic: Analysis of English Household Survey Data. Clin Epidemiol. 2020;12:1039–43.

7.  Pormohammad A, Ghorbani S, Khatami A, Razizadeh MH, Alborzi A, Zarei M, et al. Comparison of influenza type A and B with COVID-19: A global systematic review and meta-analysis on clinical, laboratory and radiographic findings. Rev Med Virol. 2020:e2179.
    
    
8.  Wiersinga WJ, Rhodes A, Cheng AC, et al. Pathophysiology, transmission, diagnosis, and treatment of coronavirus disease 2019 (COVID-19): a review. JAMA. 2020;324(8):782–93. doi: 10.1001/jama.2020.12839

9.  Haehner A, Draf J, Drager S, de With K, Hummel T. Predictive value of sudden olfactory loss in the diagnosis of COVID-19. ORL J Otorhinolaryngol Relat Spec 2020;82(4):175–80.

10. A Phase 3, Randomised, Stratified, Observer-Blind, Placebo-Controlled Study to Evaluate the Efficacy, Safety, and Immunogenicity of mRNA-1273 SARS-CoV-2 Vaccine in Adults Aged 18 Years and Older. [https://www.modernatx.com/sites/default/files/mRNA-1273-P301-Protocol.pdf](https://www.modernatx.com/sites/default/files/mRNA-1273-P301-Protocol.pdf)

11. A phase 1/2/3, placebo-controlled, randomised, observer-blind, dose-finding study to evaluate the safety, tolerability, immunogenicity, and efficacy of SARS-CoV-2 RNA vaccine candidates against COVID-19 in healthy individuals. [https://pfe-pfizercom-d8-prod.s3.amazonaws.com/2020-09/C4591001\_Clinical\_Protocol\_0.pdf](https://pfe-pfizercom-d8-prod.s3.amazonaws.com/2020-09/C4591001_Clinical_Protocol_0.pdf)

12. A Randomised, Double-blind, Placebo-controlled Phase 3 Study to Assess the Efficacy and Safety of Ad26.COV2.S for the Prevention of SARS-CoV-2-mediated COVID-19 in Adults Aged 18 Years and Older. [https://www.jnj.com/coronavirus/covid-19-phase-3-study-clinical-protocol](https://www.jnj.com/coronavirus/covid-19-phase-3-study-clinical-protocol)

13. A Phase III Randomized, Double-blind, Placebo-controlled Multicenter Study in Adults to Determine the Safety, Efficacy, and Immunogenicity of AZD1222, a Non-replicating ChAdOx1 Vector Vaccine, for the Prevention of COVID-19. [https://s3.amazonaws.com/ctr-med-7111/D8110C00001/52bec400-80f6-4c1b-8791-0483923d0867/c8070a4e-6a9d-46f9-8c32-cece903592b9/D8110C00001\_CSP-v2.pdf](https://s3.amazonaws.com/ctr-med-7111/D8110C00001/52bec400-80f6-4c1b-8791-0483923d0867/c8070a4e-6a9d-46f9-8c32-cece903592b9/D8110C00001_CSP-v2.pdf)

14. A Phase 3, Randomised, Observer-blinded, Placebo-Controlled Trial to evaluate the Efficacy and Safety of a SARS-CoV-2 Recombinant Spike Protein Nanoparticle Vaccine (SARS-CoV-RS) with Matrix-M1 Adjuvant in Adult participants 18-84 years of Age in the United Kingdom. [https://www.novavax.com/download/files/protocols/2019nCoV302Phase3UKVersion2FinalCleanRedacted.pdf](https://www.novavax.com/download/files/protocols/2019nCoV302Phase3UKVersion2FinalCleanRedacted.pdf)
    
    
15. Drew DA, Nguyen LH, Steves CJ, et al. Rapid implementation of mobile technology for real-time epidemiology of COVID-19. Science 2020;368(6497):1362–67.

16. Sudre CH, Lee K, Lochlainn MN. Symptom clusters in Covid19: A potential clinical prediction tool from the COVID Symptom study App. medRxiv Jun 16; 2020.06.12.20129056. doi: 10.1101/2020.06.12.20129056. Preprint.

17. Deb K, Pratap A, Agarwal S, Meyarivan T. A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 2002;6:182–97. doi: 10.1109/4235.996017

18. Clinical Management of COVID-19 (WHO interim guidance): [https://www.who.int/publications/i/item/clinical-management-of-covid-19](https://www.who.int/publications/i/item/clinical-management-of-covid-19) (Accessed November 22, 2020).

19. Agyeman AA, Chin KL, Landersdorfer CB. Smell and Taste Dysfunction in Patients With COVID-19: A systematic review and meta-analysis. Mayo Clin Proc. 2020;95(8):1621–31.

20. Public Health England. [https://www.gov.uk/government/publications/wuhan-novel-coronavirus-initial-investigation-of-possible-cases/investigation-and-initial-clinical-management-of-possible-cases-of-wuhan-novel-coronavirus-wn-cov-infection#criteria](https://www.gov.uk/government/publications/wuhan-novel-coronavirus-initial-investigation-of-possible-cases/investigation-and-initial-clinical-management-of-possible-cases-of-wuhan-novel-coronavirus-wn-cov-infection#criteria) (Accessed November 10, 2020).
    
    
21. Menni C, Valdes AM, Freidin MB, et al. Real-time tracking of self-reported symptoms to predict potential COVID-19. Nat Med. 2020;26(7):1037–40.

22. Zens M, Brammertz A, Herpich J, et al. App-based tracking of self-reported COVID-19 symptoms: analysis of questionnaire data. J Med Internet Res. 2020;22(9):e21956.

23. Dixon BE, Wools-Kaloustian K, Fadel WF, et al. Symptoms and symptom clusters associated with SARS-CoV-2 infection in community-based populations: Results from a statewide epidemiological study. medRxiv Oct 22;2020.10.11.20210922. doi: 10.1101/2020.10.11.20210922. Preprint.

24. Tang D, Comish P, Kang R. The hallmarks of COVID-19 disease. PLoS Pathog 2020;16(5):e1008536. doi: 10.1371/journal.ppat.1008536