Changes to the sebum lipidome upon COVID-19 infection observed via non-invasive and rapid sampling from the skin
================================================================================================================

* Matt Spick
* Katie Longman
* Cecile Frampas
* Catia Costa
* Deborah Dunn Walters
* Alex Stewart
* Mike Wilde
* Danni Greener
* George Evetts
* Drupad Trivedi
* Perdita Barran
* Andy Pitt
* Melanie Bailey

## ABSTRACT

The COVID-19 pandemic has led to an urgent and unprecedented demand for testing – both for diagnosis and prognosis. Here we explore the potential for using sebum, collected via swabbing of a patient’s skin, as a novel sampling matrix to fulfil these requirements. In this pilot study, sebum samples were collected from 67 hospitalised patients (30 PCR positive and 37 PCR negative). Lipidomics analysis was carried out using liquid chromatography mass spectrometry. Total fatty acid derivative levels were found to be depressed in COVID-19 positive participants, indicative of dyslipidemia. Orthogonal Partial Least Squares-Discriminant Analysis (OPLS-DA) modelling showed promising separation of COVID-19 positive and negative participants when comorbidities and medication were controlled for. Given that sebum sampling is rapid and non-invasive, this work may offer the potential for diagnostic and prognostic testing for COVID-19.

## Introduction

SARS-CoV-2, a novel coronavirus, was identified by the World Health Organization as originating in the city of Wuhan, China, in late 2019, 1,2 and causes coronavirus disease 2019 (COVID-19). SARS-CoV-2 combines an *R*₀ of 3 to 4 (*R*₀ being the basic reproduction number, absent any controls such as lockdown), 3 and an estimated case fatality ratio (CFR) of 1%, 4 making the virus faster spreading with higher mortality than seasonal influenza. 5,6,7 The threat of SARS-CoV-2, therefore, derives from this combination of exponential transmission and relatively high mortality rates, on top of additional mortality due to ‘crowding out’ of the treatment of other illnesses. In many countries, high mortality rates were mitigated through lockdown measures. But such measures are far from costless, with GDP in the OECD contracting by 9.8% in Q2 2020, 8 leading to meaningful disruption and welfare harm.

Entry of SARS-CoV-2 to the human body occurs via receptors on the surfaces of cells, specifically the angiotensin-converting enzyme related carboxy-peptidase (ACE2) receptor. 9 Many cases of COVID-19 will be asymptomatic; those that are symptomatic most commonly present with pathologies related to the lower respiratory tract, although attacks on other organs are also well described. 10 In the most severe cases, the disease leads to hyperinflammation and acute respiratory distress syndrome (ARDS), driven by an excess of pro-inflammatory cytokines, sometimes referred to as a cytokine storm. 11 These symptoms reflect both the direct impact of the virus on specific tissues and the host body’s immune response. By taking account of both the presence of symptoms and pre-existing conditions that may influence the immune response, progress has been made in stratification of patients admitted to hospital with COVID-19, 12 and treatment options have also improved. 13 Nonetheless, the disease still represents a major threat to health and welfare.

Mass testing has been identified by the World Health Organization as a key weapon in the battle against COVID-19 to contain outbreaks and reduce hospitalisations. 14 Currently deployed approaches to testing require the detection of SARS-CoV-2 viral RNA collected from the upper respiratory tract via polymerase chain reaction (PCR). Whilst these approaches are easily deployable and highly selective for the virus, they suffer from a significant proportion of false negative events. These arise from the limited time window for sampling during the course of infection, as well as from difficulties with sample collection. Furthermore, currently deployed approaches carry no prognostic information.

Approaches that measure the effect of the virus on the host (as opposed to direct measurement of the virus itself) may offer a complementary solution in clinical or mass testing settings. As a coronavirus requiring lipids for reproduction, SARS-CoV-2 can be expected to disrupt the lipidome. 15 Evidence of dysregulated lipidomes has recently been observed in patients with COVID-19 via analyses of blood plasma and also by lipidomic analysis of nasopharyngeal swabs, 16,17 and dysregulation of the skin lipidome would be consistent with the ability of canines to differentiate COVID-19 positive and negative individuals by smell. 18 Lipidomics therefore offers a promising route to a better understanding of, and potentially diagnosis and prognosis for, COVID-19. Sebum is a biofluid secreted by the sebaceous glands and is rich in lipids. A sebum sample can be collected easily and non-invasively via a gentle swab of skin areas rich in sebum (for example the face, neck or back), and characteristic features have been identified from sebum for illnesses such as Parkinson’s Disease 19 and Type 1 Diabetes Mellitus. 20 In this work, we explore differences in sebum lipid profiles for patients with and without COVID-19, with a view to exploring the future use of sebum as a non-invasive sampling medium for testing and prognosis.

In May 2020 several UK bodies announced their intention to pool resources and form the COVID-19 International Mass Spectrometry (MS) Coalition. 21 This consortium has the proximal goal of providing molecular level information on SARS-CoV-2 in infected humans, with the distal goal of understanding the impact of the novel coronavirus on metabolic pathways in order to better diagnose and treat cases of COVID-19 infection. This work took place as part of the COVID-19 MS Coalition and all data will be stored and fully accessible on the MS Coalition open repository.

## RESULTS AND DISCUSSION

### Population metadata overview

The study population analysed in this work included 67 participants, comprising 30 participants presenting with COVID-19 clinical symptoms (and an associated positive COVID-19 RT-PCR test) and 37 participants presenting without. A summary of the metadata is shown in Table 1.

[Table 1:](http://medrxiv.org/content/early/2020/09/29/2020.09.29.20203745/T1)

Table 1: Summary of participant metadata

There was a higher proportion of male participants in the COVID-19 positive group (male fraction of 0.57) than in the participant population overall (male fraction of 0.52); given that recruitment took place in a hospital environment, this may reflect higher severity amongst males. 22 Age distributions for COVID-19 positive and negative cohorts were almost identical (mean age of 64.7 years and 65.0 years respectively). Comorbidities are associated with both hospitalisation and more severe outcomes for COVID-19 infection, but will also alter the metabolome of participants, representing both a causative and a confounding factor. The impact of these comorbidities on classification accuracy was tested by splitting participant data by variable and retesting modelling of COVID-19 positive and negative participants to see if separation improved; this process is described in the following sections. In this pilot study, comorbidities were less well represented in the cohort of COVID-19 positive participants than in the cohort of COVID-19 negative participants.

In terms of diagnostic indicators, levels of C-Reactive Protein (CRP) were significantly higher for COVID-19 positive participants, whilst lymphocyte and eosinophil levels were lower. A two-tailed Mann-Whitney U test on the CRP indicator provided a p-value of 0.031, and on the lymphocytes a p-value of 0.004. Effect sizes (calculated by Cohen’s d) were 0.56 and 0.85 respectively. COVID-19 positive participants were also markedly more likely to present with bilateral chest X-ray changes (21 out of 30 COVID-19 positive patients, versus just 2 out of 37 COVID-19 negative patients).
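The effect-size calculation can be illustrated with a minimal Python sketch. It assumes the common pooled-standard-deviation form of Cohen's d; the text does not state which variant was used, and the numbers below are hypothetical rather than study data.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical readings, not study data.
positive = [2.0, 4.0, 6.0]
negative = [1.0, 3.0, 5.0]
d = cohens_d(positive, negative)
```

By the usual rule of thumb, values near 0.5 are read as a medium effect and values near 0.8 as a large effect.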

![Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/29/2020.09.29.20203745/F1.medium.gif)

[Figure 1:](http://medrxiv.org/content/early/2020/09/29/2020.09.29.20203745/F1)

Figure 1: 
Boxplots of diagnostic readings from plasma: COVID-19 negative versus positive

As regards outcomes, COVID-19 positive participants experienced higher rates of requiring oxygen / CPAP, higher rates of escalation, and lower survival rates. These observations were in agreement with literature descriptions of COVID-19 symptoms and progression. 12

### Analysis by lipid class

Univariate analysis of individual lipid features showed no significant differences, but aggregated lipid classes did show differentiation; aggregate triglyceride (n=82), diglyceride (n=51) and monoglyceride (n=12) levels were all depressed for participants with both a positive COVID-19 diagnosis and PCR result. Boxplots for these lipid classes are shown in Figure 2.

![Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/29/2020.09.29.20203745/F2.medium.gif)

[Figure 2:](http://medrxiv.org/content/early/2020/09/29/2020.09.29.20203745/F2)

Figure 2: 
Boxplots of ln(total lipid ion count) by class: COVID-19 negative versus positive

The distributions of the natural log of aggregated lipid ion counts by class were not characterised as normal by Shapiro-Wilk normality tests. 23 Two-tailed Mann-Whitney U tests were therefore performed to test the significance of differences in aggregate levels of these lipid classes. These resulted in p-values of 0.018, 0.007 and 0.028 for triglycerides, diglycerides and monoglycerides respectively, with effect sizes (calculated by Cohen’s d) of 0.66, 0.61 and 0.53, indicative of medium effect size. These results are suggestive of dyslipidemia within the stratum corneum due to COVID-19. The lipid differences between positive and negative cohorts were comparable to those for CRP as an indicator of COVID-19 status.
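As an illustration of the test used here, the following Python sketch implements a two-tailed Mann-Whitney U test with the normal approximation. It is a simplified stand-in, not the authors' R code: ties receive average ranks, but no tie or continuity correction is applied to the variance, and the function name is hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-tailed Mann-Whitney U test via the normal approximation."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p-value
    return u, p


# Hypothetical ln(ion count) values, not study data.
u_stat, p_value = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

The normal approximation is reasonable for group sizes such as the 30 and 37 participants here; for very small groups an exact test would be preferable.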

Other work has found evidence of dyslipidemia in plasma from COVID-19 positive patients, 17 albeit evidence of whether upregulation or downregulation is dominant for these lipid classes is mixed. Triglyceride (TAG) levels have been found to be elevated in blood plasma for mild cases of COVID-19, but TAG levels in plasma may also decline as the severity of COVID-19 increases. 24 It should be remembered, however, that the primary role of skin is barrier function, and lipid expression in the stratum corneum depends on *de novo* lipogenesis; in fact, non-skin sources such as plasma provide only a minor contribution to sebum lipids. 25

### Population-level clustering analyses

No clustering was identifiable at the total population level by PCA, i.e. by unsupervised analysis (Figure S1, Supplementary Information). OPLS-DA performed on the same data set still revealed limited separation (Figure 3). R2Y was 0.72 and Q2Y was -0.04, showing that the model was able to achieve some separation of the two groups (COVID-19 positive and negative), but that the model did not have predictive power, possibly indicating overfitting. Given the wide range of comorbidities and the lack of age-matching, this is not unexpected.
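The gap between R2Y and Q2Y can be made concrete with a short Python sketch. It assumes the common definitions (R2Y from predictions of the model fitted to all samples, Q2Y from predictions for held-out samples, both as one minus the ratio of squared error to total sum of squares); the exact formulas used by the authors' software are not given in the text, and the labels below are hypothetical.

```python
def explained_fraction(y_true, y_pred):
    """Return 1 - (sum of squared errors / total sum of squares about the mean)."""
    mean_y = sum(y_true) / len(y_true)
    total_ss = sum((y - mean_y) ** 2 for y in y_true)
    error_ss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1 - error_ss / total_ss


# Hypothetical class labels (0 = negative, 1 = positive), not study data.
labels = [0, 0, 1, 1]

# R2Y: predictions come from the model fitted to all samples. An overfitted
# model can reproduce its training labels exactly.
fitted = [0, 0, 1, 1]
r2y = explained_fraction(labels, fitted)

# Q2Y: each prediction comes from a model that never saw that sample. As a
# stand-in for a model with no transferable signal, predict the mean of the
# remaining labels.
held_out = [
    sum(labels[:i] + labels[i + 1:]) / (len(labels) - 1)
    for i in range(len(labels))
]
q2y = explained_fraction(labels, held_out)
```

In this toy case R2Y is 1 while Q2Y is negative, which is the pattern usually read as a fit that does not generalise.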

![Figure 3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/29/2020.09.29.20203745/F3.medium.gif)

[Figure 3:](http://medrxiv.org/content/early/2020/09/29/2020.09.29.20203745/F3)

Figure 3: 
OPLS-DA of 67 participants, classified by COVID-19 positive / negative

### Investigation of confounding factors

To test the impact of age and diagnostic indicators (CRP, lymphocytes and eosinophils), these variables were pareto-scaled and included in the matrix for OPLS-DA modelling. Variable importance in projection (VIP) scores for lymphocytes, CRP, and eosinophils were 2.02, 1.34 and 1.17 respectively, ranking 16, 243 and 293 out of 998 features. As a single feature, depressed lymphocyte levels show high correlation with COVID-19 positive status, consistent with lymphocyte count being both a diagnostic and prognostic biomarker. 26 Age as a vector had a VIP score of just 0.19 (ranking 849 out of 998 features), indicating that age is a smaller influencer of stratum corneum lipids than other factors. Overall, OPLS-DA separation did not improve by the addition of these age and diagnostic indicators.

To test whether separation would improve in smaller, more homogeneous groups, separate OPLS-DA models were built for each split of the population by comorbidity. If model performance improved (measured by goodness of fit, R2Y, and by predictive power, Q2Y), then this could indicate that sebum profiling would perform better if models were constructed based on stratified and matched datasets. Table S1 shows the results for these metrics across the different modelled subsets.

Separation generally improved as the data were binned more finely, but for most subpopulations there was no improvement in the modelled predictive power. Four subsets did however show more interesting improvements in model performance. These were the subsets with a specific comorbidity that was being treated by medication (high cholesterol, T2DM and IHD) and the subset undergoing treatment with statins. These are discussed in more detail in the following sections.

OPLS-DA modelling of the subset of participants under medication for **high cholesterol** showed both good separation (R2Y of 1.00) and also better predictive power (Q2Y of 0.53). This subgroup was treated with lipid-lowering agents, specifically statins, with one exception (due to allergic reaction).

![Figure 4:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/29/2020.09.29.20203745/F4.medium.gif)

[Figure 4:](http://medrxiv.org/content/early/2020/09/29/2020.09.29.20203745/F4)

Figure 4: 
OPLS-DA of 15 participants treated for high cholesterol, COVID-19 positive / negative

As was seen with the high cholesterol subset, OPLS-DA modelling of the subset of participants under medication for **type-2 diabetes mellitus** (T2DM) showed good separation (R2Y of 1.00), albeit lower predictive power (Q2Y of 0.28). This subgroup was typically being treated with oral hypoglycaemics, for example metformin, in some cases with insulin and in some instances with diet control only.

![Figure 5:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/29/2020.09.29.20203745/F5.medium.gif)

[Figure 5:](http://medrxiv.org/content/early/2020/09/29/2020.09.29.20203745/F5)

Figure 5: 
OPLS-DA of 19 participants treated for T2DM, by COVID-19 positive / negative

The subgroup comprising participants undergoing treatment for **ischemic heart disease** (IHD) also showed much better separation (R2Y of 1.00) plus some indication of improved predictive power (Q2Y of 0.52). This subgroup received varied medication, but participants presenting with IHD were also being prescribed statins.

![Figure 6:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/29/2020.09.29.20203745/F6.medium.gif)

[Figure 6:](http://medrxiv.org/content/early/2020/09/29/2020.09.29.20203745/F6)

Figure 6: 
OPLS-DA of 27 participants treated for hypertension, by COVID-19 positive / negative

It should be noted that there was limited commonality in the features identified as significant in differentiating between COVID-19 positive and negative participants. Whilst some features had high VIP scores in all subgroups (Figure 7), many did not, a possible indicator of overfitting due to the small bins available in this pilot study.

![Figure 7:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/29/2020.09.29.20203745/F7.medium.gif)

[Figure 7:](http://medrxiv.org/content/early/2020/09/29/2020.09.29.20203745/F7)

Figure 7: 
Heat map of VIP scores ranked by commonality to different subgroup OPLS-DA models

### Possible drivers of improved separation

One possibility for the improved separation and model scores is that the sub-populations are more homogeneous for confounding factors. Certainly, comorbidity as a confounding factor has been reduced by grouping according to whether participants were treated for said comorbidities. The ranges of ages in the comorbidities subgroups are somewhat reduced for those treated for hypertension and T2DM (generally skewing older) and more markedly reduced for those treated for high cholesterol (Figure S2, supporting material), albeit as discussed above age itself appears not to be a direct predictor of dyslipidemia. Additionally, as shown in Figures S3 to S5, it is possible to provide separation on the basis of gender. This raises the possibility that a larger dataset, with the potential for matching and stratifying the participant population more rigorously, could yield classification models with greater predictive power.

Alternatively, confounding factors might be reduced by medication, providing a more similar “baseline” against which to measure perturbance in the lipidome by COVID-19. A good example is statins, a standard treatment for patients presenting with high cholesterol levels. Analysing all patients taking statins (which includes both participants treated for high cholesterol and also participants with poor diabetic control or history of ischaemic heart disease, where statins are routinely added prophylactically to improve long-term outcomes) also shows improved separation and predictive power by OPLS-DA modelling (Figure 9), with R2Y of 0.74 and Q2Y of 0.39.

![Figure 9:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/29/2020.09.29.20203745/F8.medium.gif)

[Figure 9:](http://medrxiv.org/content/early/2020/09/29/2020.09.29.20203745/F8)

Figure 9: 
OPLS-DA of 19 participants with statins, by COVID-19 positive / negative

In the participant dataset in this pilot study, too few datapoints were available to rigorously stratify by medication and by diagnosis of comorbidity. Nonetheless, at the aggregate level, participants with a positive clinical COVID-19 diagnosis present with depressed levels of some lipid classes. Furthermore, these findings suggest that better matching of participants could yield a clearer separation of positive and negative COVID-19 participants by their lipidomic profile. Of course, it cannot be ruled out that the lower *n* values for the smaller subsets could lead to apparently better R2Y and Q2Y scores only by chance. Overfitting is a risk in any pilot study with small *n*; this risk can only be reduced through both a larger training set of data and subsequently testing the models on validation sets of data.

Another point to note is a possible lack of confounders in the participant population from seasonal respiratory viruses. Whilst the COVID-negative patients included patients with respiratory illnesses (e.g. COPD, asthma) and COVID-like symptoms, samples were collected between May and July, when the incidence of respiratory viruses is generally low. Both the common cold and influenza have some symptom overlap with COVID-19 and may possibly lead to alterations to lipid metabolism that could interfere with the identification of features related to COVID-19 infection. Such viruses within the UK are more prevalent in autumn and winter. 27 Whilst it seems unlikely that seasonal respiratory viruses were a major confounding factor in this work, this is a factor that will need to be taken into account in future studies, and may also allow the opportunity to test sebum’s selectivity and specificity with regard to other respiratory viruses.

A final limitation of this study is inconsistency in the timeline between onset of symptoms, hospital admission, PCR test and sebum sampling, which was an inevitable consequence of collecting samples in a pandemic situation. Patients were sampled immediately upon recruitment to the study. This means that the time between symptom onset and sebum sampling ranged from 1 day to > 1 month. Future work should explore longitudinal sampling of patients, to establish how quickly (or whether at all) the sebum lipidome returns to normal, and whether it has prognostic power. This will help to inform the practical utility of sebum in clinical or mass testing.

## CONCLUSIONS

At the aggregate level, analysis of the metadata for the participants in this study illustrates the challenges involved in constructing a well-designed sample set during a pandemic. Age ranges of participants were large, and a wide range of comorbidities were present, leading to many confounding factors. We provide evidence that COVID-19 infection leads to dyslipidemia in the stratum corneum, with participants in this study with symptoms and a positive clinical COVID-19 diagnosis presenting with depressed levels of some lipid classes. We further find that the sebum lipidomics profiles of COVID positive and negative patients can be separated using the multivariate analysis method, OPLS-DA, with the separation improving when the patients are segmented in accordance with certain co-morbidities. In addition to these promising findings, sebum samples can be provided quickly and painlessly and can be transported and stored at room temperature. We conclude that sebum is worthy of future consideration for clinical sampling for COVID-19 infection.

## EXPERIMENTAL

### Participant recruitment

The 67 participants included in this study were recruited at Frimley Park NHS Foundation Trust. Ethical approval for this project (IRAS project ID 155921) was obtained via the NHS Health Research Authority (REC reference: 14/LO/1221).

### Materials and chemicals

The materials and solvents utilised in this study were as follows: gauze swabs (Reliance Medical, UK), 30 mL Sterilin™ tubes (Thermo Scientific, UK), 10 mL syringes (Becton Dickinson, Spain), 2 mL microcentrifuge tubes (Eppendorf, UK), 0.2 µm syringe filters (Corning Incorporated, USA), 200 µL micropipette tips (Starlab, UK) and Qsert™ clear glass insert LC vials (Supelco, UK). Optima™ (LC-MS) grade methanol was used as an extraction solvent, and Optima™ (LC-MS) grade methanol, ethanol, acetonitrile and 2-propanol were used to prepare injection solvents and mobile phases. Formic acid was added to the mobile phase solvents at 0.1% (v/v). Solvents were purchased from Fisher Scientific, UK.

### Sample collection, inactivation and extraction

Collection of the samples was performed by researchers from the University of Surrey at Frimley Park NHS Foundation Trust hospitals. Participants were identified by clinical staff and were categorised by the hospital as either “query COVID” (meaning there was clinical suspicion of COVID-19 infection) or “COVID positive” (meaning that a positive COVID test result had been recorded during their admission). Each participant was swabbed on the right side of the upper back, using 15 cm by 7.5 cm gauzes that had each been folded twice to create a four-ply swab. The surface area of sampling was approximately 5 cm × 5 cm, and pressure was applied uniformly whilst moving the swab across the upper back for ten seconds. The gauzes were placed into Sterilin polystyrene 30 mL universal containers. Samples were transferred from the hospital to the University of Surrey by courier within 4 hours of collection, whereupon they were quarantined at room temperature for seven days. Finally, the vials were transferred to minus 80 °C storage until required.

Alongside sebum collection, metadata for all participants were also collected covering *inter alia* gender, age, comorbidities (based on whether the participant was receiving treatment), the results and dates of COVID PCR (polymerase chain reaction) tests, bilateral chest X-ray changes, smoking status, and whether the participant presented with clinical symptoms of COVID. Values for lymphocytes, CRP and eosinophils were also taken; here the most extreme values during the hospital admission period were recorded. These were not collected concomitantly with the sebum samples.

The analysis of the obtained samples was adapted from Sinclair *et al*. 28 To extract analytes from the sample gauzes, the Sterilin vials and contents were allowed to equilibrate to room temperature after which 9 mL methanol was added, followed by vortex-mixing for 10 sec. The solution with gauze was then sonicated for 30 min at ambient temperature. The metabolite-rich methanol was then filtered through a 0.2 μm filter to yield three equal aliquots of 2 mL fractions in 2 mL Eppendorf tubes, and a 0.2 mL aliquot reserved to create a pooled QC in a separate 10 mL scintillation vial. Each 2 mL sample was then dried under nitrogen for 3 hours, leaving a lipid pellet, and frozen at minus 80 °C until the day of analysis.

To reconstitute the samples, the Eppendorf tubes and contents were allowed to equilibrate to room temperature and the dried pellet was dissolved in 200 μL of methanol:ethanol (v/v, 50:50). The reconstituted solution was vortex mixed for 20 seconds followed by sonication (5 min) and centrifugation at 12 000 g for 10 minutes. 150 μL of supernatant was extracted and transferred to a LC-MS insert vial for analysis. Samples were analysed over a period of five days. Each day consisted of a run incorporating solvent blank injections (n=5), pooled QC injections (n=3), followed by 16 participant samples (triplicate injections of each) with a single pooled QC injection every six injections. Each day’s run was completed with pooled QC injections (n=2) and solvent blanks (n=3). A triplicate injection of a field blank was also obtained.
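The daily run order described above can be sketched in Python as follows. The placement of the interleaved pooled QC is an assumption: here one QC is inserted after every six participant injections, except after the final block, which is followed by the closing QCs. The labels `blank` and `pooled_QC` and the function name are hypothetical.

```python
def daily_run(sample_ids, replicates=3, qc_interval=6):
    """Build one day's injection sequence as a list of labels."""
    sequence = ["blank"] * 5 + ["pooled_QC"] * 3
    # Triplicate injections of each participant sample, in order.
    injections = [s for s in sample_ids for _ in range(replicates)]
    for count, injection in enumerate(injections, start=1):
        sequence.append(injection)
        # Assumed rule: a single pooled QC between blocks of six injections.
        if count % qc_interval == 0 and count < len(injections):
            sequence.append("pooled_QC")
    sequence += ["pooled_QC"] * 2 + ["blank"] * 3
    return sequence


# Hypothetical sample identifiers for one day of 16 participants.
run = daily_run([f"P{n:02d}" for n in range(1, 17)])
```

Under these assumptions a day comprises 68 injections, of which 12 are pooled QCs and 8 are solvent blanks.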

### Instrumentation and software

Analysis of samples was carried out using a Dionex Ultimate 3000 HPLC module equipped with a binary solvent manager, column compartment and autosampler, coupled to an Orbitrap Q-Exactive Plus mass spectrometer (Thermo Fisher Scientific, UK) at the University of Surrey’s Ion Beam Centre. Chromatographic separation was performed on a Waters ACQUITY UPLC BEH C18 column (1.7 µm, 2.1 mm × 100 mm) operated at 55 °C with a flow rate of 0.3 mL min⁻¹.

The mobile phases were as follows: mobile phase A was acetonitrile:water (v/v, 60:40) with 0.1% formic acid (v/v), whilst mobile phase B was 2-propanol:acetonitrile (v/v, 90:10) with 0.1% formic acid (v/v). An injection volume of 5 µL was used. The initial solvent mixture was 40% B, increasing to 50% B over 1 minute, then to 69% B at 3.6 minutes, with a final ramp to 88% B at 12 minutes. The gradient was reduced back to 40% B and held for 2 minutes to allow for column equilibration. Analysis on the Q-Exactive Plus mass spectrometer was performed in split-scan mode with an overall scan range of 150 *m/z* to 2 000 *m/z*, and 5 ppm mass accuracy. Split scan was chosen to cover the full range of 150 to 2 000 *m/z* whilst maximising the number of features identified. 29,30 Operating conditions are summarised in the table below.
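The gradient programme can be expressed as a small Python lookup, shown here as a sketch. It assumes linear ramps between the stated time points; the return to 40% B and the 2-minute hold are left out because their exact timing is not stated.

```python
# (time in minutes, % mobile phase B) break points taken from the text.
GRADIENT = [(0.0, 40.0), (1.0, 50.0), (3.6, 69.0), (12.0, 88.0)]


def percent_b(t_min):
    """Return % B at time t_min, assuming linear ramps between break points."""
    if t_min <= GRADIENT[0][0]:
        return GRADIENT[0][1]
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    # Beyond the last break point, hold the final composition.
    return GRADIENT[-1][1]
```

For example, `percent_b(0.5)` falls halfway along the first ramp, between 40% and 50% B.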

[Table 2:](http://medrxiv.org/content/early/2020/09/29/2020.09.29.20203745/T2)

Table 2: Operating conditions of the mass spectrometer used in this research.

### Data processing

LC-MS outputs (.raw files) were pre-processed for alignment, normalisation and peak identification using Progenesis QI (Non-Linear Dynamics, Waters, Wilmslow, UK), a platform-independent small molecule discovery analysis software for LC-MS data. Peak picking (mass tolerance ±5 ppm), alignment (RT window ±15 s) and area normalisation were carried out with reference to the pooled QC samples. Features were annotated using accurate mass match with LipidBlast in Progenesis QI, cross-checked against LipidSearch (Thermo Fisher Scientific, UK). This process yielded a peak table with 14,160 features. All those features with a coefficient of variation across all pooled QCs above 20% were removed, as were those that were not present in at least 90% of pooled QC injections. The remaining features were then field blank adjusted: all those with a signal-to-noise ratio below 3 were also rejected. The final set of 998 features was deemed to be robust, reproducible and suitably distinct from those found in the field blank.
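The three feature filters can be sketched in Python as below. Several details are assumptions made for illustration: an intensity of zero is treated as "not present", the coefficient of variation is computed over all pooled QC injections, and the signal-to-noise ratio is taken as mean QC intensity divided by field blank intensity. The function and parameter names are hypothetical.

```python
from statistics import mean, stdev


def passes_qc(qc_intensities, blank_intensity,
              max_cv=0.20, min_presence=0.90, min_ratio=3.0):
    """Return True if a feature survives the presence, CV and blank filters."""
    # Presence: fraction of pooled QC injections with a non-zero signal.
    present = sum(1 for v in qc_intensities if v > 0)
    if present / len(qc_intensities) < min_presence:
        return False
    # Reproducibility: coefficient of variation across pooled QCs.
    qc_mean = mean(qc_intensities)
    if stdev(qc_intensities) / qc_mean > max_cv:
        return False
    # Field blank: reject features that are not clearly above the blank.
    if blank_intensity > 0 and qc_mean / blank_intensity < min_ratio:
        return False
    return True


# Hypothetical intensities, not study data.
stable = [100, 105, 95, 102, 98, 101, 99, 103, 97, 100]
noisy = [100, 200, 50, 150, 80, 120, 60, 180, 90, 110]
sparse = [100, 0, 0, 100, 100, 100, 100, 100, 100, 100]
```

Applying the filters in this order mirrors the text, although the result for any single feature does not depend on the order.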

Inclusion criteria were also applied to participant data, requiring both full completion of metadata and also agreement between the result of the PCR COVID-19 test (Y/N) and the clinical diagnosis for COVID-19 (Y/N). Whilst these inclusion criteria reduced the total number of participants from n=87 to n=67, this was considered worthwhile given the potential for misdiagnosis to confound the development of statistical models.
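The participant inclusion rule can be written as a short Python predicate. The field names below are hypothetical placeholders for the metadata described earlier, not the study's actual column names.

```python
# Hypothetical metadata fields required for inclusion.
REQUIRED_FIELDS = ("age", "gender", "pcr_positive", "clinical_positive")


def is_included(participant):
    """Require complete metadata and agreement between PCR and clinical diagnosis."""
    complete = all(participant.get(f) is not None for f in REQUIRED_FIELDS)
    return complete and participant["pcr_positive"] == participant["clinical_positive"]


# Hypothetical records, not study data.
concordant = {"age": 64, "gender": "F", "pcr_positive": True, "clinical_positive": True}
discordant = {"age": 70, "gender": "M", "pcr_positive": False, "clinical_positive": True}
incomplete = {"age": None, "gender": "M", "pcr_positive": True, "clinical_positive": True}
```

Only the concordant record would be retained; the other two are excluded for disagreement and missing metadata respectively.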

### Statistical Analysis

Data processing and analysis of the peak:area matrix was conducted through a combination of (a) user-written scripts in the statistical programming language R, using the RStudio graphical user interface package, and (b) the online metabolomics suite of tools contained within Metaboanalyst™. 31 Both PCA and OPLS-DA were performed for classification and prediction of data. A knock-one-out approach was used for OPLS-DA model validation. The data were pareto-scaled in RStudio as part of all statistical analyses, without replacement of missing values.
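Pareto scaling, as used before the multivariate analyses, mean-centres each feature and divides by the square root of its standard deviation. The following Python sketch applies this to one feature column; it assumes the sample standard deviation and is an illustration rather than the authors' R script.

```python
import math
from statistics import mean, stdev


def pareto_scale(column):
    """Mean-centre a feature and divide by the square root of its standard deviation."""
    centre = mean(column)
    spread = stdev(column)
    if spread == 0:
        # A constant feature carries no variation; return zeros.
        return [0.0] * len(column)
    return [(value - centre) / math.sqrt(spread) for value in column]


# Hypothetical peak areas for one feature, not study data.
scaled = pareto_scale([2.0, 4.0, 6.0])
```

Compared with unit-variance scaling, this reduces the dominance of high-intensity features while keeping more of the original data structure.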

## Data Availability

This work took place as part of the COVID-19 MS Coalition and all data will be stored and fully accessible on the MS Coalition open repository.

## Supplementary Information

[Table S1:](http://medrxiv.org/content/early/2020/09/29/2020.09.29.20203745/T3)

Table S1: Summary of model parameters for different population subsets

![Figure S1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/29/2020.09.29.20203745/F9.medium.gif)

[Figure S1:](http://medrxiv.org/content/early/2020/09/29/2020.09.29.20203745/F9)

Figure S1: 
Unsupervised PCA of 67 participants, classified by COVID-19 positive / negative

![Figure S2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/29/2020.09.29.20203745/F10.medium.gif)

[Figure S2:](http://medrxiv.org/content/early/2020/09/29/2020.09.29.20203745/F10)

Figure S2: 
Boxplot of age distributions

![Figure S3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/29/2020.09.29.20203745/F11.medium.gif)

[Figure S3:](http://medrxiv.org/content/early/2020/09/29/2020.09.29.20203745/F11)

Figure S3: 
OPLS-DA of 67 participants, by gender

![Figure S4:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/29/2020.09.29.20203745/F12.medium.gif)

[Figure S4:](http://medrxiv.org/content/early/2020/09/29/2020.09.29.20203745/F12)

Figure S4: 
OPLS-DA of 32 female participants, by COVID-19 positive / negative

![Figure S5:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/29/2020.09.29.20203745/F13.medium.gif)

[Figure S5:](http://medrxiv.org/content/early/2020/09/29/2020.09.29.20203745/F13)

Figure S5: 
OPLS-DA of 35 male participants, by COVID-19 positive / negative

![Figure S6:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/29/2020.09.29.20203745/F14.medium.gif)

[Figure S6:](http://medrxiv.org/content/early/2020/09/29/2020.09.29.20203745/F14)

Figure S6: 
OPLS-DA of 27 participants treated for hypertension, by COVID-19 positive / negative

## Acknowledgements

The authors would like to acknowledge funding from the EPSRC Impact Acceleration Account for sample collection, as well as EPSRC Fellowship Funding EP/R031118/1. Mass Spectrometry was funded under EP/P001440/1. The authors acknowledge Samiksha Ghimire from Groningen Medical School for translation of participant information sheets and consent forms into Nepalese. The authors acknowledge Holly Lewis, Mason Malloy, Patrick Sears and Janella de Jesus for their help with method development. We are grateful to Thanuja Weerasinge (Jay), Manjula Meda, Chris Orchard and Joanne Zamani of Frimley Park NHS Foundation Trust for their help with ethics approvals and access to hospital patients.

*   Received September 29, 2020.
*   Revision received September 29, 2020.
*   Accepted September 29, 2020.


*   © 2020, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NoDerivs 4.0 International), CC BY-ND 4.0, as described at [http://creativecommons.org/licenses/by-nd/4.0/](http://creativecommons.org/licenses/by-nd/4.0/)

## References

1.  WHO, WHO advice for international travel and trade in relation to the outbreak of pneumonia caused by a new coronavirus in China, [https://www.who.int/news-room/articles-detail/who-advice-for-international-travel-and-trade-in-relation-to-the-outbreak-of-pneumonia-caused-by-a-new-coronavirus-in-china](https://www.who.int/news-room/articles-detail/who-advice-for-international-travel-and-trade-in-relation-to-the-outbreak-of-pneumonia-caused-by-a-new-coronavirus-in-china), (accessed 27 July 2020).
    
    
2.  WHO, Novel Coronavirus – China, [https://www.who.int/csr/don/12-january-2020-novel-coronavirus-china/en/](https://www.who.int/csr/don/12-january-2020-novel-coronavirus-china/en/), (accessed 27 July 2020).
    
    
3.  A. Pan, L. Liu, C. Wang, H. Guo, X. Hao, Q. Wang, J. Huang, N. He, H. Yu, X. Lin, S. Wei and T. Wu, JAMA - J. Am. Med. Assoc., 2020, 323, 1915–1923.
    
    
4.  Diamond Princess passenger dies, bringing ship’s death toll to seven, [https://www.channelnewsasia.com/news/asia/coronavirus-covid19-japan-diamond-princess-deaths-12513028](https://www.channelnewsasia.com/news/asia/coronavirus-covid19-japan-diamond-princess-deaths-12513028), (accessed 16 July 2020).
    
    
5.  C. Fraser, C. A. Donnelly, S. Cauchemez, W. P. Hanage, M. D. Van Kerkhove, T. D. Hollingsworth, J. Griffin, R. F. Baggaley, H. E. Jenkins, E. J. Lyons, T. Jombart, W. R. Hinsley, N. C. Grassly, F. Balloux, A. C. Ghani, N. M. Ferguson, A. Rambaut, O. G. Pybus, H. Lopez-Gatell, C. M. Alpuche-Aranda, I. B. Chapela, E. P. Zavala, D. Ma. Espejo Guevara, F. Checchi, E. Garcia, S. Hugonnet and C. Roth, Science, 2009, 324, 1557–1561.

6.  J. Truscott, C. Fraser, W. Hinsley, S. Cauchemez, C. Donnelly, A. Ghani, N. Ferguson and A. Meeyai, PLoS Curr., DOI:10.1371/currents.RRN1125.

7.  D. D. Rajgor, M. H. Lee, S. Archuleta, N. Bagdasarian and S. C. Quek, Lancet Infect. Dis., 2020, 20, 776–777.

8.  OECD, GDP Growth - Second quarter of 2020, OECD, [http://www.oecd.org/sdd/na/gdp-growth-second-quarter-2020-oecd.htm](http://www.oecd.org/sdd/na/gdp-growth-second-quarter-2020-oecd.htm), (accessed 19 September 2020).
    
    
9.  X. Yang, Y. Yu, J. Xu, H. Shu, J. Xia, H. Liu, Y. Wu, L. Zhang, Z. Yu, M. Fang, T. Yu, Y. Wang, S. Pan, X. Zou, S. Yuan and Y. Shang, Lancet Respir. Med., 2020, 8, 475–481.

10. D. Wang, B. Hu, C. Hu, F. Zhu, X. Liu, J. Zhang, B. Wang, H. Xiang, Z. Cheng, Y. Xiong, Y. Zhao, Y. Li, X. Wang and Z. Peng, JAMA - J. Am. Med. Assoc., 2020, 323, 1061–1069.
    
    
11. P. Mehta, D. F. McAuley, M. Brown, E. Sanchez, R. S. Tattersall and J. J. Manson, Lancet, 2020, 395, 1033–1034, DOI:10.1016/S0140-6736(20)30628-0.

12. S. R. Knight, A. Ho, R. Pius, I. Buchan, G. Carson, T. M. Drake, J. Dunning, C. J. Fairfield, C. Gamble, C. A. Green, R. Gupta, S. Halpin, H. E. Hardwick, K. A. Holden, P. W. Horby, C. Jackson, K. A. Mclean, L. Merson, J. S. Nguyen-Van-Tam, L. Norman, M. Noursadeghi, P. L. Olliaro, M. G. Pritchard, C. D. Russell, C. A. Shaw, A. Sheikh, T. Solomon, C. Sudlow, O. V Swann, L. C. W. Turtle, P. J. M. Openshaw, J. K. Baillie, M. G. Semple, A. B. Docherty and E. M. Harrison, BMJ, 2020, 370, m3339.

13. The RECOVERY Collaborative Group, N. Engl. J. Med., DOI:10.1056/nejmoa2021436.

14. World Health Organization, Simulation of the effects of COVID-19 testing rates on hospitalizations, [https://www.who.int/bulletin/volumes/98/5/20-258186/en/](https://www.who.int/bulletin/volumes/98/5/20-258186/en/), (accessed 29 September 2020).
    
    
15. M. Abu-Farha, T. A. Thanaraj, M. G. Qaddoumi, A. Hashem, J. Abubaker and F. Al-Mulla, Int. J. Mol. Sci., DOI:10.3390/ijms21103544.

16. I. W. De Silva, S. Nayek, V. Singh, J. Reddy, J. K. Granger and G. F. Verbeck, Analyst, DOI:10.1039/d0an01074j.

17. D. Wu, T. Shu, X. Yang, J.-X. Song, M. Zhang, C. Yao, W. Liu, M. Huang, Y. Yu, Q. Yang, T. Zhu, J. Xu, J. Mu, Y. Wang, H. Wang, T. Tang, Y. Ren, Y. Wu, S.-H. Lin, Y. Qiu, D.-Y. Zhang, Y. Shang and X. Zhou, Natl. Sci. Rev., 2020, 7, 1157–1168.
    
    
18. P. Jendrny, C. Schulz, F. Twele, S. Meller, M. Von Köckritz-Blickwede, A. D. M. E. Osterhaus, J. Ebbers, V. Pilchová, I. Pink, T. Welte, M. P. Manns, A. Fathi, C. Ernst, M. M. Addo, E. Schalke and H. A. Volk, BMC Infect. Dis., 2020, 20, 1–7, DOI:10.1186/s12879-020-05251-9.

19. E. Sinclair, D. Trivedi, D. Sarkar, C. Walton-Doyle, J. Milne, T. Kunath, A. Rijs, R. Debie, R. Goodacre, M. Silverdale and P. Barran, DOI:10.26434/chemrxiv.11603613.

20. S. S. Shetage, M. J. Traynor, M. B. Brown, T. M. Galliford and R. P. Chilcott, Sci. Rep., 2017, 7, 1–8, DOI:10.1038/srep41926.

21. W. Struwe, E. Emmott, M. Bailey, M. Sharon, A. Sinz, F. J. Corrales, K. Thalassinos, J. Braybrook, C. Mills and P. Barran, Lancet, 2020, 395, 1761–1762.
    
    
22. C. Wenham, J. Smith and R. Morgan, Lancet, 2020, 395, 846–848.

23. B. W. Yap and C. H. Sim, J. Stat. Comput. Simul., 2011, 81, 2141–2155, DOI:10.1080/00949655.2010.520163.

24. X. Wei, W. Zeng, J. Su, H. Wan, X. Yu, X. Cao, W. Tan and H. Wang, J. Clin. Lipidol., 2020, 14, 297–304.
    
    
25. W. P. Esler, G. J. Tesz, M. K. Hellerstein, C. Beysen, R. Sivamani, S. M. Turner, S. M. Watkins, P. Amor, S. Carvajal-Gonzalez, F. J. Geoly, K. E. Biddle, J. J. Purkal, M. Fitch, C. Buckeridge, A. M. Silvia, D. A. Griffith, M. Gorgoglione, L. Hassoun, S. S. Bosanac, N. B. Vera, T. P. Rolph, J. A. Pfefferkorn and G. E. Sonnenberg, Sci. Transl. Med., 2019, 11, 1–14, DOI:10.1126/scitranslmed.aaw8434.

26. J. Wagner, A. DuPont, S. Larson, B. Cash and A. Farooq, Int. J. Lab. Hematol., DOI:10.1111/ijlh.13288.

27. M. T. Eyre, R. Burns, V. Kirkby, C. Smith, S. Denaxas, V. Nguyen, A. Hayward, L. Shallcross, E. Fragaszy and R. W. Aldridge, medRxiv, 2020, 2020.09.03.20187377.
    
    
28. E. Sinclair, D. Trivedi, D. Sarkar, C. Walton-Doyle, J. Milne, T. Kunath, A. Rijs, R. Debie, R. Goodacre, M. Silverdale and P. Barran, DOI:10.26434/chemrxiv.11603613.

29. F. Fall, N. Lenuzza, E. Lamy, M. Brollo, E. Naline, P. Devillier, E. Thévenot and S. Grassin-Delyle, J. Chromatogr. B Anal. Technol. Biomed. Life Sci., 2019, 1128, 121780.
    
    
30. C. Ranninger, L. E. Schmidt, M. Rurik, A. Limonciel, P. Jennings, O. Kohlbacher and C. G. Huber, Anal. Chim. Acta, 2016, 930, 13–22.
    
    
31. MetaboAnalyst, [https://www.metaboanalyst.ca/](https://www.metaboanalyst.ca/).