Abstract
Long COVID, or Post-Acute COVID Syndrome (PACS), may develop following SARS-CoV-2 infection, posing a substantial burden to society. Recently, PACS has been linked to a persistent activation of the complement system (CS), offering hope for both a diagnostic tool and targeted therapy. However, our findings indicate that, after adjusting proteomics data for age, body mass index and sex imbalances, the evidence of complement system activation disappears. Furthermore, proteomic analysis of two orthogonal cohorts—one addressing PACS following severe acute phase and another after a mild acute phase—fails to support the notion of persistent CS activation. Instead, we identify a proteomic signature indicative of either ongoing infections or sustained immune activation similar to that observed in acute COVID-19, particularly within the mild-PACS cohort.
Main
Long COVID or Post acute COVID Syndrome (PACS) may emerge upon SARS-CoV-2 infection, causing a considerable societal burden (1–3). The underlying mechanisms are complex with evidence for immune activation and dysregulation, autoantibodies, vascular perfusion and mitochondrial disturbance and viral persistence or reactivation playing a role (4). There is no causal therapy available yet. Recently, a study by Cervia-Hasler et al. (5) thus generated great interest as it linked the disease to a persistent activation of the complement system (CS), and therefore of the innate immune system, providing a causal explanation for PACS. The results raised hopes for a diagnostic tool and a targeted therapy.
Our curiosity was stirred by a result shown in Figure 8b, demonstrating that in their cohort, age and body mass index (BMI) alone predicted PACS with an area under the Receiver Operating Characteristic (ROC) curve of almost 0.8. This result stands in contrast with other studies in the field. While these agree that age and BMI can be risk factors of PACS, age and BMI alone are deemed insufficient to predict which individuals would develop PACS (6).
In seeking to explain why age and BMI are such strong predictors of PACS in this study, we noticed that the cohort was substantially imbalanced for age and BMI. The non-PACS control group predominantly consisted of young individuals with a median age of 36 years and a mean BMI of 25, while individuals in the PACS group were considerably older (median age of 58 years) and had a mean BMI of 28.
We were wondering whether this imbalance might have also affected other results. When comparing the mass-spectrometry based proteomes of the control (non-PACS) and the PACS group without balancing for these demographic factors, CS components indeed appear to be up-regulated in PACS (Figure 1A). However, the CS is known to be age and BMI dependent (7, 8). In their data analysis, Cervia-Hasler et al. attempt a correction for demographic effects using linear regression (5). Nonetheless, age, sex and BMI can have complex interactions with the plasma proteome, and a simple linear model might hence not be sufficient to capture these interactions. To mitigate the age and BMI imbalances, we used a balanced factorial design strategy by splitting all patients in disease groups by age and sex (Figure 1B). Because the cohort was too small to balance all parameters, we excluded those individuals with vastly different age and/or BMI (Supplementary Table 1). While this strategy reduced the cohort, it resulted in a reasonable balancing by age and BMI in subgroups. The final set consisted of 85 individuals (56 controls and + 29 PACS), with approximately the same ratio of PACS to non-PACS as the initial data set. Comparing the proteomes between PACS and non-PACS in this balanced cohort, none of the complement components was significantly changed in PACS, using the same significance level set by Cervia-Hasler et al. (5) (Figure 1C). In contrast, this analysis revealed a different set of plasma proteins that emerged as discriminators of PACS. Apart from immunoglobulins, BCHE was significantly higher in males above 48 years of age, Paraoxonase 3 (PON3) was increased in PACS patients and Syntax Binding Protein 5 (STXBP5) was decreased in PACS patients (9).
Abundance of the complement system in independent PACS cohorts
To add confidence to the results, we set out to orthogonally explore for a persistent complement activation in PACS. We chose two independent cohorts, addressing two different patient groups suffering from the syndrome. First, we chose a study cohort that closely resembles the study design by Cervia-Hasler et al. (5) and which focuses on individuals that report PACS symptoms at a 6 months follow up, after being discharged from Charité where they have been treated for a severe acute phase (WHO severity grade 3-7) (Charité post severe acute COVID-19 cohort (10)). At the 6-month follow-up visit, plasma samples were collected from 130 individuals, of whom 67 reported symptoms consistent with PACS. This cohort was largely balanced for age, sex, and BMI. Importantly, among these individuals, there was no significant correlation between reporting PACS symptoms and age, sex, BMI, or severity in the acute phase (Methods), reflecting other studies investigating PACS (6). Using a high-throughput mass spectrometry platform (11, 12), we investigated the plasma proteome at the 6 months follow-up. Samples were analysed in a 3-minute water to acetonitrile active gradient on an Agilent Infinity II Sciex TripleTOF 6600 mass spectrometer system operating in ScanningSWATH mode (13). Comparing the PACS to the recovered group, we observed either no changes in several detected CS components, or a slight downregulation, at low effect sizes (Figure 1D). The statistical analysis of this cohort revealed among others the C-reactive protein (CRP), Apolipoprotein A-II (APOA2), Alpha-1-acid glycoprotein 2 (ORM2) and the CS regulator Complement factor H-related protein 5 (CFHR5) to be less abundant in PACS (Figure 1E). Only Properdin (CFP), a regulator of the alternative complement pathway, was significantly, albeit only slightly, upregulated in PACS patients. Thus, in another PACS cohort resembling the study (5), we can neither confirm that age or BMI are strong predictors of the development of PACS, nor do we confirm a persistent dysregulation of a broad set of complement factors in these subjects.
A great societal burden of PACS comes from individuals which suffered not from a severe, but rather a mild or moderate acute phase (10, 14). PACS in these individuals is characterised by a complex phenotype including fatigue, exertional intolerance, post exertional malaise (PEM) and brain fog as key symptoms with a subset having developed the most debilitating form of PACS, Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) (15). These patients are younger and predominantly female. We hence chose a third cohort which focuses on PACS with fatigue and exertional intolerance (“Charité post mild acute COVID19” cohort). This cohort included 53 PACS patients with persisting moderate to severe fatigue with 25 of them fulfilling CCC diagnostic criteria for ME/CFS (16). A total of 27 non-PACS, Post Covid Healthy Controls (PCHC) were recruited during the same time. Serum proteome of participants was determined 5 to 19 months following acute COVID-19. Like in the previous two cohorts, a factorial design was applied to address an, in this case albeit slight, age and sex specific imbalance. Proteome analysis was performed with a more conventional proteomic platform, using nano-flowrate, 30-min chromatographic gradient elution on an Ultimate 3000 RSLnano HPLC coupled to a Thermo Scientific Q-Exactive Plus mass spectrometer operating in data independent acquisition (DIA) mode (17), followed by raw-data processing using DIA-NN.
Statistical analysis of the Charité post mild acute COVID19 cohort (PACS (with ME/CFS and without) vs.– Post Covid Healthy Controls (PCHC)) revealed 27 differentially abundant proteins (at significance level alpha = 0.05 (fdr <= 0.33) and the estimated fold-change threshold (1.1, Figure 1F), of which 7 (CA1, SAA1, APOB, ITIH3, IGHV1-2, LRG1, CFI) are elevated. In this cohort, the only complement protein significantly upregulated, CFI, was correlated with BMI (r ∼ 0.56). Most of the 27 proteins that are significantly regulated by PACS were previously identified in COVID-19. Increased levels of SAA1 (Serum Amyloid A1), ITIH3 (Inter-Alpha-Trypsin Inhibitor Heavy Chain H3), CFI (Complement Factor I), LRG1 (Leucine-Rich Alpha-2-Glycoprotein 1) and decreased levels of Lumican (LUM), Fibronectin 1 (FN1), Multimerin-1 (MMRN1), Cartilage Oligomeric Matrix Protein (COMP), Transgelin 2 (TAGLN2), and Profilin 1 (PFN1) are consistent with protein regulation in severe acute patients (11, 18, 19). The proteins mirror the response in inflammation, acute phase, coagulation, extracellular matrix as well as activation and degradation of platelets. The proteome in these individuals could therefore indicate a persistent infection or persistent immune activation.
Thus, while our reanalysis of the Zurich cohort, the analysis of an analogous post severe acute COVID19 cohort, and a third cohort specifically focussing on PACS patients with fatigue and exertional intolerance subsequent to a mild acute phase, revealed interesting proteomic signatures, we can not substantiate a consistent activation of the complement system in PACS. Only two CS proteins (CFI and CFP) were found to be significantly upregulated in PACS patients, however only by one cohort each and with low effect size. To evaluate the role of age, sex, and BMI on the complement system in the different cohorts, we consequently compared the dependence of complement proteins on PACS, and the different demographic factors (Figure 1D). This illustration visualises that the dependence of BMI dominates over other demographic factors, and PACS, when it comes to the abundance of complement proteins in the plasma proteome. We must conclude that the strong signal which indicates a persistent activation of the complement system, is not a generalizable to in patients suffering from PACS, and that this signal may have emerged due to an imbalance in age and BMI between case and controls in an exploratory cohort.
Data Availability
All data produced in the present work are contained in the manuscript.
Methods
Cohorts
Charité post severe acute COVID19
The Charité post severe acute COVID19 cohort is a subcohort of the Pa-COVID-19 study, a prospective observational study registered with German clinical trials registry (DRKS 00021688) aiming to provide a platform for clinical characterisation of acute and post-acute COVID-19 (26). The study was approved by the ethics committee of Charité - Universitätsmedizin Berlin (EA2/066/20) and conducted in accordance with the declaration of Helsinki. This analysis includes patients from the Pa-COVID-19 study, who were followed-up as outpatients after acute SARS-CoV-2 infection. A total of 130 patients with plasma samples collected at month 6 post SARS-CoV-2 infection were included in this analysis. Of these 104 were hospitalised during acute infection and 44 were treated at ICU. 26 patients were not hospitalised and treated as outpatients with mild SARS-CoV-2 infection. Symptom assessment at outpatient presentation was recorded by a physician as described previously (10). Individuals having at least two symptoms out of cough, dyspnea, fatigue, headache, chest pain, gastrointestinal or depression symptoms were considered to be in the PACS group. Gastrointestinal symptoms were assigned to individuals suffering at least from two symptoms out of diarrhoea, vomitus, stomachache and nausea, while depression symptoms were assigned if at least two symptoms out of fear, panic and major depression were assigned to the individual.
We assessed the effect of the covariates sex, age, BMI and severity during the acute phase (by WHO grade) and identified none of them to be associated with PACS: point-biserial correlation(age) = 0, Cramér’s V(sex) = 0.12, point-biserial correlation(BMI) = 0.02, Cramér’s V(WHO severity) = 0.16 and p-value(age, ANOVA) = 1.00, p-value(sex, Chi2 independency test) = 0.17, p-value(BMI, ANOVA) = 0.82, p-value(WHO Severity, Chi2 independency test) = 0.68.
Charité post mild acute COVID19
A total of 53 PACS patients from the Pa-COVID-19 study with persistent moderate to severe fatigue, exertional intolerance and PEM following mild to moderate COVID-19 were included in this study. Of these n=25 fulfilled diagnostic criteria of ME/CFS based on the Canadian Consensus Criteria (CCC) referred to as PCS/ME/CFS as described previously (27). Patients were excluded from this study in case of relevant comorbidities (28), evidence of organ dysfunction, or pre-existing fatigue. The demographic characteristics of the study cohorts are presented in detail in Supplementary Table 3. Serum samples were collected a median of eight months following SARS-CoV-2 infection. As control, 27 healthy non-PACS were recruited a median of seven months after mild to moderate COVID-19 during the same time span.
To compensate for the difference between ages and achieve equal female/male contributions we opted for factorial design and split patients into sex (female/male) – age (above / below 42 years) subgroups. The average age of 42 years was taken as a threshold for age binarization. Two PACS non-ME/CFS samples and one healthy non-PACS sample were excluded due to quality reasons. Final size of the cohort was 77 individuals.
Sample preparation
Semi-automated in-solution digestion was performed as previously described for high throughput clinical proteomics (11). To reduce variability all stocks and stock plates were prepared in advance and stored at -80°C until use. Briefly, 5 μl of thawed samples were transferred to the denaturation and reduction solution (50 μl 8M Urea, 100 mM ammonium bicarbonate (ABC), 5 μl 50 mM dithiothreitol per well) mixed and incubated at 30°C for 60 minutes. Then 5 μl were transferred from the iodoacetamide stock solution plate (100 mM) to the sample plate and incubated in the dark at RT for 30 minutes before dilution with 100 mM ABC buffer (340 μl). 220 μl of this solution was transferred to the pre-made trypsin stock solution plate (12.5 μl, 0.1 μg/μl) and incubated at 37°C for 17 h (Benchmark Scientific Incu-Mixer MP4). For quenching formic acid (10% v/v, 25 μl) was added and for cleaning C18 solid phase extraction in 96-well plates (BioPureSPE Macro 96-Well, 100 mg PROTO C18, The Nest Group) was used. After drying under vacuum the eluent was reconstituted in 60 μl 0.1% formic acid. Before sample transferring to a new plate, insoluble particles were removed by centrifugation.
Mass spectrometry and Computational proteomics
Charité post severe acute COVID19
5 μg plasma peptides were chromatographically separated in a 3-minute water to acetonitrile active gradient on an Agilent Infinity II HPLC coupled to a Sciex Triple TOF 6600 mass spectrometer operating in ScanningSWATH mode with minor changes in the liquid chromatography method (13).
Raw proteomics data was processed using DIA-NN, version 1.8 (29). The MS1 and MS2 mass accuracies were set to 20 ppm, and the scan window to 6. Peptide ions were annotated with a publicly available spectral library for human plasma (30) and spectra and RT information were replaced with DIA-NN deep learning-based prediction. Protein inference was switched off and the match-between-runs (MBR) option was enabled.
Charité post mild acute COVID19
Peptide separation was accomplished in a 35-minute water to acetonitrile gradient (buffer A: 0.1 % formic acid, buffer B: 80 % ACN, 0.1 % formic acid) on an Ultimate 3000 RSLnanoHPLC coupled to a Q-Exactive Plus mass spectrometer (both ThermoFisher Scientific) operating in data independent acquisition (DIA) mode (17).
1 μg peptide mixtures were analysed by a two-linear-column system. Separation was done on a C18 column (Acclaim PepMap C18, 2 μm; 100 Å; 75μm, 15cm length, Thermo Fisher Scientific) with an active gradient from 4-38% buffer B in 30 min. To generate a spectral library, a pool of study samples was fractionated using a high pH fractionation kit according to manufacturer instructions (Pierce 84868). Fractionated samples were concentrated and separated as described above in a longer linear gradient from 5-28% buffer B in 63 min. Total acquisition time was 100 min.
The data was annotated with DIA-NN 1.8.1 (29) to the human reference proteome (Uniprot UP000005640_9606, accessed 2023-01-17) in library free mode using standard settings. Raw data were processed using DIA-NN 1.8.1 with scan window size set to 7 and MS2 and MS1 mass accuracies set to 20 and 10 ppm, respectively. Additionally, the match between-runs (MBR) option was enabled. The output was filtered at 1% FDR on peptide level.
Data processing
Charité post severe acute COVID19
Only proteotypic precursors with a Q-Value, Global Q-Value and Library Q-Value below 0.01 were used for further analysis. Precursors, which were only identified but could not be quantified due to low abundance, as well as precursors that were present in less than a third of all samples containing a precursor of the same protein were also excluded from analysis. The sample precursor quantity distributions were normalized to the median precursor abundance across all samples using precursors present in at least 90% of all samples. Missing values were imputed iteratively first across samples during the same visite from individuals exhibiting the same severity (WHO grade) during acute phase for precursors missing in up to one third of the samples per group, then over the complete 6 month followup visit for samples missing in up to one third of the samples per visit and last over all visits from acute phase till 12 months follow-up using KNNImputer with k=5 from the scikit-learn impute package for Python (31). If no precursor of a given protein was identified in the raw data of a sample, then they were only imputed on protein level after summarization of precursor to protein quantities. The resulting precursors x samples matrix was filtered on outlier samples by applying quality control filters on the number of identified precursors, precursor intensity distributions and technical parameter collected in the DIA-NN stats-report. After exclusion of samples with insufficient quality, the filtered raw precursor quantities were again normalised, imputed as described above and summarised to protein quantities using the MaxLFQ algorithm (32) implemented by DIA-NN (29, 33). Finally, missing values on protein level were imputed with KNNImputer with k=5 from the scikit-learn impute package for Python (31). The final protein matrix contained 200 proteins and 120 samples (Supplementary Table 4).
Charité post mild acute COVID19
For data integration we used DIA-NN (Demichev et al, 2020) output matrix of normalised precursor intensities for 80 study samples. After quality control three samples were identified as outliers and excluded, thus leaving for preprocessing 77 samples. Following pre-processing steps were applied: Cyclic loess group-wise (group levels: PCS(26 samples), PCS/ME/CFS (25 samples), PCHC (26 samples)) pre-normalization (34), implemented in the R package LIMMA (35) with option “fast” (36), stepwise imputation of missing values in peptides having completeness over 66%, using bpca method (37) implemented in R package pcaMethods (38), total set based cyclic loess normalisation (34) and PLM summarisation (39) implemented in R package preprocessCore (40) to get protein level data. Final protein matrix size was 233 proteins, 77 samples (Supplementary Table 5).
Statistical Analysis
Statistical analysis of proteomics data was carried out in R using publicly available packages. Balancing was achieved per cohort by splitting all patients into disease subgroups by sex – female or male and by age – below or above the mean age of individuals. For the Charité post severe acute COVID19 cohort the threshold for age subgroups was set to 52 instead of the mean of 58 for better comparison with the other two datasets. We ensured a sufficient number of individuals per subgroup. Individuals with too different age and/or BMI were excluded per dataset. In this way it was possible to attain reasonable balancing of age and BMI in disease age-sex subgroups. The final balanced Zurich cohort consisted of 85 individuals (56 non-PACS + 29 PACS with approximately the same odds ratio PACS to non-PACS as in the initial data set), the Charité post severe acute COVID19 cohort of 120 individuals (61 PACS to 59 non-PACS) and the Charité post mild acute COVID19 cohort of 77 individuals (25 PACS/ME/CFS, 26 PACS/non-ME/CFS and 26 non-PACS). The demographic characteristics for the three balanced cohorts are summarised in Supplementary Tables 1-3.
For analysis of all datasets, we used moderated statistics implemented in R package LIMMA (Ritchie, 2015) – due to borrowing information across features it is stable even for experiments with small numbers of samples in a group. We applied the same model to the unbalanced Zurich cohort as described by the authors (log2(P) is the log2-transformed expression of a protein: log2(P) ∼ 0 + Diagnosis + Age + Sex + Severity). To the balanced cohort as well as to the two validation cohorts we applied a different model accounting age and sex effects and including an additional BMI term: log2(P) ∼ 0 + Diagnosis _Sex_Age + BMI + Severity. Factor Severity was not included into the ME/CFS cohort model, as the samples were of mild to moderate severity. In the balanced model the PACS – non-PACS difference was calculated as equally weighted average of subgroup differences (below F/M stands for female/male, AMA/BMA for age above/below mean age): In the Charité post mild acute COVID19 cohort the categorical factor Diagnosis_Sex_Age had three levels (see Supplementary Table 3): PCS, PCS/ME/CFS, and PCHC; and was further split into Sex – Age subgroups. In this model the PACS – non-PACS difference was calculated as equally weighted average of eight subgroup differences (below ME stands for ME/CFS, F/M stands for female/male, A/B42 for age above/below 42): For finding regulated features we applied the following criteria: significance level alpha was set to 0.05, which was sufficient to guarantee fdr < 0.33 in contrast PACS – non-PACS. The fold change threshold FCT=1.1 was set to guarantee that the measured signal is above the noise level. As such we have taken half of the median residual standard deviation of the linear model. It was also higher than the coefficient of variation in quality control samples. To make cross datasets comparisons easier, we applied the same thresholds to all three datasets.
Acknowledgements
We thank the Pa-Covid19 study group at Charité Universitätsmedizin Berlin for contribution of data and biosamples for this study, as well as our colleagues for supporting data acquisition, and analysis. This work was supported by the Ministry of Education and Research (BMBF), as part of the National Research Initiative ‘Mass Spectrometry in Systems Medicine’ (MSCoreSys), under grant agreement number 01EP2201 (to MR) and 16LW0239K (to MM), under Grant 01EP2201 (to CS) as well as the. Weidenhammer-Zoebele Foundation, and the Berlin University Alliance (BUA Link Lab, 501_Massenspektrometrie, 501_Linklab).