Abstract
Systemic inflammation in critically ill patients can lead to serious consequences such as acute respiratory distress syndrome (ARDS), a condition characterized by the presence of lung inflammation, edema, and impaired gas exchange, associated with poor survival. Understanding molecular pathobiology is essential to improve critical care of these patients. To this end, we use multimodal profiles of SARS-CoV-2 infected hospitalized participants to the Biobanque Québécoise de la COVID-19 (BQC-19) to characterize endophenotypes associated with different degrees of disease severity. Proteomic, metabolomic, and genomic characterization supported a role for neutrophil-associated procoagulant activity in severe COVID-19 ARDS that is inversely correlated with sphinghosine-1 phosphate plasma levels. Fibroblast Growth Factor Receptor (FGFR) and SH2-containing transforming protein 4 (SHC4) signaling were identified as molecular features associated with endophenotype 6 (EP6). Mechanical ventilation in EP6 was associated with alterations in lipoprotein metabolism. These findings help define the molecular mechanisms related to specific severe outcomes, that can be used to identify early unfavorable clinical trajectories and treatable traits to improve the survival of critically ill patients.
Introduction
Acute respiratory distress syndrome (ARDS), a condition characterized by the presence of lung inflammation, edema, and impaired gas exchange, is observed in severe coronavirus disease 2019 (COVID-19) and associated with poor survival. Mechanical ventilation is an important critical care management strategy for severe COVID-19 (1). However, during the first waves of the pandemic, almost half of patients undergoing invasive mechanical ventilation died based on case fatality rate reports (2). Defining the molecular mechanisms related to specific severe outcomes, such as ARDS or acute respiratory failure that require invasive mechanical ventilation, is important to identify treatable traits and improve the survival of critically ill patients.
Endophenotypes are subgroups which are inapparent to traditional methods but share a common set of factors that dictate their response in a manner that is distinct from other sub-groups (3). Endophenotypes have been described in airway diseases such as asthma (4), COPD, (5) and chronic rhinosinusitis (6). Recent studies have identified endophenotypes associated with COVID-19 severity using different large data sets, such as bulk transcriptomic from blood (7), bulk transcriptomic from neutrophil-enriched fractions (8), or proteomic in our accompanying manuscript (9).
In this manuscript, we report multimodal characterization of one of the six endophenotypes (EPs) identified in our accompanying manuscript (9), which we found to be associated with the worst clinical trajectory during hospitalization following SARS-CoV-2 infection (including ARDS). This analysis reveals known and novel molecular effectors of immuno-thrombosis, ARDS and mechanical ventilation associated with COVID-19.
Results
Serprocidins are enriched in an endophenotype associated with COVID-19 severity and ARDS
To gain insight into the molecular mechanisms underlying the severity in COVID-19, we investigated the aptamers enriched within an endophenotype (EP6) (Figure 1 and Table S1) that we previously discovered using the circulating proteome of SARS-CoV-2 infected and hospitalized participants to BQC19 (n = 731) and found to be associated with severe outcome and ARDS (9). These aptamers correspond to measurements obtained using a multiplex SOMAmer affinity array. We used two-sided Mann–Whitney U (MWU) tests and the p-values were corrected for the number of aptamers and EPs using Benjamini–Hochberg (BH) false discovery rate (FDR). Consistent with the elevated neutrophil numbers in EP6, the most enriched aptamer was Cathepsin G (FDR = 4.10E-58), a serine protease localized in neutrophil azurophil granules. Notably, two other serprocidins, neutrophil elastase and proteinase-3, were also significantly enriched in EP6 (FDR = 1.04E-10 and FDR = 8.27E-7, respectively, Figure 1 and Table S1). Moreover, another NETosis-associated protein, Olfactomedin 4 (OLFM4) (10), was greatly enriched in EP6 (top 5%; FDR = 2.61E-46, Figure 1 and Table S1). This suggests an important role for neutrophil degranulation and/or NETosis in severe COVID-19.
Pathway enrichment analysis identified FGFR-signaling in severe COVID-19 acute respiratory distress syndrome
To capture a more comprehensive landscape of the signaling networks active in EP6, we performed pathway enrichment analysis using the Knowledge Engine for Genomics (KnowEnG) (11). Since aptamers were used to identify the EPs, it is expected that many of them would have significant associations with the EPs. As a result, we selected a very strict threshold of FDR < 1E-20 (two-sided MWU test, BH FDR) to identify top aptamers associated with EP6 (9). We then used the gene set characterization pipeline of KnowEnG with Reactome pathway collection (12). This analysis showed that EP6 is characterized by pathways associated with interleukins and cytokine signaling in the immune system. In addition, fibroblast growth factor receptor (FGFR) signaling was identified in EP6, suggesting this pathway as a potential driver of severe pathology (Table 1 and S1).
EP6 metabolomic signature was associated with pro-coagulant activities and liver dysfunction
To further characterize the pathobiology of severe COVID-19 disease, metabolomic profiling of plasma samples was performed in parallel to the aptamer analysis. The results yielded data on 1,435 metabolites, of which 576 were found significantly altered in EP6 (two-sided MWU FDR < 0.01) (Table S2). Characterization of the plasma samples supported the distinction among the different EPs at the levels of metabolomic sub-pathways and individual metabolites (Figure 2 and Table S2). The two top sub-pathways (FDR < 0.01) “Methionine, Cysteine, SAM and Taurine Metabolism” and “phosphatidinylethanolamine (PE)”, are known to interact (13, 14) and are associated with pro-coagulant activities. Phospholipids-containing microparticles from platelet activation contribute to Tissue Factor activation and pro-thrombinase activity (15), linking PE to coagulation and D-dimers level, which are elevated in EP6 (9). PE methylation acts as a major consumer of S-Adenosylmethionine (SAM) leading to the synthesis of S-Adenosylhomocysteine (SAH) and cystathionine (14), upstream of 2-hydroxybutyrate and 2-aminobutyrate in a model of hepatoxicity (16). SAH, cystathionine, 2-hydroxybutyrate and 2-aminobutyrate are all significantly enriched in EP6 (Figure 2B and 3A). To identify relationships that may shed light on factors influencing clinical laboratory results that define EP6, we also performed Spearman’s rank correlation analyses between 21 blood variables and metabolites (Table S3). SAH is positively correlated with creatinine, while cystathionine is positively correlated with creatinine and negatively correlated with albumin (Figures 4, S1-S2), congruent with the data from the model of hepatoxicity (16), linking abnormal coagulation and liver damage in severe COVID-19. We also observed kynurenine to be enriched in EP6 but depleted in EP1 (Figures 3 and 5). Kynurenine levels were also found to be positively correlated with D-Dimers levels (Figure 4 and S1-S2), suggesting a role of this metabolite of the tryptophan pathway in abnormal coagulation in severe COVID-19. Interestingly, TNFSF18 (GITR), a T-cell receptor that can activate indoleamine 2,3-dioxygenase (IDO) to increase kynurenine synthesis (17) in allergy, is enriched in EP6 and depleted in EP1 (Figures 3 and 5).
Low levels of sphingosine-1 phosphate are associated with COVID-19 ARDS
Strikingly, sphingosine-1 phosphate was enriched in EP1 and depleted in EP6 (Figures 3 and 5), confirming the previously reported negative association of this metabolite with COVID-19 ARDS (18). Accordingly, the aptamer detecting neutral ceramidase, an enzyme converting ceramides into sphingosine, is enriched in EP1 (FDR = 2.65E-10, Table S1). While changes in protein abundance as detected by aptamers may not necessarily reflect changes in enzymatic activity, it is interesting to note that dihydroceramides and ceramides are depleted in EP1 (Figure 5, Table S1). Conversely, dihydroceramides and ceramides are significantly enriched in EP6 (Figure 3, Table S1). Moreover, one of the top aptamers found enriched in EP6 (FDR = 2.10E-56, Figure 1) is serine palmitoyltransferase 2 (SPTLC2), an enzyme that is responsible for the rate-limiting step in de novo sphingolipid biosynthesis (19, 20).
Mechanical ventilation in EP6 is associated with lipoprotein metabolism
As noted in the introduction, mechanical ventilation in COVID-19 is associated with a high mortality rate. Within EP6, 55 out of the 118 individuals received mechanical ventilation. Within all EP6 members that share the overall similar aptamer signature, we identified a subset of aptamers and metabolites that were significantly different between those with or without mechanical ventilation (Figure 6A and Table S4). The second most enriched aptamer (Table 2 and S4), angiopoietin-like 3 (ANGPTL3), corresponds to a liver-secreted regulator of lipoprotein metabolism (21), which may reflect alterations observed in fatty acid metabolites (Table S4). There were no significant differences between the two groups in terms of sex, age, or body mass index (BMI) (Figure 6B-6C and Table S4). Finally, we investigated whether molecular information (proteomic and/or metabolomic) was associated with mechanical ventilation duration. We identified alpha-L-iduronidase (IDUA) to be inversely correlated with duration of ventilation in the patients with mechanical ventilation in EP6 (Spearman’s rank correlation = −0.60, FDR = 2.57E-2, Figure 6D and Table S4). Overall, these results point to an association between altered lipoprotein metabolism and mechanical ventilation in EP6.
SHC4 genotype and protein expression levels are associated with higher odds of belonging to EP6
To obtain independent support for the involvement of specific proteins identified using aptamers, we leveraged genome-wide association study (GWAS) genotyping data corresponding to these patients. We identified 25 single nucleotide variations (SNVs) distributed in 13 annotated genes that were below a p-value threshold of 1E-4 and distinguished EP6 from the rest (Table 3). We then investigated each of the SNVs to which we could assign a gene and an aptamer, to assess whether their protein product in circulation was differentially regulated by the genotype (Table 4). We discovered two genes, SHC4 (encoding SHC adaptor protein 4) and CACNA2D3 (encoding calcium voltage-gated channel auxiliary subunit alpha2 delta3) for which there was a significant association between genotype and protein expression levels (p-values < 0.05). While CACNA2D3 may have mild impact on EP6 membership (odds ratio = 0.61, Table 4), SHC4 was one of the top-enriched aptamers (position 32 out of 4,985, Figure 1) for inclusion in EP6, with an odds ratio of 11.98 for the protein product and 2.00 for the alternate allele, respectively. Finally, to gain further insight into the potential role of SHC4 in COVID-19 disease severity, we investigated the metabolites associated with SHC4 in EP6. The only significantly correlated metabolite was 6-oxopiperidine-2-carboxylate (Spearman’s rank correlation = 0.40, FDR = 0.012, Figures 4, S1-S2), where the p-values were corrected for the number of metabolites using BH FDR. Therefore, the GWAS analysis revealed that the signaling adaptor protein SHC4 may play an important mechanistic role in contributing to severe disease pathology.
Discussion
In this study, we leveraged the molecular features associated with endophenotypes discovered using unsupervised hierarchical clustering of circulating proteome of SARS-CoV-2 positive hospitalized participants to BQC19 to improve our understanding of pathobiology. This approach enabled the identification of the following molecular features: 1) the level of Sphingosine-1 phosphate is highest in EP1 and lowest in EP6, reflecting an inverse association with time to admission to ICU and/or death; 2) interleukins and FGFR signaling are molecular pathways associated with severe COVID-19; 3) genetic and environmental determinants of SHC4 expression are associated with inclusion in EP6; 4) the combined proteomic and metabolomic signature of EP6 supports a pathologic role for neutrophil-associated procoagulant activity in COVID-19 ARDS; and 5) mechanical ventilation in EP6 is associated with altered lipoprotein metabolism.
COVID-19 ARDS molecular pathology: a counterbalance between ceramides and sphingosine is associated with critical illness
The datasets used in this study carry rich molecular information on mechanisms of disease for COVID-19. EP6 is characterized by its enrichment in ARDS (50% vs 9% in EP1), acute kidney injury (50%) and liver dysfunction (22%) (9). Low levels of Sphingosine 1-phosphate (S1P) are associated with ARDS and have been shown to be associated with greater ICU admission and decreased survival in COVID-19 (18), and is supported by this study where S1P levels were found depleted in EP6 but enriched in EP1 (the endophenotype with the best prognostic). The study also supports a shunting away from sphingosine towards more pro-inflammatory ceramides that are associated with metabolic disorders (22), suggesting a counterbalance between ceramides and sphingosine, where the former is associated with poorer outcomes during critical illness, whereas higher levels of the latter is associated with more favorable outcomes in ARDS.
Molecular mechanisms of mechanical ventilation in EP6: a potential role for ANGPTL3
ANGPTL3 is a liver-secreted enzyme that can inhibit lipoprotein and endothelial lipase (23). Loss-of-function of ANGPTL3 is associated with hypolipidemia (24). Therefore, increased circulating levels of ANGPTL3 would be expected to raise plasma levels of triglycerides and phospholipids which is consistent with the observed metabolomic profiles (Table S4). Whether these alterations are sufficient to increase the likelihood of mechanical ventilation remains speculative. Nevertheless, it is interesting to note that administration of medium- and long-chain triglycerides can aggravate gas exchange abnormalities in ARDS (25) and may warrant the preclinical investigation of ANGTPL3 antagonists like evinacumab or other functionally similar therapeutic agents (26) in ameliorating outcomes in ARDS. As for the duration of mechanical ventilation, we discovered an inverse correlation with IDUA, a lysosomal enzyme required for the degradation of heparan sulfate and dermatan sulfate (27). Our current data cannot distinguish between the presence of this enzyme as a biomarker or as an active mediator that influences the course of COVID-19-associated acute respiratory failure.
Multimodal profiling of EP6 supports a role for neutrophil-derived procoagulant activity-mediated organ damage in COVID-19
COVID-19 ARDS has been found to be more frequently associated with thrombotic complications than non-COVID-19 ARDS (28, 29). In addition, histopathologic evidence of pulmonary microthrombi is frequently observed in autopsies of individuals deceased with COVID-19 (30). Early reports suggested a role for neutrophil-mediated release of NETs contributing to endothelial dysfunction as a mechanism of microthrombosis in COVID-19 ARDS (31-42). The neutrophil-derived serprocidins cathepsin G, neutrophil elastase, and proteinase-3, are significantly enriched in EP6. In addition to their roles in bacterial killing, serprocidins couple innate immunity to coagulation, where they promote coagulation by enhancing tissue factor and factor XII-dependent coagulation (43). Serprocidins are also important regulators of NETosis (44). In pediatric sepsis, another NETosis-associated protein olfactomedin 4 (OLFM4) (10), enriched in EP6, is associated with increased odds of having greater organ failure and death (45). The procoagulant environment associated with COVID-19-associated organ damage is further supported by the alteration in metabolic profiles in EP6. Phosphatidinylethanolamines (PE) become exposed at the surface of cell membranes upon exposure to stress, inflammation, and cell death (46, 47). In a Syrian hamster model, infection with SARS-CoV-2 markedly increased circulating PE expression in the animals that were fed a high salt, high fat diet, demonstrating the interaction between infection and metabolic disorder (48). Furthermore, the membranes of azurophilic granules, which contain serprocidins, are enriched in PE (49). Accordingly, EP6 is associated with decreased platelet counts, increased D-Dimers, international normalized ratio (INR) and activated partial thromboplastin time (aPTT) (9), hallmarks of abnormally activated coagulation pathways that are associated with serious and often lethal complication of sepsis (50). Accordingly, we report an association between kynurenine and D-dimer, pointing to an involvement of tryptophan metabolism in abnormal coagulation. Interestingly, TNFSF18, a T-cell receptor that promotes leukocyte adhesion to endothelial cells (51), is enriched in EP6. A SNV found in the proximity of TNFSF18 was found associated with severity of COVID19 in the western Indian population (52). Unfortunately, this SNV was absent from the array used in BQC19 preventing the confirmation of that observation, but it is nevertheless suggestive of a link between TNFSF18, kynurenine, and abnormal coagulation in severe COVID-19. Collectively, these findings point to immuno-thrombosis activity in EP6 in response to SARS-CoV-2 infection that is associated with abnormal coagulation. Recent results from the Antithrombotic Therapy to Ameliorate Complications of COVID-19 (ATTACC) study showed anticoagulation to be efficacious in moderate but not severe COVID-19 disease (53, 54), revealing a need for alternative management approaches in critically ill patients.
Novel molecular markers of COVID-19 pathology: FGFR and SHC4
Two novel molecular factors associated with COVID-19 ARDS that were identified by our study are FGFR and SHC4. Circulating levels of the pro-angiogenic FGF-2 has been associated with COVID-19 severity and creatinine levels in a study of 208 SARS-CoV-2 positive participants (55). It is also noteworthy that the use of nintedanib, an inhibitor of FGFR, vascular endothelial growth factor receptor (VEGFR), and platelet-derived growth factor receptor (PDGF-R) that is approved for use in interstitial lung disease, improved pulmonary inflammation and helped three middle-aged obese COVID-19 patients with markedly impaired lung function wean off mechanical ventilation (56). While the adaptor protein SHC4 has not been experimentally demonstrated to modulate FGFR signaling, a 12-gene biomarker signature associated with melanoma contains FGFR2, FGFR3 and SHC4 (57). It is attractive to speculate that SHC4 may act downstream of FGFR or other associated growth factor receptors, favoring heightened procoagulant activity associated with COVID-19 ARDS. In view of the limited knowledge of this understudied member of the SHC family, we looked at the metabolites associated with SHC4 to gain insights into its possible functions. Interestingly, 6-oxopiperidine-2-carboxylate (the only metabolite with a significant correlation with SHC4 in EP6) was found to be negatively associated with glomerular filtration rate in a GWAS study of kidney disease and hypertension in African Americans (58). Acute kidney injury is a frequent complication of acute liver failure (59), and liver dysfunction is associated with abnormal coagulation (60). Thus, the overall molecular information coming from the multi-modal analysis of EP6 points to a triad between liver function, kidney function, and hemostasis that becomes dysfunctional following ARDS-associated inflammation driven by SARS-CoV-2 infection. Whether the presence of SHC4 in circulation is a marker of dysfunction of this triad, or an active mediator, remains to be determined. Moreover, the identity of the cells expressing SHC4 leading to its presence in the circulation is not known. Taken together, our identification of FGFR and SHC4 signaling pathways distinguishing EP6 from other endophenotypes provides a rationale for investigation of their potential utility as biomarkers of severe disease activity and/or the use of antagonists of those pathways to treat severe COVID-19 disease manifestations.
Methods
Datasets and preprocessing
We obtained data corresponding to clinical, genomics, metabolomics, and proteomics (circulating proteome) of n = 731 hospitalized and SARS-CoV-2 positive patients (based on qRT-PCR) from the Biobanque Québécoise de la COVID-19 (BQC19; www.quebeccovidbiobank.ca). The circulating proteome measurements obtained using a multiplex SOMAmer affinity array (SomaLogic, 4,985 aptamers) between April 2020 and April 2021 were preprocessed and used to identify six endophenotypes (EP1-EP6). The details of the cohort, circulating proteome data processing, and identification of endophenotypes are provided in the accompanying manuscript (9).
In this study, we used the batch-normalized, missing-values-imputed, and log-transformed version of the BQC19 metabolomic data (1,435 metabolites). BQC19 GWAS imputation data were generated by Tomoko Nakanishi at Brent Richards lab, Jewish General Hospital and McGill University. Detailed codes used for generating the data can be found in: https://github.com/richardslab/BQC19_genotype_pipeline.
Analyses of the GWAS dataset
For the GWAS analyses, annotation of SNVs were done using the biomaRt package (61) from R (62), and all analyses were done using R version 4.1.3. Quality control steps were derived in majority from a 2017 QC tutorial article (63). At the beginning, we had 867,450 markers and 2,429 samples. We imported Plink format data into R using the “read_plink” function from genio R package (64). We removed 103,592 non ACGT bi-allelic markers. We calculated the predicted sex by looking at the rate of homozygote markers on chromosome 23. We removed 3,588 markers with call rates < 98%, 448,932 monomorphic markers and markers with MAF < 0.05, and 28,092 markers with Hardy–Weinberg equilibrium < 1E-6 (calculated by the “HWE.exact” function from the genetics package (65) from R. For the EP6 cluster group analyses, we additionally removed 1,747 samples that were not in the cluster analysis, 8 samples with a sex discrepancy (based on predicted sex calculated earlier and reported sex), and 3 samples with a heterozygosity rate > 3 standard deviations. We finally removed a pair of samples which had approximately the same genome, likely due to an error of manipulation. As we could not know which one was correct, we removed both. We also found 2 pairs of individuals with a pi-hat of ∼0.5 (meaning first degree relatives), so we kept the one sample per pair with the higher call rate. All other pairs of individuals had a pi-hat < 0.21 that is judged acceptable considering our population. Pi-hats were calculated with the “snpgdsIBDMoM” function from SNPRelate R package (66). At the end of quality controls, 283,246 markers on 655 samples had been used to perform association analyses.
To perform the principal component analyses (PCA), we took a subsample of independent markers (pruning) with a maximum sliding window of 500,000 base pairs and a linkage disequilibrium (LD) threshold of 0.2 using the “snpgdsLDpruning” function from the SNPRelate R package. We ran the PCA with the “snpgdsPCA” function from SNPRelate package from R. The first 2 principal components (PCs) were considered significant.
For the GWAS analyses of the 283,246 remaining markers between EP6 cluster compared to all others, we used a logistic regression model, with the dichotomous variable indicating if the participants belong to the EP6 cluster as the outcome variable. We used the additive model for markers as an independent variable, and we adjusted the models with the first two PCs. Odds ratio and p-values were calculated on each model. Quantile–quantile (Q–Q) plots have been performed as quality control of the models; p-values were plotted using “qqplot.pvalues” function from gaston R package (67) (data not shown). We compared the aptamers’ normalized level of expression between the three groups of genotypes for each studied gene by performing standard analysis of variance (ANOVA) analyses followed by Tukey post hoc tests (referred to in this study as pQTL analysis). Since aptamers tested are limited compared to SNVs, we fixed significance p-value threshold below 1E-4 to report more SNVs instead of the more common 1E-5 suggestive threshold. Finally, to identify if EP6 may be characterized by the aptamers and the SNVs, we performed multiple logistic regression analysis including aptamers expression values, SNV genotypes (additive model), and the two principal components. Odds ratios are reported with 95% confidence intervals.
Metabolomic pathway characterization of EP6
The 1,435 metabolites measured were organized into 122 sub-pathways in the original dataset (denoted as “SUB_PATHWAYS”). We first identified metabolites whose values were significantly higher or lower in EP6 compared to other EPs (two-sided MWU test, FDR<0.01). Then, we used these metabolites to perform pathway enrichment analysis (one-sided Fisher’s exact test) based on 122 pathways. The resulting p-values were then corrected for multiple tests using Benjamini– Hochberg FDR.
Statistics
Several non-parametric tests, including the MWU test, Fisher’s exact test, and Spearman’s rank correlation, were used in this study. Benjamini–Hochberg FDR was used to adjust the p-values for multiple tests.
Study approval
The study was approved by the Institutional Ethics Review Board of the “Centre intégré universitaire de santé et de services sociaux du Saguenay-Lac-Saint-Jean” (CIUSSS-SLSJ) affiliated to the Université de Sherbrooke [protocol #2021-369, 2021-014 CMDO – COVID19].
Data Availability
Input data corresponding to the cohort can be obtained form BQC19 (www.quebeccovidbiobank.ca). The data generated as a result of the analyses are provided as tables and supplementary tables.
Conflict of Interests
The authors have declared that no conflict of interest exists.
Acknowledgements
This work was made possible through open sharing of data and samples from the Biobanque Québécoise de la COVID-19, funded by the Fonds de recherche du Québec - Santé, Génome Québec, the Public Health Agency of Canada and, as of March 2022, the Ministère de la Santé et des Services Sociaux du Québec. We thank all participants to BQC19 for their contribution. This study was supported by the Fonds de recherche du Québec - Santé (FRQS)-Cardiometabolic Health, Diabetes and Obesity Research Network (CMDO)-Initiative. This work was also supported by Natural Sciences and Engineering Research Council of Canada (NSERC) grant RGPIN-2019-04460 (AE).