Abstract
Background The worldwide pandemic caused by SARS-CoV-2 has claimed millions of lives and has had a profound effect on global life. Understanding the pathogenicity of the virus and the body’s response to infection is crucial in improving patient management, prognosis, and therapeutic strategies. To address this, we performed functional transcriptomic profiling to better understand the generic and specific effects of SARS-CoV-2 infection.
Methods Whole blood RNA sequencing was used to profile a well characterised cohort of patients hospitalised with COVID-19, during the first wave of the pandemic prior to the availability of approved COVID-19 treatments and who went on to survive or die of COVID-19, and patients hospitalised with influenza virus infection between 2017 and 2019. Clinical parameters between patient groups were compared, and several bioinformatic tools were used to assess differences in transcript abundances and cellular composition.
Results The analyses revealed contrasting innate and adaptive immune programmes, with transcripts and cell subsets associated with the innate immune response elevated in patients with influenza, and those involved in the adaptive immune response elevated in patients with COVID-19. Topological analysis identified additional gene signatures that differentiated patients with COVID-19 from patients with influenza, including insulin resistance, mitochondrial oxidative stress and interferon signalling. An efficient adaptive immune response was furthermore associated with patient survival, while an inflammatory response predicted death in patients with COVID-19. A potential prognostic signature was found based on a selection of transcript abundances, associated with circulating immunoglobulins, nucleosome assembly, cytokine production and T cell activation, in the blood transcriptome of COVID-19 patients, upon admission to hospital, which can be used to stratify patients likely to survive or die.
Conclusions The results identified distinct immunological signatures between SARS-CoV-2 and influenza, prognostic of disease progression and indicative of different targeted therapies. The altered transcript abundances associated with COVID-19 survivors can be used to predict more severe outcomes in patients with COVID-19.
Introduction
The global pandemic caused by novel coronavirus SARS-CoV-2 emerged at the end of 2019 (1). By May 1st, 2021, more than 150 million people had been infected, leading to over 3 million deaths worldwide (2).
Previous studies investigating the differences between patients with COVID-19 or influenza on admission to hospital have found that both patient groups present with similar levels of systemic inflammation markers like C-reactive protein (CRP), white blood cell count (WBC), neutrophil count and neutrophil/lymphocyte (N/L) ratio (3). After admission patients hospitalised with COVID-19 were found to have a higher risk of developing respiratory distress, pulmonary embolism, septic shock and haemorrhagic strokes compared to influenza patients (4). In addition, the median length of stay in the intensive care unit was twice as high for COVID-19 patients compared to influenza patients, and COVID-19 patients were more likely to require mechanical ventilation (4). Furthermore, the in-hospital mortality for COVID-19 patients was 16.9% compared to 5.8% for influenza patients indicating a roughly three times higher relative risk of death for COVID-19 (4).
The viral immune response against influenza is well characterised (5). Briefly, initial defence involves cells of the innate immune system (e.g. macrophages, granulocytes and dendritic cells (DCs)), which release proinflammatory cytokines and type I interferons (IFN) to inhibit viral replication, recruit other immune cells to the site of infection, and stimulate the adaptive immune response. The adaptive immune response consists of a humoral and a cellular mediated immunity, initiated principally by virus-specific antibodies and T cells. Current understanding of the immune response specific to SARS-CoV-2 indicates that COVID-19 severity and duration are largely due to a total or early evasion of an innate immune and type I and type III interferon (IFN) responses (6–9), while patients infected with influenza are able to express type I and type II IFNs at a significantly higher concentration (3) which correlates with quicker recovery and decreased disease severity and mortality (10, 11). Consistent with this observation, early administration of inhaled recombinant IFN-beta for COVID-19 patients was associated with a lowered in-hospital mortality and quicker recovery (12, 13). Despite the reduced IFN response in patients with COVID-19 the expression of pro-inflammatory cytokines occurs for a prolonged time at similar levels with influenza patients (3), and interleukin (IL) −6 and IL-10 (14–16) and CCL3 (3) were associated with increased disease severity for COVID-19. The presence of CD4+ and CD8+ T cells, and antibodies were correlated with a positive patient outcome in the case of COVID-19 (17). This puts elderly patients at a higher risk due a smaller naïve T cell pool (18–20) and an absence of a pre-existing adaptive immunity (21) resulting in a potential delayed T cell response to a novel virus like SARS-CoV-2 (22). Delaying an adaptive immune response which, when combined with a high viral load, could lead to a poor outcome (23). As discussed by Sette and Crotty (24) an absent T cell response may cause an increased innate response attempting to control the virus resulting in an excessive lung immunopathology.
To investigate unique molecular features associated with COVID-19, a cohort of patients was identified from hospitalised individuals that were positive for SARS-CoV-2. As a comparator an equivalent group of patients hospitalised with influenza virus were identified. An extensive record of clinical parameters and peripheral blood, used for RNA-seq to obtain a global blood transcriptome overview, were taken at point of care and could therefore be correlated with any molecular signatures of disease. Through these side-by-side comparisons, we aim to identify distinct patterns of blood transcript abundances and cellular composition related to specific antiviral immune responses. Furthermore, we aim to identify a promising prognostic signature indicative of COVID-19 outcome.
Materials & Methods
Ethics and consent
The study was approved by the South Central - Hampshire A Research Ethics Committee (REC): REC reference 20/SC/0138 (March 16th, 2020) for the COVID-19 point of care (CoV-19POC) trial; and REC reference 17/SC/0368 (September 7th, 2017) for the FluPOC trial. For full inclusion and exclusion criteria details see (25) and (26). Patients gave written informed consent or consultee assent was obtained where patients were unable to give consent. Demographic and clinical data were collected at enrolment and outcome data from case note and electronic systems. ALEA and BC data management platforms were used for data capture and management.
Study design and participants
All participants were recruited within the first 24 hours of admission to two large studies of molecular point-of-care testing (mPOCT) for respiratory viruses (CoV-19POC and FluPOC). Blood samples including whole blood in PAXgene Blood RNA tubes (BRT) (Preanalytix) were collected from SARS-CoV-2 positive patients and influenza positive patients, within 24 hours of enrolment, and stored at −80°C. The studies were prospectively registered with the ISRCTN trial registry: ISRCTN14966673 (COV-19POC) (March 18th, 2020), and ISRCTN17197293 (FluPOC) (November 13th, 2017). The COV-19POC study was a non-randomised interventional trial evaluating the clinical impact of mPOCT for SARS-CoV-2 in adult patients presenting to hospital with suspected COVID-19, using the QIAGEN QIAstat-Dx PCR testing platform with the QIAstat-Dx Respiratory SARS-CoV-2 Panel (27). The trial took place during the first wave of the pandemic, from 20th March to 29th April 2020, and prior to the availability of approved COVID-19 treatments. All patients were recruited from the Acute Medical Unit (AMU), Emergency Department (ED) or other acute areas of Southampton General Hospital. The FluPOC study was a multicentre randomised controlled trial evaluating the clinical impact of mPOCT for influenza in hospitalised adult patients with acute respiratory illness, during influenza season, using the BioFire FilmArray platform with the Respiratory Panel 2.1 (28). The trial took place during influenza seasons over the two winters of 2017/18 and 2018/19. All patients were recruited from the AMU and ED of Southampton General Hospital and Royal Hampshire County Hospital.
Extraction of RNA from clinical samples and Illumina sequencing
Total RNA was extracted from PAXgene BRT using the PAXgene Blood RNA Kit (PreAnalytix), according to the manufacturer’s protocol at Containment Level 3 in a Tripass Class I hood. Extracted RNA was stored at −80°C until further use. Following the manufacturer’s protocols, total RNA was used as input material into the QIAseq FastSelect–rRNA/Globin Kit (Qiagen) protocol to remove cytoplasmic and mitochondrial rRNA and globin mRNA with a fragmentation time of 7 or 15 minutes. Subsequently the NEBNext® Ultra™ II Directional RNA Library Prep Kit for Illumina® (New England Biolabs) was used to generate the RNA libraries, followed by 11 or 13 cycles of amplification and purification using AMPure XP beads. Each library was quantified using Qubit and the size distribution assessed using the Agilent 2100 Bioanalyser and the final libraries were pooled in equimolar ratios. Libraries were sequenced using 150 bp paired-end reads on by an Illumina® NovaSeq 6000 (Illumina®, San Diego, USA). Raw fastq files were trimmed to remove Illumina adapter sequences using Cutadapt v1.2.1 (29). The option “−O 3” was set, so that the 3’ end of any reads which matched the adapter sequence with greater than 3 bp was trimmed off. The reads were further trimmed to remove low quality bases, using Sickle v1.200 (30) with a minimum window quality score of 20. After trimming, reads shorter than 10 bp were removed.
Data QC and alignment
Quality control (QC) of read data was performed using FastQC (31) (v0.11.9) and compiled and visualised with MultiQC (32) (v1.5). Samples with <20 million total reads were excluded from further analysis. The STAR index was created with STAR’s (33) (v2.7.6a) genomeGenerate function using GRCh38.primary_assembly.genome.fa and gencode.v34.annotation.gtf (34) (both downloaded from GENCODE), with –sjdbOverhang 149 and all other settings as default. Individual fastq files were aligned using the --twopassMode Basic flag, with the following parameters specified (following ENCODE standard options): --outSAMmapqUnique 60, outFilterType BySJout, -- outFilterMultimapNmax 20, --alignSJoverhangMin 8, --outFilterMismatchNmax 999, -- outFilterMismatchNoverReadLmax 0.04, --alignIntronMin 20, --alignIntronMax 1000000, -- alignMatesGapMax 1000000 and all other options as default. For rMATs (35) (v4.1.0) analysis, STAR was run again as before, but with the addition of --alignEndsType EndToEnd. Samtools (36) (v1.8) was used to sort and index the aligned data.
Comparisons of baseline clinical characteristics
Baseline clinical characteristics of the patient groups were assessed using R (37) (v4.0.2) and RStudio (38) (v1.3.959) for comparisons between COVID-19 versus influenza, and COVID-19 survivors versus non-survivors. Extreme outliers (values < Q1 - 3 interquartile range, or > Q3 + 3 interquartile range) were identified with the R package rstatix (39) (v0.7.0) and removed. Statistical testing was performed including a Shapiro-Wilk test to assess for data normality followed with either an unpaired parametric T-test (Shapiro-Wilk test p-value > 0.05) or an unpaired non-parametric Wilcoxon test (Shapiro-Wilk test p-value < 0.05) for continuous data, or a Chi-square test for categorical data. The R package Table1 (40) (v1.3) was used to plot the baseline clinical characteristics.
Systems immunology-based analysis of blood transcript modules
Blood transcript module (BTM) analysis was performed with molecular signatures derived from 5 vaccine trials (41) as a reference dataset, and BTM activity was calculated using the BTM package (41) (v1.015) in Python (42) (v3.7.2) using the normalized counts as input. Module enrichment significance was calculated using CAMERA (43) (v3.46.0). The significance threshold for the linear model was set at false discovery rate (FDR) 0.05 for the comparison between patients with COVID-19 or influenza.
Unbiased gene clustering analysis
Gene co-expression analysis was performed with BioLayout (44) (v3.4) using a correlation value of 0.95, other settings were kept at default. Clusters were manually assessed to determine gene expression differences depending on for example patient cohort. Genes were subsequently analysed with ToppGene (45) gene list enrichment analysis using the default settings.
Differential gene expression analysis between patient groups
HTSeq (46) (v0.11.2) count was used to assign counts to RNA-seq reads in the Samtools sorted BAM file using GENCODE v34 annotation. Parameters used for HTSeq were --format=bam, --order=pos, -- stranded=reverse, --type=exon and the other options were kept at default. EdgeR (47) (v3.30.3) was used for differential gene expression analysis with R (v4.0.2) in RStudio (v1.3.959). Genes with low counts across all libraries were filtered out using the filterByExpr command. Filtered gene counts were normalised using the trimmed mean of M-values (TMM) method. Differentially expressed genes were identified, after fitting the negative binomial models and obtaining dispersion estimates, using the exact test and using a threshold criteria of FDR p-value < 0.05 and log2 fold change < −1 and > 1. Genes which were within the threshold criteria were used for ToppGene gene list enrichment analysis. A principal component analysis (PCA) graph was constructed based on all differentially expressed genes to assess sample clustering.
Assessment of difference in adaptive immune response related gene expression
A higher abundance of transcripts from 83 immunoglobulin genes, overlapping with the genes in the Gene Ontology (GO) (48, 49) biological process term ‘adaptive immune response’ (Additional file 1), was found in patients with COVID-19 compared to influenza. To assess gene transcript abundance differences for these 83 genes in each patient a heatmap was generated and Z-scores were summed to give an overall positive (high) or negative (low) total Z-score. Patient baseline clinical characteristics were explored, as above, for any explanatory factors for the involvement of a high or low total Z-score between patients with COVID-19 or influenza, and those that survived COVID-19 versus those that died within 30 days of hospital admission. Metadata comparison plots were made with the R package ggplot2 (50) (3.3.2) and statistical testing with the R package ggpubr (51) (v0.4.0).
Topological mapping of global gene patterns
TopMD Pathway Analysis (52) was conducted using the differential transcript abundances identified by differential gene expression analysis, generating a map of the differentially activated pathways between all patients with COVID-19 or influenza. The TopMD pathway algorithm measures the geometrical and topological properties of global differential gene expression embedded on a gene interaction network (53). This enables plotting and measurement of the differentially activated pathways through extrapolation of groups of mechanistically related genes, called TopMD pathways. TopMD pathways possess a natural hierarchical structure and can be analysed for enriched GO terms, by chi-square test.
Assessment of differential splicing between patient groups
Three different tools were used to assess differential gene splicing between patients with COVID-19 or influenza, and COVID-19 survivors or non-survivors after 30 days of hospital admission. rMATs (35) (v4.1.0) was run using BAM files with soft clipping suppressed, generated with STAR and GENCODE v34 gene annotation. Additional settings used were -t paired, --readLength 150 and --libType fr-firststrand. Results were filtered for FDR p-value < 0.05. LeafCutter (54) (v0.2.9) was run in stages following the Differential Splicing protocol (55) (bam2junc.sh generated junction files from BAMs, leafcutter_cluster.py grouped junctions into clusters, leafcutter_ds.R tested for differential splicing, all with default settings, except –min_samples_per_intron was set to be approximately 60% of the smaller group size for each comparison (46 for COVID-19 vs influenza, 9 for COVID-19 survivors vs non-survivors), and results were filtered to exclude events with delta PSI <10%, based on recommendations (56). The LeafViz script (57), prepare_results.R was used to generate a data table from which gene names for significant events were extracted, while the map_clusters_to_genes R function was used to assign genes to non-significant tested events. Overlap between LeafCutter differentially spliced and EdgeR differentially expressed genes was tested for significance using Fisher’s Exact Test (fisher.test in R (v3.5.1) using a 2×2 contingency table and two.sided alternative hypothesis). MAJIQ (58) (v2.2) was run in two stages (majiq build and majiq deltapsi) with default settings, and results were filtered (delta PSI >20%, probability >0.95) using Voila (58) (v2.0).
In silico immune profiling predicting immune cell levels between patient groups
Relative abundance of 22 immune cell types and their statistical significance was deconvoluted from whole blood using the reference gene signature matrix (LM22) using CIBERSORTx (59). CIBERSORTx analysis was conducted on the CIBERSORTx website (60) using 100 permutations. Immune cell distribution between the groups were compared by Mann–Whitney test.
Identification of immune signatures as a predictor for COVID-19 outcome
Transcript to transcript gene co-expression network analysis with BioLayout 3D (v3.4) (Pearson coefficient 0.85, MCL=1.7) assembled 537 genes differentially expressed (EdgeR, FDR < 0.5 and |log2 fold change > 1|) in blood taken on admission between patients with COVID-19 who either survived or died of COVID-19 within 30 days of admission to hospital. Combinations of 100 genes from the top 4 clusters were assessed as predictor variables for outcome using Boosted Logistic Regression, Bayesian Generalised Linear and RandomForest models within SIMON (61) (v0.2.1) installed with Docker (62) (v20.10.2). TMM normalised gene expression data was centred and scaled. Covariant features were removed based on correlation analysis. Samples were randomly split into train:test subsets at the ratio 75%:25%.
Results
Number of participants
In total RNA-seq was done for 80 patients with COVID-19 and 88 patients with influenza. Five patients with influenza failed QC (read count < 20M) leaving 83 patients with influenza for analysis, of which 76% were infected with the influenza A virus and 22% with influenza B virus. Two patients with COVID-19 were identified by PCA as outliers, subsequent assessment revealed an elevated white blood cell and lymphocyte count caused by pre-existing underlying chronic lymphocytic leukaemia, and these patients were excluded from further analyses (Supplementary figure 1). This left 78 patients with COVID-19, of whom 62 survived and 16 died within 30 days of hospital admission, and 83 patients with influenza.
Clinical differences
The baseline clinical characteristics of the patients used in this study for the comparison between influenza and COVID-19 were assessed and no differences in distribution of sex or age were detected between patient groups, however, more patients with influenza were of White British ethnicity (p-value 1.12×10−05) and more were current smokers (p-value 9.07×10−05). There were also differences in the proportion of cases with underlying comorbidities, with patients with COVID-19 more commonly having hypertension (p-value 1.42×10−02), liver disease (p-value 3.63×10−02) and diabetes mellitus (p-value 6.44×10−03) than those with influenza. However, underlying respiratory disease was more common in patients with influenza (p-value 1.22×10−03). Patients with COVID-19 generally exhibited more severe clinical symptoms. While the National Early Warning Score 2 (NEWS2) was not different between patients with COVID-19 or influenza, patients with COVID-19 had a higher respiratory rate (p-value 2.79×10−02) and a greater proportion of patients with COVID-19 were on supplementary oxygen at hospital admission (p-value 6.81×10−03). Laboratory results indicated higher levels of C-reactive protein (p-value 1.73×10−03) and lymphocytes (p-value 2.76×10−02) in patients with COVID-19. Furthermore, COVID-19 patients had a longer duration of symptoms prior to presentation to hospital (p-value 1.17×10−05) and once admitted a longer length of stay (p-value 5.51×10−10). Longer stay time was associated with increased 30 day mortality after hospital admission and patients with COVID-19 were more likely to have died compared to patients with influenza (p-value 4.42×10−05) (Table 1).
Between patients with COVID-19 who survived and those who died, a fatal outcome occurred in older patients (p-value 2.58×10−09). COVID-19 non-survivors also had a shorter duration of symptoms before being admitted to hospital (p-value 5.38×10−03). COVID-19 non-survivors more commonly had underlying comorbidities including hypertension (p-value 1.93×10−03), cardiovascular disease (p-value 3.97×1003), diabetes mellitus (p-value 2.31−10−02) and underlying respiratory disease (p-value 1.06×10−02). While the NEWS2 scores were not different, COVID-19 survivors had a higher heart rates than COVID-19 non-survivors (p-value 9.27×10−03). Laboratory results showed an increase of white blood cell count (p-value 3.83×10−02), total protein levels (p-value 2.5×10−03), creatinine (p-value 3.87×10−02), alanine aminotransferase levels (p-value 2.85×10−02), troponin levels (p-value 2.37×10−04), tumour necrosis factor α (TNFα) (p-value 1.43×10−02), interleukin (IL)-6 levels (p-value 2.78−10−03), IL-8 (p-value 2.24×10−02), IL-1β (p-value 3.78×10−02) and IL-10 (p-value 7.51×10−02) in COVID-19 non-survivors. Patient outcome and length of hospital stay were different due to separation based on patient survival (Table 2).
Molecular differences
RNA-seq was used to investigate potential blood transcriptomic signatures of immune activation between patients infected with SARS-CoV-2 versus influenza, and COVID-19 survivors versus those that died. A median of 60.4 million reads in patients with COVID-19, and 58.9 million reads in patients with influenza was obtained (Supplementary figure 2A). In patients who died of COVID-19 a median of 55.7 million reads was obtained and for COVID-19 survivors the median was 62.6 million reads (Supplementary figure 2B). Clustering analysis between patients with COVID-19 or influenza indicated a homogeneity of blood transcriptome profiles suggesting any variation between groups to be subtle (Supplementary figure 3A). A partial separation was found between patients who survived or died of COVID-19 based on patient outcome after 30 days of hospital admission, indicative of a larger variation in the blood transcriptome (Supplementary figure 3B).
Contrasting innate and adaptive immune programmes
Previous studies have suggested that severe COVID-19 is associated with aberrant immune pathology (63, 64), and therefore BTM analysis and gene co-expression analysis were used to investigate the balance between the innate and adaptive response in patients with either COVID-19 or influenza virus and to identify patterns of changes associated with each arm of the immune system. A systems immunology-based analysis of BTMs between patients with COVID-19 or influenza revealed several differences (Figure 1). For the upregulated BTMs in COVID-19, signatures were observed related to the cell cycle and adaptive immune response, primarily CD4+ T cells, B cells, plasma cells and immunoglobulins. In contrast, the downregulated BTMs showed signatures associated with monocytes, inflammatory signalling and an innate antiviral and type I IFN response.
Gene co-expression analysis was done on a total of 4,093 transcript abundances for unbiased gene clustering between patients with COVID-19 or influenza (TMM normalised counts per million EdgeR FDR p-value <0.05) and identified 50 clusters of 4 or more genes (BioLayout 3D, Pearson R >= 0.85, MCL = 1.7). These clusters are clearly separated into groups comprising increased transcript abundances in blood of patients with influenza or COVID-19 (Figure 2) and the top 12 clusters are shown in Table 3.
Interestingly, an increased abundance of gene transcripts in patients with COVID-19 are involved in adaptive immunity, pointing to activation/priming of T cells and B cells, including induction of proliferation (cluster 4, FDR p-value 3.97×10−57), Additionally, an increased abundance of gene transcripts encoding neutrophil degranulation (cluster 9) and blood coagulation (cluster 6) clearly differentiated patients with COVID-19 from patients with influenza (FDR p-value 4.33×10−19 and FDR p-value 2.84×10−12 respectively). In contrast, an decreased abundance of gene transcripts in the blood transcriptome of patients with COVID-19 in comparison to patients with influenza were associated with innate immunity, including biological processes involved with defence response to virus (cluster 2) (FDR p-value 1.34×10−37), type 1 helper T cell stimulation (cluster 10) (FDR p-value 4.53×10−03), dendritic cell morphogenesis (cluster 11) (FDR p-value 1.37×10−02), and myeloid cell activation (clusters 1 and 8) (FDR p-value 5.16×10−13 and FDR p-value 4.15×10−04 respectively). Importantly, the largest decrease of transcript abundances in patients with COVID-19 comprised genes expressed in plasmacytoid dendritic cells (pDC) (FDR p-value 4.17×10−22), indicating impaired immune responses to viruses (FDR p-value 1.34×10−37) and impaired IFN signalling (FDR p-value 5.56×10−30). This was suggestive of contrasting innate and adaptive immune programmes between the different infections and these were further investigated.
High abundance of immunoglobulin genes associated COVID-19
A total of 20,542 abundance measures of gene transcripts were obtained after filtering out transcripts with low counts, of which 4,094 transcripts were found to be significantly different between patients with COVID-19 or influenza (FDR p-value < 0.05) of which, 197 transcripts exceeded a log2 fold change of < −1 or >1, with 126 transcripts showing higher abundance in patients with COVID-19 and 71 transcripts showing higher abundance in patients with influenza (Figure 3A and Additional file 2). Complimentary to the findings from gene co-expression analysis, the transcripts with increased abundance in patients with COVID-19 were found to be involved with humoral immune response, complement activation and B cell mediated immunity (Figure 3B).
83 immunoglobulin genes, associated with the GO biological process term ‘adaptive immune response’, were found to have higher transcript abundance in the majority of patients with COVID-19 than those with influenza (p-value < 2.22×10−16, Wilcoxon test) (Figure 3C and Supplementary figure 4) and by using a total Z-score, patients with COVID-19 or influenza were classified as having either a high or low abundance of these 83 immunoglobulin genes. A high abundance was associated with a total positive Z-score (1.46 to 175.46) which was identified in 59 patients with COVID-19 and 21 patients with influenza indicating a higher than average abundance of these 83 adaptive immune response related immunoglobulin genes. While a low abundance was associated with a total negative Z-score (−0.12 to −154.93) identified in 19 patients with COVID-19 and 62 patients with influenza indicating a lower than average abundance of adaptive immune response related immunoglobulin genes. COVID-19 patients with lower abundance of adaptive immune response related immunoglobulin genes, a total negative Z-score, were found to be significantly older (p-value 6.32×10−3, T-test) and had a shorter duration of symptoms before being admitted into hospital (p-value 5.9×10−04, Wilcoxon test). Additionally, COVID-19 patients with high abundance of adaptive immune response related immunoglobulin genes, a total positive Z-score, were significantly more likely to be still alive 30 days after admitted into hospital (x2 13.39 and p-value 2.52×10−04, Chi-square test) (Figure 4).
Topological mapping of global gene patterns
Topological analysis allows the measurement of the global profiles of transcript abundances relative to gene pathways without data reduction and this was used to define a global map of differentially activated pathways between COVID-19 and influenza. The first differentially activated TopMD pathway was enriched for ribosomal and insulin related pathways, with peak gene UBA52: named by GO analysis as cytoplasmic ribosomal proteins (adjusted p-value 1.55×10−146). This pathway was also found to be enriched for genes expressed by transcription factor Myc (adjusted p-value 7.07×10−53) against the ChEA 2016 transcription factor database and of dendritic cells in the ARCHS4 transcription factors’ co-expression database (adjusted p-value 1.34×10−36). Activated Myc represses interferon regulatory factor 7 (IRF7) and a significant lower abundance of IRF7 was found in patients with COVID-19 compared to influenza (Supplementary figure 5). The second differentially activated TopMD pathway had peak gene NDUFAB1; named by GO analysis as mitochondrial complex I assembly model OXPHOS system WP4324 (adjusted p-value 2.81×10−66). The third differentially activated TopMD pathway was named by GO analysis as proteasome degradation WP183 (adjusted p-value, 1.46×10−64), with PSMD14 as the peak gene (Figure 5 with full detail in Additional file 3 and the global map of differentially activated pathways available online (65)).
Cell subsets supporting innate and adaptive immune response differences
Analysis of the blood transcriptome can be used to predict the immune cells present (64). Levels of different predicted cell types were assessed to determine whether there were differences in immune system associated cells between patients with COVID-19 or influenza (Figure 6). Statistical testing was done on cell type levels identified with CIBERSORTx. M0 macrophages (p-value 3.63×10−06), plasma cells (p-value 5.05×10−04), cytotoxic CD8+ T cells (p-value 4.58×10−03), regulatory T cells (p-value 7.30×10−03) and resting natural killer cell (p-value 8.90×10−03) were found to be significantly higher in COVID-19 patients, while in influenza patients activated dendritic cells (p-value 2.23×10−02) were significantly higher.
Predicted cell type levels between COVID-19 survivors and non-survivors indicated an increase of neutrophils (p-value 2.84×10−04) in patients who died of COVID-19. In contrast, an increase of naïve CD4+ T cells (p-value 1.92×10−03), M0 macrophages (p-value 1.20×10−02), M2 macrophages (p-value 1.48×10−02), naïve B cells (p-value 1.57×10−02) and naïve cytotoxic CD8+ T cells (p-value 2.31×10−02), were identified in patients who went on to survive COVID-19 (Figure 7).
Efficient adaptive immune response associates with COVID-19 survival
As already noted, a high abundance of the GO biological process ‘adaptive immune response’ related transcripts, mostly immunoglobulin genes, was associated with COVID-19 survival (Figure 4). Here a direct assessment was done of the blood transcriptome differences between patients who, at 30 days after hospital admission, survived or who died of COVID-19. A total of 23,850 abundance measures of gene transcripts were obtained after filtering out transcripts with low counts, of which 6.645 transcripts were found to be significant (FDR p-value < 0.05) of which, 537 transcripts exceeded a log2 fold change of < −1 or > 1, with 265 transcripts showing higher abundance in patients who survived COVID-19 and 272 transcripts showing a higher abundance in patients who died of COVID-19 (Figure 8A and Additional file 4).
In patients who died of COVID-19 an enrichment for biological processes involved with an inflammatory response including interleukin signalling and neutrophil activation and degranulation was detected (Figure 8B). While in COVID-19 survivors biological processes involved with the adaptive immune system including complement activation, B cell mediated immunity and a humoral immune response mediated by circulating immunoglobulins was found to be enriched (Figure 8C). Additionally, transcript abundances associated with T cell and B cell proliferation were significantly higher in COVID-19 survivors (p-value < 1.0×10−04, paired non-parametric T-test) (Figure 8D).
Immune signatures as predictors of COVID-19 outcome
Distinct immune signature genes were selected and assessed for their prediction accuracy in stratifying patients with COVID-19 for disease outcome, fatality or survival. A signature of 47 genes was identified (Figure 9A), representative of the four biggest clusters of genes associated with either patients with COVID-19 who survived or died. The associated GO biological process terms were ‘humoral immune response mediated by circulating immunoglobulin’ (FDR p-value 2.23×10−46), ‘nucleosome assembly’ (FDR p-value 5.46×10−19), ‘regulation of T-helper 1 cell cytokine production’ (FDR p-value 4.24×10−03) and ‘regulation of T cell activation’ (FDR p-value 4.51×10−04) (Supplementary figure 6). This was highly predictive for outcome, with a maximum specificity of 75% and sensitivity of 93% (Figure 9B and Table 4).
Discussion
As previously reported transcriptomic analysis of blood samples provide a relatively non-invasive window on the immune response, as previously shown by differentiating patients with Ebola virus disease at the acute stage (66). In this study we explored the functional blood transcriptomic differences, focussing mainly on the immune response, between patients with COVID-19, admitted to hospital during the first wave of the pandemic, and patients with a well-characterised stereotypical seasonal respiratory virus infection, influenza. Furthermore, we compared the blood transcriptomes of COVID-19 survivors and non-survivors for promising prognostic signatures indicative of COVID-19 survival.
35 variables that can provide prognostic information on COVID-19 associated mortality and 14 variables that can provide prognostic information on COVID-19 severity have previously been reported (67). We compared these known COVID-19 prognostic variables (67) between patients with COVID-19 or influenza and found more active smokers among influenza patients. High C-reactive protein (CRP), which previously has been reported to be similar upon admission to hospital between patients with COVID-19 or influenza (3), hypertension and diabetes were more common among patients with COVID-19. We also found an increase of liver disease, which has been classified as a low or very low certainty predictor (67), in patients with COVID-19. In our cohort more patients with influenza had an underlying respiratory disease. Similar to what has been previously reported (3) upon admission to hospital both patients with COVID-19 or influenza presented with similar WBC and neutrophil counts, and although we detected a difference in lymphocytes between patients with COVID-19 or influenza, there was no difference the N/L ratio. Similar to Piroth et al. (4) we found that the average length of stay was higher for patients with COVID-19 compared to influenza, and more patients with COVID-19 needed supplementary oxygen, and finally while Piroth et al. (4) report a roughly three times higher relative risk of death for COVID-19, in our cohort no influenza patients died whilst admitted to hospital and so this could not be assessed. In addition, we compared COVID-19 survivors and non-survivors, and as reported the high certainty prognostic variables for mortality and/or severity of increased age, hypertension, cardiovascular disease, diabetes, underlying respiratory disease (including COPD) and a high WBC (67) were increased in non-survivors. While it has previously been reported that CRP and N/L ratio were elevated in patients with COVID-19 who became critically ill (3), in our study we saw no difference in CRP, neutrophil count and lymphocyte count between COVID-19 survivors and non-survivors. However, we found that urea, creatinine, alanine aminotransferase, troponin and several cytokines, including IL-1β, IL-6, IL-8, IL-10 and TNFα, to be higher in patients who died of COVID-19.
Our initial global analysis of blood transcriptomic differences between patients with COVID-19 or influenza detected contrasting innate and adaptive immune programmes. An impaired immune response to viruses and interferon signalling in patients with COVID-19 was found, as described previously (6–9), compared to patients with influenza, which are known to produce an IFN response (3). Furthermore, in accordance with accumulating evidence of aberrant blood clotting in patients with COVID-19 (68, 69), transcripts expressed by megakaryocytes and platelets associated with blood coagulation were in a higher abundance in COVID-19 patients. Gene clusters associated with an innate immune response were found to be associated with influenza. While, in contrast, gene clusters associated with an adaptive immune response and an increase of predicted plasma cells and CD8+ T cells with COVID-19, pointing to T cell and B cell activation / priming.
Further analysis revealed various immunoglobulin genes had increased transcript abundance in patients with COVID-19 compared to patients with influenza. This significant over representation of a wide range of heavy chain and light chain V genes in patients with COVID-19 has been described before (70) and the implementation of antibody analysis in plasma samples has been used as an additional tool in diagnosing COVID-19 (71). We found that the 86.8% (53/61) of patients who survived COVID-19 had a higher than average transcript abundance of 83 immunoglobulin genes, which overlap with the GO biological term ‘adaptive immune response’, while this was 37.5% among the patients who died of COVID-19. Further analysis revealed that the aforementioned higher than average transcript abundance is associated with a younger age of the patient, a longer symptom duration before admittance into hospital and a positive survival outcome 30 days after hospital admission. A lower than average transcript abundance of 83 immunoglobulin genes was detected in 62.5% (10/16) of patients who died of COVID-19, compared to 14.8% (9/61) of patients who survived COVID-19.
We subsequently detected an increased transcript abundance from genes associated with T cell and B cell proliferation, an enrichment for gene pathways involved with an adaptive immune response, and an increase in predicted CD4+ and CD8+ T cells and naïve B cells in patients who survived COVID-19, highlighting the importance of an efficient adaptive immune response as previously reported (17). The predicted cell fraction of naïve CD4+ T cell was found to be higher compared to CD8+ T cells indicating a higher CD4+ T cell response to SARS-CoV-2 than a CD8+ T cell response, supporting previous observations (17, 72), which has been found to control primary SARS-CoV-2 infection (22). We note that the CD8+ T cells were mostly seen in COVID-19 survivors, compared to COVID-19 non-survivors, which has been associated with a positive COVID-19 outcome (22, 73).
In contrast, we detected in COVID-19 non-survivors an enrichment of pathways involved with the negative regulation of lymphocyte activation and increased neutrophil activation and degranulation, supported by a significant decrease in predicted cell fraction of naïve B cells and naïve CD4+ and CD8+ T cells and an increase of the neutrophil cell fraction. This is consistent with previous studies finding elevated levels of neutrophils in blood (74) and lungs (75–78) in severe COVID-19. Furthermore, gene pathways involved with an inflammatory response and cytokine signalling were enriched in COVID-19 non-survivors and we detected that a higher transcript abundance of several IL genes (IL1-RAP, IL-10, IL1-R1, IL1-R2, IL18-R1 and IL18-RAP) and laboratory results indicated a increase of TNFα, IL-1β, IL-8, and IL-33 with the largest increase for IL-6 and IL-10. This is consistent with the previously reported positive regulation of genes encoding the activation of innate immune system, viral and IFN response (3), and increase of proinflammatory macrophages (79) and elevated IL-6 and IL-10 in severe COVID-19 cases (14–16).
When comparing the immune response between patients who either survived or died of COVID-19 it appears that, as Sette and Crotty (24) summarised, that COVID-19 severity is largely due to an early virus-driven evasion of innate immune recognition leading to a subsequent delayed adaptive immune response with a fatal COVID-19 outcome, as shown by Lucas et al. (80), where the innate immune response is ever-expanding due to an absence of a quick T cell response. A delayed adaptive immune response to COVID-19 can occur in the elderly due to their reduced ability to mount a successful adaptive immune response leading to an increased risk of death (22). This reduced ability to mount an adaptive immune response in the elderly is due to a scarcity of naïve T cells caused by aging (18–20) and the association of age and severe or fatal COVID-19 is already known, for example, as of April 15th 2021 in the United States 95.4% of COVID-19 deaths have occurred in 50-year-olds and older, and 59.3% in 75-year-olds and older (81). Similarly, we found that patients who survived COVID-19 were younger, had a higher predicted naïve CD4+ T cell and naïve B cell fraction, and had an increased heart rate compared to non-survivors. Further research is needed to assess the causality of these factors, for example the relationship between increased age and heart rate in non-survivors.
Topological analysis was performed to identify the global map of gene pathways differentially activated between COVID-19 and influenza. The first differentially activated pathway was enriched for genes related to ribosomal and insulin pathways indicating differences in effects on translational machinery and supporting the reported roles of insulin resistance linked to COVID-19 severity (82). Although highly speculative, insulin signalling differences may reflect the role of angiotensin converting enzyme 2 (ACE2), the binding site for SARS-CoV-2, which degrades angiotensin 2, protecting against oxidative stress and insulin resistance driven by the renin-angiotensin-aldosterone system (83). Additionally, ACE2 expression has been found to be increased in rats given a high sucrose diet or insulin sensitisers (84). Furthermore, the first pathway was also found to be enriched for genes transcribed by Myc. Activated Myc represses IRF7 which regulates type I IFN production (85), and correspondingly we found a significant lower IRF7 expression and a lower induction of IFN in patients with COVID-19 compared to influenza. This low IFN induction in COVID-19 may be due to the virus avoiding or delaying an intracellular innate immune response to type I and type III IFNs (6–9). The second most differentially activated pathway, peak gene NDUFAB1, involved with the mitochondrial complex I assembly model OXPHOS system supports reported increased COVID-19 disease severity linked to SARS-CoV-2 being able to highjack mitochondria of immune cells, replicate and disrupt mitochondrial dynamics (86). The third differentially activated pathway was associated with the cellular ubiquitin-proteasome pathways which are known to play important roles in coronavirus infection cycles (87). The protein synthesis and ubiquitination-related pathways might reflect mechanisms of increased viral replication and suppression of host interferon signalling pathways, including increased degradation of IκBα which suppresses the IFN-induced NF-κB activation pathway. Also, in SARS-CoV, accessory protein P6, whose sequence is conserved in SARS-CoV-2 (88), promotes the ubiquitin-dependent proteasomal degradation of N-Myc interactor, thus limiting IFN signalling (89). However, the peak marker of this pathway PSMD14 which prevents interferon regulatory factor 3 (IRF3) autophagic degradation and therefore, permits IRF3-mediated type I IFN activation (90); shedding light on the complex mechanistic differences regulating interferon production between COVID-19 and influenza.
Conclusions
In this study, we have compared side-by-side SARS-CoV-2 and a stereotypical respiratory viral infection (influenza), and COVID-19 survivors and non-survivors. Distinct patterns of transcript abundances and cellular composition were found in whole blood that can differentiate the infection source, furthering our understanding of the antiviral immune response differences. Additionally, we observed a proinflammatory signature associated with a negative outcome in patients with COVID-19. Finally, a signature of transcript abundances in the blood transcriptome of COVID-19 patients, upon admission to hospital, was identified with prognostic potential to stratify patients into those likely to survive or die.
Data Availability
Following publication of major outputs all anonymised data will be made available on reasonable request to the corresponding author providing this meets local ethical and research governance criteria.
Declarations
Ethics approval and consent to participate
The COV-19POC trial was approved by the South Central - Hampshire A Research Ethics Committee (REC): REC reference 20/SC/0138 (March 16th, 2020); and REC reference 17/SC/0368 (September 7th, 2017) for the FluPOC trial. For full inclusion and exclusion criteria details see (25) and (26). Written informed consent was given by the patients, or consultee assent was obtained where patients were unable to give consent.
Consent for publication
Not applicable.
Availability of data and materials
Following publication of major outputs all anonymised data will be made available on reasonable request to the corresponding author providing this meets local ethical and research governance criteria.
Competing interests
TWC has received speaker fees, honoraria, travel reimbursement, and equipment and consumables free of charge for the purposes of research from BioFire diagnostics LLC and BioMerieux. TWC has received discounted equipment and consumables for the purposes of research from QIAGEN. TWC has received consultancy fees from Biofire diagnostics LLC, BioMerieux, Synairgen research Ltd, Randox laboratories Ltd and Cidara therapeutics. TWC has been a member of advisory boards for Roche and Janssen and has received reimbursement for these. TWC is member of two independent data monitoring committees for trials sponsored by Roche. TWC has previously acted as the UK chief investigator for trials sponsored by Janssen. TWC is currently a member of the NHSE COVID-19 Testing Technologies Oversight Group and the NHSE COVID-19 Technologies Validation Group. JPRS is a founding director, CEO, employee and shareholder in TopMD Precision Medicine Ltd. FS is a founding director, CTO, employee and shareholder in TopMD Precision Medicine Ltd. PJS is a founding director, employee and shareholder in TopMD Precision Medicine Ltd. AG is an employee and shareholder in TopMD Precision Medicine Ltd. No competing interest were reported by the other authors.
Funding statement
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: the CoV-19POC trial was funded by University Hospital Southampton Foundation Trust (UHSFT) and the FluPOC trial by the National Institute of Health Research (NIHR) Post-Doctoral Fellowship Programme. In addition, the CoV-19POC and FluPOC trials were supported by the NIHR Southampton Clinical Research Facility and NIHR Southampton Biomedical Research Centre (BRC). J Legebeke was supported by a PhD studentship from the NIHR Southampton BRC (no. NIHR-INF-0932). RP-R was supported by a PhD studentship from the Medical Research Council Discovery Medicine North Doctoral Training Partnership (no. MR/N013840/1). NJB was supported by the NIHR Clinical Lecturer scheme. JAH, CH and XD were supported by the US Food and Drug Administration (no. 75F40120C00085), and this work was partly supported by U.S. Food and Drug Administration Medical Countermeasures Initiative (no 75F40120C00085) awarded to JAH. MEP was supported by a Sir Hendy Dale Fellowship from Welcome Trust and The Royal Society (no. 109377/Z/15/Z). TWC was supported by a NIHR Post-Doctoral Fellowship (no. 2016-09-061). DB was supported by a NIHR Research Professorship (no. RP-2016-07-011). The views expressed are those of the authors and not those of the funding agencies.
Authors’ contributions
TWC and DB conceptualized the study. SP and NJB screened and recruited the patients and collected the data in the FluPOC and CoV-19POC trials. RP-R and CH sample processing and experiments. J Legebeke, J Lord, RP-R, AFP, XD, FS, AG, JPRS, JAH and MEP performed data analysis. J Legebeke, J Lord, RP-R, AFP, FS, AG, JPRS, JAH and MEP drafted the article, and editing by J Legebeke, J Lord, RP-R, SP, NJB, GW, JPRS, PJS, JAH, MEP, TWC and DB. Project advice was given by JH, JSL, JAH, MP and TW. All authors read and approved the final manuscript.
Supplementary information
Supplementary figures
Additional file 1: Eighty-three immunoglobulin genes associated with the GO biological process term ‘adaptive immune response’ and their Z-scores.
Additional file 2: Differential gene expression results between patients with COVID-19 or influenza.
Additional file 3: Gene pathways identified by TopMD topological analysis between patients with COVID-19 or influenza.
Additional file 4: Differential gene expression results between COVID-19 survivors and non-survivors.
Additional file 5: Differential splicing results and discussion
Additional file 6: Differential splicing results between patients with COVID-19 or influenza.
Acknowledgements
The authors would like to acknowledge and gives thanks to all the patients who kindly participated in the FluPOC and CoV-19POC studies and to all the clinical staff at University Hospital Southampton Foundation Trust who cared for them. In addition, the authors acknowledge the use of the IRIDIS High Performance Computing Facility, and associated support services at the University of Southampton, in the completion of this work.
Footnotes
↵* Joint last authors
References
- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- 7.
- 8.
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.
- 16.↵
- 17.↵
- 18.↵
- 19.
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.↵
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.
- 77.
- 78.↵
- 79.↵
- 80.↵
- 81.↵
- 82.↵
- 83.↵
- 84.↵
- 85.↵
- 86.↵
- 87.↵
- 88.↵
- 89.↵
- 90.↵