Summary
Differential host responses in coronavirus disease 2019 (COVID-19) and multisystem inflammatory syndrome in children (MIS-C) remain poorly characterized. Here we use next-generation sequencing to longitudinally analyze blood samples from pediatric patients with acute COVID-19 (n=70) or MIS-C (n=141) across three hospitals. Profiling of plasma cell-free nucleic acids uncovers distinct signatures of cell injury and death between these two disease states, with increased heterogeneity and multi-organ involvement in MIS-C encompassing diverse cell types such as endothelial and neuronal Schwann cells. Whole blood RNA profiling reveals upregulation of similar pro-inflammatory signaling pathways in COVID-19 and MIS-C, but also MIS-C specific downregulation of T cell-associated pathways. Profiling of plasma cell-free RNA and whole blood RNA in paired samples yields different yet complementary signatures for each disease state. Our work provides a systems-level, multi-analyte view of immune responses and tissue damage in COVID-19 and MIS-C and informs the future development of new disease biomarkers.
INTRODUCTION
At the onset of the coronavirus disease 2019 (COVID-19) pandemic, SARS-CoV-2 was thought to only cause mild or asymptomatic infection in children. Large-scale surveillance studies have since demonstrated cases of severe COVID-19-associated pneumonia occurring throughout the pandemic, especially in children with underlying comorbidities1,2. To date, mortality from COVID-19 in children has exceeded the pediatric mortality from influenza in any given season3,4. Furthermore, children are at risk for the post-infectious multisystem inflammatory syndrome in children (MIS-C) associated with COVID-19, which manifests as severe disease with systemic hyperinflammation and multiorgan involvement5. The diagnosis of MIS-C currently relies on clinical symptoms, and the pathogenesis of MIS-C is incompletely understood. Of note, cases of MIS-C were initially misdiagnosed as Kawasaki disease (KD)6, another systemic inflammatory syndrome, because of overlapping clinical features including fever, hyperinflammation, mucocutaneous involvement, and vascular endothelial dysfunction. However, subsequent studies have demonstrated distinct clinical7 and immunological8 phenotypes associated with MIS-C as compared to KD. Thus, a better understanding of the pathogenesis of MIS-C is critical to improve clinical diagnosis and inform targeted interventions, particularly as new variants of SARS-CoV-2 continue to emerge.
Initial characterization by proteomics and RNA sequencing revealed that MIS-C has an inflammatory profile similar to KD and severe COVID-19, with key differences, including specific increases in IFNy, IL-6, IL-17, IL-10, alarmin-related proteins, and other proinflammatory factors9–11. Autoantibody profiling revealed a unique autoantigen profile targeting organs often injured during MIS-C, but the damage to the organs and tissues has not yet been quantified on a system-wide level12. Immune cell profiling showed that MIS-C is associated with an expansion of specific subsets of NK cells, T cells, and B cells10,13, and gene expression profiling showed that a high fraction (∼24%) of T cells in patients with MIS-C are T cell receptor beta variable 11-2 (TRBV11-2) positive13–15. T cell receptor repertoire analyses have led to the hypothesis that the SARS-CoV-2 spike protein has a superantigen effect, causing T cell dysregulation that contributes to the development of MIS-C15. Finally, flow cytometry analyses have demonstrated that children with MIS-C have reduced virus-specific CD4+ and CD8+ T cells compared to children with COVID-19 and controls16.
Despite these prior studies, much remains unclear about the pathogenesis of MIS-C, and there is a lack of biomarkers that could be leveraged to develop diagnostic and prognostic assays. Here we performed unbiased profiling of whole blood cellular RNA (wbRNA), plasma cell-free RNA (cfRNA), and plasma cell-free DNA (cfDNA) from 211 pediatric patients diagnosed with COVID-19 or MIS-C and 26 controls across 3 separate pediatric hospital systems in the United States. Longitudinal sampling of these 3 blood analytes enable a complementary and systems-level view of immune responses and cell/tissue damage associated with MIS-C and COVID-19.
RESULTS
Clinical COVID-19 and/or MIS-C cohort
We collected 416 blood and/or plasma samples from 237 patients from the University of California at San Francisco (UCSF), Emory University/Children’s Healthcare of Atlanta (EMORY), and Children’s National Hospital (CNH) (Fig. 1A and B, Table; Methods). Patients from Emory were prospectively enrolled into a specimen collection and biobanking protocol, permitting the acquisition of longitudinal samples during hospitalization, one month post-hospitalization, and ≥3 months post-hospitalization. Remnant clinical samples from patients at the other sites were biobanked and analyzed under approved institutional review board (IRB) protocols with waiver of consent. All samples were stratified by diagnosis, time of collection, and severity of disease (Fig. 1C). Patients were either diagnosed with COVID-19 (without MIS-C) or MIS-C, or were uninfected control subjects. The control subjects were healthy outpatient children prospectively enrolled at Emory. Hospitalization time points were stratified as acute (0-4 days after hospital admission) or post-acute (>4 days after hospital admission), and patients were classified according to clinical severity at time of presentation as having asymptomatic, mild, moderate, or severe disease (Supplementary Table). cfRNA and cfDNA profiling by next-generation sequencing (NGS) were performed from plasma, and transcriptome RNA profiling (RNA-Seq) was performed from whole blood.
(A) Sample collection and processing overview. (B) Distribution of samples across analytes. (C) Distribution of disease severity for each sample group.
Demographic and clinical characteristics of the study cohort
Circulating cell-free RNA profiling
We performed transcriptome sequencing of plasma cfRNA on 132 samples from 124 pediatric patients. Of the 132 samples, 88 (67%) were classified as MIS-C, 31 (23%) as moderate-to-severe COVID-19 and 13 (10%) as negative controls (Fig. 2A). Recent work by Vorperian et al. demonstrated the possibility to quantify cell-types-of-origin (CTO) of cfRNA using reference-based deconvolution17. Here, we implemented BayesPrism and the “Tabula Sapiens” human single-cell transcriptome atlas as a reference to quantify the cfRNA CTO18,19. We consistently observed significant increases in cfRNA from neutrophils, kidney epithelial cells, thymocytes, and solid organ-derived cell types in moderate-to-severe COVID-19 as compared to control individuals, and further increased contributions from these cell types in MIS-C (p-value<0.05 by Mann-Whitney U test) (Fig. 2B and C; Supplemental Fig. 1A). We also observed significant increases in cfRNA from endothelial cells and neuronal Schwann specifically in children with MIS-C (p-value<0.05 by Mann-Whitney U test) (Fig. 2B and C). Platelet cfRNA was decreased in MIS-C and moderate-to-severe COVID-19 compared to controls. The CTO of cfRNA were highly similar for samples collected at different hospitals, highlighting the potential for cfRNA as a diagnostic biomarker for COVID-19 or MIS-C.
(A) Study design and analysis overview. (B) Average cfRNA deconvolution results for COVID-19, MIS-C, and controls over various time points. (C) Cell-free RNA deconvolution results of endothelial cell, neutrophil, and Schwann cell derived cfRNA (D) Diversity of cell type contributions to the cell-free transcriptome as measured by Simpson’s Index. (E) Dissimilarity of samples as compared to controls. Each point represents a comparison to a control, as measured with the Bray-Curtis dissimilarity measurement. (F) Scaled counts per million (CPM) values of significantly differentially abundant genes (DAGs) (DESeq2, Benjamini-Hochberg adjusted p-value < 0.01, |Log2FoldChange| > 1.5). Number of DAGs indicated to the left of the heatmap. Samples and genes are clustered based on correlation. (G) Normalized CPM values of TGM2 and antiviral gene RSAD2 across sample groups at the acute timepoint. (H) Cumulative CPM of genes in significant gene ontology groups (topGO, adjusted p-value < 0.05). Panels A and B show the number of samples in each group. Box plots show cumulative CPM distribution of controls and acute timepoint MIS-C and moderate-to-severe COVID-19. Points represent average cumulative CPM and bars represent standard error. Outliers are indicated with arrows and values. Asterisks indicate statistical significance by Mann-Whitney U test using Benjamini-Hochberg adjusted p-values as follows: ns, non-significant; *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.001.
Next, we sought to investigate longitudinal changes in cfRNA CTO in MIS-C. We observed significantly greater CTO diversity using Simpson’s index20 in acute MIS-C versus convalescent MIS-C (≥1 month post-hospitalization) or controls (p-value<0.05 by Mann-Whitney U test) (Fig. 2D). In contrast, cfRNA CTO diversity was not significantly different for acute moderate-to-severe COVID-19 compared to controls or convalescent COVID-19 (p-value>0.05 by Mann-Whitney U test) (Fig. 2D). Samples from patients with acute moderate-to-severe COVID-19 separated into 2 groups, one group with high and one with low cfRNA CTO diversity, consistent with the heterogeneity of cell types involved in COVID-19 as previously described21. We further compared the cfRNA CTO diversity of post-acute MIS-C and COVID-19. We found that samples from post-acute MIS-C patients had high cfRNA CTO diversity while samples from post-acute moderate-to-severe COVID-19 patients had low cfRNA CTO diversity (Supplemental Fig. 1B and C). These trends were also observed using the Shannon diversity index22. Next, we analyzed pairwise dissimilarity in cfRNA CTO between COVID-19 and MIS-C samples and controls. The Bray-Curtis dissimilarity of cfRNA CTO within controls was significantly lower than between controls and acute MIS-C (mean dissimilarity = 0.18 versus 0.56, p-value<0.05 by Mann-Whitney U test) and between controls and acute moderate-to-severe COVID-19 (mean dissimilarity = 0.18 versus 0.38, p-value<0.05 by Mann-Whitney U test) (Fig. 2E). Among convalescent COVID-19 or MIS-C samples, only the COVID-19 one-month follow up cohort exhibited significant divergence of cfRNA CTO compared to controls (Fig. 2E). Patients from 2 different hospitals (Emory and CNH) exhibited similar diversity and pairwise dissimilarity metrics (Fig. 2D and E). These findings revealed that acute MIS-C exhibited a higher diversity of cell types than either COVID-19 or controls, and that the observed cell types based on cfRNA gene expression were significantly different between MIS-C and COVID-19.
We identified differentially abundant genes (DAGs) in cfRNA associated with dead or dying cells to gain insight into disease pathogenesis and to characterize potential diagnostic biomarkers23. Patients diagnosed with acute MIS-C or moderate-to-severe COVID-19 were compared pairwise to controls. Using an absolute log2 fold change cutoff of 1.5 and a Benjamini-Hochberg adjusted p-value of <0.01, 1,409 DAGs were identified between MIS-C and controls, 265 DAGs between COVID-19 and controls, and 102 DAGs between MIS-C and COVID-19. Unsupervised clustering revealed distinct gene expression profiles separating MIS-C and COVID-19 from controls (Fig. 2F). Samples from acute MIS-C patients were assigned into three groups based on unsupervised clustering, each group with a distinct CTO profile (Supplemental Fig. 1D). The three groups consisted of cfRNA predominantly derived from (1) endothelial cells, NK cells, and respiratory ciliated cells, (2) monocytes, neutrophils, and myeloid progenitors, and (3) platelets. The third group clustered with samples from controls and COVID-19 patients and may hence be a technical artifact associated with elevated platelet lysis during sample preparation.
We analyzed significant DAGs between acute MIS-C and moderate-to-severe COVID-19 to obtain insights into disease pathogenesis and differential immune responses associated with these 2 diseases. Acute MIS-C was associated with elevated levels of endothelial cell markers (AKAP12, CNN3, FZD4), neuronal markers (GAS7, FEZ1, VAT1), actin-related genes (FSCN1, AFAP1L1, ITGA9), and an autoantigen also found in patients with celiac disease (TGM2) (Fig. 2G, Supplemental Fig. 1E). In contrast, acute COVID-19 was associated with elevated levels of interferon genes (IFI6, IFIT1, IFI44L, IFI27, IFITM1), antiviral genes (RSAD2, MX1, CMP2, LY6E), chemokine genes (CXCL5, CXCL3), and ciliated olfactory cell markers (OR2B6, ENKUR) (Fig. 2G, Supplemental Fig. 1E). Next, we performed gene ontology analysis using the R package topGO24. Gene ontology terms enriched in samples from COVID-19 patients included those associated with programmed cell death, response to viral infection, and regulation of the viral life cycle, while those enriched in MIS-C patients included actin cytoskeleton organization, endothelial cell migration, cytokine responses, and cell migration. To identify disease-specific pathways, we calculated counts per million (CPM) for each gene ontology group by summing the counts of DAGs in that group25. Compared to controls, the cumulative CPM score for endothelial cell migration was significantly increased in acute MIS-C (p-value<0.05 by Mann-Whitney U test), while cumulative CPM scores for myeloid cell differentiation were increased in acute MIS-C and moderate-to-severe COVID-19 (p-value<0.05 by Mann-Whitney U test) (Fig. 2H). Control samples were associated with increased gene ontology groups related to cell division and cell communication, consistent with the observation that the baseline cfRNA signal is predominantly derived from extracellular vesicles and cells undergoing apoptosis, as typically occurs during mitosis26. Next, we perfomed pathway analysis using Ingenuity Pathway Analysis (IPA). Pathways enriched in MIS-C included pyroptosis, synaptogenesis, NF-kB signaling, IL-6 signaling, IL-8 signaling, antiviral responses including interferon induction, and cholesterol biosynthesis, while a different set of pathways enriched in COVID-19 included macrophage production of nitric oxide, coronavirus pathogenesis, LXR/RXR activation, GP6 signaling, PTEN signaling, summylation, gustation (taste), and HMGB1 signaling (Supplemental Fig. 1F and G).
Whole blood RNA profiling by transcriptome sequencing (RNA-Seq)
Whole blood transcriptome profiling was performed on 217 samples from 187 pediatric patients. Of the 217 whole blood samples, 135 (62%) were classified as MIS-C, 56 (26%) as moderate-to-severe COVID-19 and 26 (12%) as negative controls (Fig 3A). Samples from Emory and UCSF were used for the differential expression anlaysis (DEA) and no batch effect was detected via unsupervised clustering (Supplemental Fig. 2A). Samples from CNH were used as a validation group (Fig. 3B-C). Pairwise comparisons of MIS-C and severe COVID-19 relative to controls showed a large degree of overlap in shared differentially expressed genes (DEGs) between MIS-C and COVID-19 (DESeq2, Benjamini-Hochberg corrected p-value < 0.01, |Log2FoldChange| > 1.5) (Fig. 3B-C, Supplemental Fig. 2A). The top 2 shared DEGs in both diseases were ADAMTS2, a metalloprotease that processes procollagen (Fig. 3D and Supplemental Fig. 2B), and CD177, a neutrophil activator (Supplemental Fig. 2B and C). Notably, ADAMTS2 has been previously implicated in severe COVID-19 in a study using peripheral blood mononuclear cell (PBMC) single-cell transcriptome sequencing27, whereas CD177 has been reported to be upregulated in the blood of MIS-C and COVID-19 patients28. We also observed elevated levels of ADAMTS2 during the post-acute stage of MIS-C and COVID-19; however, one month after hospitalization levels returned to baseline in children with MIS-C but were still elevated in children with COVID-19 (Supplemental Fig. 2D). Certain inflammatory genes such as interferon-stimulated gene 15 (ISG15) and macrophage-associated sialic acid binding Ig-like lectin 1 (SIGLEC1) were significantly upregulated in COVID-19 but not in MIS-C versus controls (Fig. 3D and Supplemental Fig. 2B). In contrast, T-cell receptor beta variable 11-2 (TRBV11-2) was more highly expressed in MIS-C than in COVID-19 versus controls (Fig. 3D), a finding that was also seen in a direct head-by-head comparison between MIS-C and COVID-19 (Supplemental Fig. 2D). This observation is consistent with two studies showing that TRBV11-2 is overexpressed by T-cells in most MIS-C patients, but not in patients with COVID-19, Kawasaki disease (KD), or toxic shock syndrome (TSS)14,29. These differences in TRBV11-2 expression in MIS-C versus COVID-19 or controls were observed at the post-acute timepoint, but not at the 1 month or ≥3 months timepoints (Supplemental Fig 2D). Interestingly, expression of gene paralogs KLRF1 and KLRB1, natural killer (NK) cell surface receptors, were found to be significantly decreased in COVID-19 and MIS-C compared to controls (Fig. 3D, Supplemental Fig. 2C), consistent with lower expression of these genes reported in severe COVID-19 versus controls30. Decreased KLRF1 and KLRB1 expression in COVID-19 and MIS-C was also observed at the post-acute timepoint, but at 1 month post-hospitalization returned to baseline in MIS-C yet remained decreased in COVID-19 (Supplemental Fig. 2D). In the head-to-head comparison between MIS-C and COVID-19, the top upregulated genes in COVID-19 were most likely to be those related to the antiviral type 1 interferon response pathway (e.g., IFIT2, SIGLEC1, IFI27, IFI44L, ISG15, IFIT3) (Supplemental Fig. 2E; Supplemental Dataset 1). The top upregulated genes in MIS-C were related to multiple pathways, likely due to the heterogeneous nature of MIS-C, including those associated with cell-to-cell communication (e.g., ITGA7, CDHR1, CD177, PGF, ERFE, MMP8) (Supplemental Dataset 1).
(A) Study design and analysis overview. (B) Scaled CPM values of significantly differentially expressed genes (DESeq2, Benjamini-Hochberg corrected p-value < 0.01, |Log2FoldChange| > 1.5). Differentially expressed genes were discovered using Emory and UCSF samples and the total number is indicated to the left of each heatmap. Samples and genes are clustered based on correlation. (C) Scaled CPM values of the top 300 significantly expressed genes from each comparison (union of genes, DESeq2, Benjamini-Hochberg adjusted p-value < 0.01, ranked by absolute Log2FoldChange). Samples and genes are clustered based on correlation. (D) CPM of ADAMTS2, TRBV11-2, SIGLEC1, and KLRB1 in controls, acute MIS-C, and acute moderate-to-severe COVID-19. Points represent average CPM and bars represent standard error. Asterisks indicate statistical significance by Mann-Whitney U test using Benjamini-Hochberg adjusted p-values as follows: ns, non-significant; *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.001. Panels 3A show the number of samples in each group. (E) Top 20 differential pathways between controls and MIS-C or COVID-19 ranked by activation z-score. Lines connect matching pathways. Pathways highlighted in red are distinct to either MIS-C or COVID-19. (F) Top 20 differential pathways between MIS-C and COVID-19 ranked by activation z-score.
Next, we compared differentially expressed pathways in MIS-C and COVID-19 relative to controls (Fig. 3E) and to each other (Fig. 3F). In both MIS-C and COVID-19 patients, we found increased activation of immune-related pathways which include phagosome formation, macrophage production of nitric oxide, dendritic cell maturation, TREM1, and IL-6 and IL-8 signaling. Pathways associated with hypoxia (hypoxia-inducible factor 1-alpha, or HIF1-alpha signaling), neuroinflammation, and cardiac hypertrophy were also upregulated in both MIS-C and COVID-19. MIS-C patients showed a marked inhibition of T cell receptor, interleukin-2 (IL-2), and focal adhesion kinase (FAK) signaling pathways, concurrent with an activation of tumor environment, IL-1, and IL-13 signaling pathways. In contrast, COVID-19 patients showed a more pronounced activation of cytokine, B-cell, and adrenomedullin pathways (Fig. 3F). We analyzed differentially expressed pathways and their associations with diseases or biological functions and observed striking differences between MIS-C and COVID-19 compared to controls (Supplemental Fig. 3). MIS-C was characterized by downregulation of multiple pathways related to inflammatory response, cell death and survival, cell-to-cell signaling, and immune cell trafficking, whereas activation of these same pathways was predicted in COVID-19. These results are consistent with the downregulation of exhausted T cells that has been previously reported in children with MIS-C31.
Cell-free DNA tissues-of-origin by methylation profiling
We performed whole genome bisulfite sequencing of cfDNA extracted from plasma samples from 67 children with MIS-C (n=41), COVID-19 (n=21), or from controls (n=5) (Fig. 4A) and compared the cfDNA data to previously published data from an adult COVID-19 cohort32. The highest mean levels of total cfDNA were found in children with MIS-C as compared to pediatric COVID-19, adult COVID-19, or controls (4.12 ng/uL, p-value<0.03, by Mann-Whitney U test) (Fig. 4B). A subset of MIS-C patients (n=3) showed a markedly elevated burden of cfDNA (>10 ng/ul), likely secondary to widespread tissue injury. There were also significantly higher mean levels of cfDNA in more severe COVID-19 pediatric and adult cases as compared to mild or asymptomatic COVID-19 cases and controls (p-value<0.05 by Mann-Whitney U test) (Fig. 4B).
(A) Study design and analysis overview. (B) Total cfDNA concentration and solid organ derived cfDNA concentration. (C) cfDNA concentration derived from innate and adaptive immune cell types. (D) Abundance of mitochondrial derived cfDNA. Asterisks indicate statistical signifi ance by Mann-Whitney U test using Benjamini-Hochberg adjusted p-values as follows: ns, non-significant; *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.001. (E) Correlation of cfDNA metrics with standard clinical measurements, along with the Pearson correlation Benjamini-Hochberg adjusted p-value for each comparison. Panels A and B show the number of samples corresponding to each group. Abbreviations: ASX, asymptomatic.
We next examined the utility of cfDNA tissues-of-origin (TOO) profiling to identify tissue profiles by comparison to a reference set of methylation profiles of purified cells and tissue samples, as previously described32. We observed significantly elevated levels of solid-organ-derived cfDNA in pediatric acute MIS-C and pediatric moderate-to-severe COVID-19 as compared to pediatric controls, pediatric acute mild or asymptomatic COVID-19, adult controls, and adult mild-to-severe COVID-19 cases (p-value<0.03, by Mann-Whitney U test) (Fig. 4C). Levels of solid-organ-derived cfDNA in acute MIS-C tended to be increased relative to acute moderate-to-severe COVID-19, although this difference was not significant (p-value=0.12, by Mann-Whitney U test) (Fig. 4C). We further observed elevated levels of cfDNA derived from innate immune cell types in acute MIS-C and moderate-to-severe pediatric COVID-19 compared to all other groups; however, this difference was not significant (Fig. 4C). In addition, we identified extensive heterogeneity in the TOO profiles from patients in the moderate-to-severe MIS-C cohort, including elevated levels of eosinophil, neutrophil, erythroblast, liver, heart, kidney, lung, and spleen-derived cfDNA (Supplemental Fig. 4). Although recent studies have reported increases in mitochondrial cfDNA in plasma from patients diagnosed with COVID-1933, here we observed a significant reduction in the concentration of mitochondrial cfDNA in pediatric patients with MIS-C and COVID-19 (p-value<0.04, by Mann-Whitney U test) (Fig. 4D). Finally, we found that cfDNA metrics associated with affected tissues and organs mirrored organ-specific clinical laboratory parameters. Significant correlations between kidney cfDNA concentration and creatinine levels (Pearson correlation; R=0.40, p-value=0.004), total cfDNA concentration and C-reactive protein (CRP) levels in the blood (Pearson correlation; R=0.34, p-value=0.022), and liver cfDNA concentration and ALT levels (Pearson correlation; R=0.303, p-value=0.036) were observed (Fig. 4E).
Comparative analysis of plasma cell-free RNA and whole blood RNA
A subset of cellular wbRNA and cfRNA samples were derived from the same blood draw (n=97), providing the opportunity to directly compare these two different analytes (Fig. 5A). First, we assessed the correlation of cfRNA and wbRNA abundance for genes with high average abundance (mean log-transformed CPM>10) across all sample categories. We found 1002 genes with significantly correlated wbRNA and cfRNA abundance (Pearson coefficient Benjamini-Hochberg adjusted p-value < 0.05). Positive correlation was observed in 992 (99%) of these 1002 genes, predominated by genes associated with myeloid cell transcription such as BNIP3L, HEMGN and NFKBIA (Supplemental Fig. 5A and B). Randomized permutation testing (n=1000 permutations) using either randomly paired genes or samples yielded on average far fewer significantly correlated genes (mean=250 and <1 with gene and sample randomization, respectively at a Benjamini-Hochberg adjusted p-value of <0.05) and decreased positive correlation (56% and 74% with gene and sample randomization, respectively), confirming the robustness of these observations. Next, we examined the degree of overlap in DEGs/DAGs among patients with either MIS-C (n=26) or moderate-to-severe COVID-19 (n=13), or controls (n=13) (Fig. 5B). We observed a substantial overlap in DEGs/DAGs in wbRNA and cfRNA when comparing MIS-C to controls (n=494) and COVID-19 to controls (n=153), but very little overlap when comparing MIS-C to moderate-to-severe COVID-19 (n=9). Of the 9 DEGs/DAGs that overlap when comparing MIS-C to moderate-to-severe COVID-19, seven had been previously reported in association with COVID-19 (IFI6, IFI44L, RSAD2, LY6E, EPSTI1, XAF1, MX1)34–38. Similar to the gene abundance profiles, differential pathway analysis revealed minimal overlap in top enriched pathways between wbRNA and cfRNA for each subgroup comparison (Supplemental Fig. 5C and D).
(A) Study design and analysis overview. (B) Overlap of differentially abundant genes (DAGs)/differentially expressed genes (DEGs) between whole blood RNA-seq and cfRNA-seq using paired samples (DESeq2, Benjamini-Hochberg adjusted p-value < 0.05, |Log2FoldChange| > 1). Venn Diagrams represent the overlap of upregulated genes between analytes in each sample group, as indicated by fill color. (C) Comparison of clustering topology between paired whole blood RNA-seq and cfRNA sequencing samples. Samples were clustered based on correlation of DAGs/DEGs from their respective analyses. Lines between trees connect paired wbRNA and cfRNA samples. Correlation was calculated using Baker’s Gamma and p-values were calculated using a Monte Carlo permutation test. (D) Comparison of cell-types-of-origin diversity between cfRNA and wbRNA as calculated with a Simpson Index. Paired samples are connected with a line. Asterisks indicate statistical significance by Mann-Whitney U test using Benjamini-Hochberg adjusted p-values as follows: ns, non-significant; *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.001. (E) Cell-type-of-origin fractions, normalized to blood cell types and split by diagnosis group.
Next, we compared the agreement in unsupervised hierarchical sample clustering based on wbRNA and cfRNA profiling of DEGs (Fig. 5C). Samples clustered similarly in the MIS-C versus control and COVID-19 versus control comparisons (Baker’s Gamma Index = 0.60 and 0.64, respectively; two-sided p-values <0.01 for both, n=39 and n=26, respectively). In contrast, the MIS-C versus COVID-19 comparison yielded clustering that was less similar but still significantly correlated (Baker’s Gamma Index = 0.11; two-sided p-value = 0.013, n=39). We used subsampling and bootstrapping to determine whether the high Baker’s Gamma index in the COVID-19 versus control comparison was an artifact due to a smaller sample size (n=26). We did not observe a significant difference between the originally calculated values and bootstrapped distribution of Baker’s Gamma Index values after subsampling (p-value = 0.88 for the MIS-C versus control comparison; p-value=0.40 for the MIS-C versus COVID-19 comparison). Thus, despite the lack of overlap in DEGs, cfRNA and wbRNA sample grouping by unsupervised clustering was similar for the MIS-C and COVID-19 samples relative to controls, indicating that these two analytes provide complementary information.
Finally, we compared the cell-types-of-origin (CTO) in wbRNA and cfRNA between paired samples. We observed significantly different wbRNA and cfRNA CTO diversity in paired samples from acute moderate-to-severe COVID-19 patients and paired samples from controls (p-value = 0.032 and p-value = 0.0007, respectively, by paired Wilcoxon test) (Fig. 5D). However, the CTO diversity was not significantly different between wbRNA and cfRNA in paired samples from patients with acute MIS-C (p-value = 0.41 by paired Wilcoxon test) (Fig. 5D). Furthermore, in unpaired analyses, we observed a significant difference in wbRNA CTO diversity between acute moderate-to-severe COVID-19 and controls, but not between acute MIS-C and controls (Mann-Whitney U Test, Benjamini-Hochberg adjusted p-value = 0.02 and 0.10) (Supplemental Fig. 5E). Conversely, we observe a significant difference in cfRNA CTO diversity in acute MIS-C and controls, but not in acute moderate-to-severe COVID-19 and controls (Mann-Whitney U Test, Benjamini-Hochberg adjusted p-value = 0.03 and 0.32) (Supplemental Fig. 5E). Finally, we compared patterns of blood-derived CTO in paired wbRNA and cfRNA samples from patients with MIS-C, patients with acute moderate-to-severe COVID-19, and controls (Fig. 5E-G). The wbRNA and cfRNA CTO profiles corresponding to MIS-C and COVID-19 (Fig. 5F and G), characterized by multiple blood cell types, were distinct from the control group profiles, which consisted of a single predominant baseline cell type (erythroid precursors in wbRNA and platelets in cfRNA) (Fig. 5E). Within each disease group, wbRNA and cfRNA CTO profiles were also distinct. For COVID-19, wbRNA CTO profiles had higher contributions from neutrophils, NK cells, T cells, and monocytes, whereas cfRNA CTO profiles had lower proportions of these inflammatory cells (Fig. 5F). In contrast, for MIS-C, wbRNA CTO profiles were predominated by neutrophils, whereas cfRNA CTO profiles had relatively higher contributions from myeloid progenitor cells, NK cells, and monocytes (Fig. 5G).
DISCUSSION
Here, we report a systems-level, longitudinal analysis of COVID-19 and MIS-C by next-generation sequencing of nucleic acids (cfRNA, wbRNA, and cfDNA) in a large multi-hospital study of 416 blood samples from 237 patients. Using plasma cfRNA profiling, we identify signatures associated with cellular injury and death that distinguish MIS-C and COVID-19, as well as the involvement of previously unreported cell types in MIS-C. wbRNA analysis reveals substantial overlap in pro-inflammatory pathways between MIS-C and COVID-19, yet also reveals pathways that are specific to each disease state. Plasma cfDNA profiling suggests increased cfDNA and solid organ involvement in MIS-C compared to COVID-19 and controls. Comparative analyses of paired cfRNA and wbRNA samples demonstrate that these analytes yield separate, but complementary, signatures associated with MIS-C and COVID-19 that reflect distinct cell types of origin. These results provide novel insights into the differential pathogenesis of MIS-C and COVID-19 and lay the groundwork for the development of minimally invasive diagnostic tests for these two disease states.
Whole blood RNA sequencing (RNA-Seq) of cellular RNA has traditionally been considered the gold standard method for assaying gene expression in blood. However, the signal from wbRNA is primarily derived from circulating leukocytes due to primarily sampling cell types found in the blood, thus measuring the activation of a patient’s inflammatory and immune response to an infection. In contrast, plasma cfRNA and cfDNA both measure levels and types of cell death from circulating cell types and peripheral tissues17,39. With respect to cells that are turning over or dying, cfDNA enables precise quantification of cell numbers, whereas cfRNA enables characterization of gene expression and pathways40–42. cfRNA profiling can leverage the extensive information regarding cell type specific gene expression that is available from recent large-scale human cell atlas projects, whereas reference data for cfDNA methylation profiling is currently more limited17. Overall, these cell-associated (wbRNA) and cell-free (cfRNA and cfDNA) approaches mutually complement each other. When combined, they provide a more complete picture of the dynamic, “yin-yang” interplay between host and pathogen or between cell activation / proliferation and cell death than achievable using a single modality alone.
cfRNA is an analyte that probes cellular death and immune dynamics on a systems level. Previous analyses of MIS-C and COVID-19 from the blood have relied on single cell or bulk RNA-Seq of whole blood cells, which generally only characterizes the host immune response, or on proteomic and cytokine-based assays, which use a limited number of markers or for which there is a lack of standardized reference data. In contrast, the signals from cfRNA are derived from any cell or vascularized tissue type, and there is a plethora of RNA-Seq reference data that can be used to interpret results. Consistent with prior studies, here we observe increased levels of cfRNA from endothelial cells in MIS-C43 and from neutrophils and thymocytes in MIS-C and COVID-199,44,45 as well as increased signaling from disease-specific pathways in MIS-C (IL-6, IL-8, and NF-kB)12,14 and COVID-19 (olfactory, gustation, sumoylation, coronavirus replication, and HMGB1)11,13,46–48. However, the cfRNA data also uncovered several novel features of MIS-C, such as enrichment of neuronal genes associated with synaptogenesis and increased cfRNA burden from Schwann cells. These findings suggest that peripheral nervous system damage may be a common feature of MIS-C. Interestingly, both central and peripheral nervous system involvement in MIS-C have been previously described49,50, although these clinical manifestations are infrequent. Notably, peripheral nervous system damage has also been documented in pediatric and adult COVID-19 and in post-acute sequelae such as long COVID51–53. Future studies are needed to elucidate the mechanisms and clinical spectrum of neurologic involvement in acute MIS-C and their association with long-term neurodevelopment. Furthermore, we observe an enrichment of genes associated with the pyroptosis pathway in MIS-C, likely related to inflammasome activation54. Pyroptosis is a form of rapid cellular death that occurs during highly inflammatory states55. Previous reports have shown that pyroptosis occurs in vascular endothelial cells in Kawasaki disease, a similar systemic inflammatory syndrome56. Based on the observed increase of cfRNA from endothelial cells and cfRNA signatures of pyroptosis, our data thus support the likely critical role of pyroptosis and endothelial cells in MIS-C pathogenesis, and may help explain the overlapping clinical presentations between MIS-C and Kawasaki disease in acutely ill pediatric patients.
Whole blood RNA-Seq reveals a high degree of overlap in shared, largely pro-inflammatory genes and pathways between COVID-19 and MIS-C. This is expected as both diseases are caused by SARS-CoV-2 and are highly inflammatory states. However, different levels of expression are observed for certain genes, such as upregulation of ISG15 and SIGLEC1 in COVID-19 and upregulation of TRBV11-2 in MIS-C, as well as for certain pathways, such as inhibition of T-cell receptor, IL-2, and FAK signaling pathways in MIS-C. The latter finding is consistent with the observation of “T-cell exhaustion” associated with downregulation of NK and CD8+ T cells driving a sustained inflammatory response in MIS-C31. Genes showing differences in levels of gene expression (e.g. ISG15, SIGLEC1, TRBV11-2, CREB3L) or in persistence of gene expression as revealed by longitudinal data (e.g. ADAMTS2, KLRFB1), may be useful target biomarkers for diagnostic and prognostic assays that can discriminate between MIS-C, COVID-19, and other hyperinflammatory conditions.
Like cfRNA, cfDNA allows monitoring and quantification of tissue injury and cell death in a minimally invasive manner from blood. Previous studies have shown that the relative concentrations of cfDNA in specific tissues vary in different disease states, including COVID-19 in adults, solid-organ transplant rejection, graft-versus-host disease following stem cell transplantation, urinary tract infection, and cancer32,57–60. The profiling of cell-type specific methylation in cfDNA can also be used to estimate cfDNA tissues-of-origin (TOO). Here we quantify the concentration of cfDNA and perform cfDNA TOO profiling to assess the extent of cell, tissue, and organ injury related to MIS-C and COVID-19. We observe an increase in cell death and high levels of heterogeneity in TOO of cfDNA in MIS-C compared to COVID-19 and controls, consistent with the systemic inflammatory nature of MIS-C that manifests clinically with the involvement of multiple organs and organ systems. In addition, we observe a decrease in the concentration of mitochondrial cfDNA in MIS-C and COVID-19, which is discrepant from previous studies reporting an increase of mitochondrial cfDNA by PCR61.
Most prior host-biomarker studies of MIS-C and pediatric COVID-19 analyze samples from the acute symptomatic timepoint. Here, we report longitudinal sampling of cfRNA and wbRNA in patients with MIS-C and COVID-19 at acute, post-acute, one-month post-hospitalization, and ≥3 months post-hospitalization timepoints. We observe that most, but not all, gene measurements return to “baseline” levels similar to those in control samples by one-month psot-hospitalization. In wbRNA we observe opposing dynamics of ADAMTS2 levels in MIS-C and COVID-19, in which elevated ADAMTS2 levels return to baseline in MIS-C but not in COVID-19 at one-month post-hospitalization. Similarly, we observe recovery of wbRNA KLRB1 levels in MIS-C at one-month post-hospitalization, but not in COVID-19. These findings may be related to the post-acute sequelae of SARS-CoV-2 infection (“long COVID”) that has been postulated to be caused by persistent immune dysregulation and that can occur after COVID-19 62. In contrast, despite the severity of the initial presentation, most clinical and laboratory abnormalities from MIS-C tend to quickly resolve within a few weeks, along with normalization of inflammatory and injury biomarkers63. In cfRNA, we found that most biomarker measurements, such as CTO values and gene modules scores, persist at 1 month but return to baseline ≥3 months post-hospitalization. These results are consistent with the generally accepted time frames of recovery after MIS-C64.
We report a large-scale comparison (n=96) of wbRNA and cfRNA profiles from paired samples in MIS-C and COVID-19. A previous study compared paired wbRNA and plasma cfRNA from healthy individuals but was limited by small sample size (n=3) and lack of a disease group for comparison65. Our results reveal distinct, largely nonoverlapping sets of DAGs/DEGs associated with MIS-C, COVID-19, and non-inflammatory controls in wbRNA and cfRNA. Thus, both analytes provide complementary information with regards to the ability to discriminate MIS-C and COVID-19 from controls and from each other. These findings are consistent with the origin of wbRNA and cfRNA; wbRNA being primarily derived from active immune cells in blood and cfRNA from dying cells from the blood and peripheral tissues. They are also consistent with our CTO data showing predominantly erythrocytes in wbRNA and platelets in cfRNA and a greater diversity of cell types from peripheral tissues represented in cfRNA as compared to wbRNA. Overall, our results underscore the potential utility of cfRNA and cfDNA as complementary biomarkers to more traditional diagnostic methods (wbRNA, cytokines, proteomics) in better diagnosing and enhancing our understanding of complex disease states such as MIS-C.
Limitations
This study has multiple limitations. First, we have a limited sample size of asymptomatic-to-mild COVID-19 (wbRNA, n=6; cfRNA, n=6; cfDNA, n=10), as well as a limited number of longtudinally collected samples (MIS-C or COVID-19 during post-acute, one-month, or ≥3 months timepoints: wbRNA, n=63; cfRNA, n=45) and controls (cfDNA, n=5). Second, the accuacy of our deconvolution analyses may be limited by the reference set used as a comparator. The cfDNA deconvolution reference set consists of a limited number of cellular and tissue derived methylomes. The cfRNA/wbRNA deconvolution reference set is derived from a single cell RNA sequencing (scRNA-seq) atlas that sequenced poly-adenylated transcripts, while here we performed total RNA-Seq with host ribosomal RNA depletion. Third, we lacked samples from “look-alike” inflammatory conditions (e.g., Kawasaki disease, toxic shock syndrome, macrophage activation syndrome), and thus cannot determine whether the signatures in MIS-C and COVID-19 we observe are distinct from these other conditions. Future studies are needed to address these limitations.
METHODS
Ethics Statement
The protocols for this study were approved locally at each site by the University of California, San Francisco (UCSF) Institutional Review Board (IRB) (#21-33403), San Francisco, CA; Emory University IRB (STUDY00000723), Atlanta, GA; Children’s National Medical Center IRB (Pro00010632), Washington, DC; and Cornell University IRB for Human Participants (2012010003), New York, NY. The protocols at UCSF and Children’s National Medical Center were “no subject contact” sample biobanking protocols under which data was extracted from the medical chart and consent was not obtained. The protocol at Emory University IRB was a prospective enrollment study under which parents provided consent and children assent as appropriate for age. De-identified samples and patient information were shared with collaborating institutions for sample processing (UCSF and Cornell University) and analysis.
Sample Acquisition UCSF
Pediatric hospitalized patients who tested positive and negative for COVID-19 were identified from SARS-CoV-2 real-time PCR (RT-PCR) results from the UCSF Clinical Laboratories daily. Residual whole blood samples were collected in EDTA lavender top tubes from residual blood available. 250 µl of sample was aliquoted with 250 µl of 2x DNA/RNA shield (Zymo Research) for a 1:1 ratio. The remaining blood was centrifuged at 2500 rpm for 15 min and the available plasma was obtained. Samples were properly identified and added to the biobanking registry. All samples were stored at -80D freezer until used.
Sample Acquisition Emory and Children’s Healthcare of Atlanta
At Emory and Children’s Healthcare of Atlanta, pediatric patients with COVID-19, MIS-C, or controls were enrolled into a specimen collection protocol following informed consent and assent, as appropriate for age. For this study, patients were classified as having MIS-C if they met the CDC case definition, and as having COVID-19 if they had any PCR-confirmed SARS-CoV-2 infection. Controls were healthy outpatients with no known history of COVID-19 who volunteered for specimen collection. The specimen collection protocol was approved by the Emory University IRB. Residual whole blood and plasma samples were collected from the clinical laboratory, and prospective blood samples were additionally collected in EDTA lavender top tubes. Longitudinal samples were also collected at 1-month and ≥3-months timepoints for participants who returned for follow-up. From the EDTA tubes, whole blood was aliquoted, and the remaining blood was centrifuged at 2500 rpm for 15 min to obtain the available plasma. All samples were de-identified and assigned study IDs. Samples were stored at -80D and shipped on dry ice to either UCSF or Cornell for analysis.
Sample Acquisition Children’s National
Patients with MIS-C were identified by a multidisciplinary task force according to the CDC case definition. Remnant whole blood samples from this population were identified, collected, and processed 12-72 hours after collection. Samples were centrifuged at 1300 xG for 5 minutes at room temperature. Plasma was aliquoted into a cryovial and frozen at -80°C. A DMSO-based cryopreservative (Cryostor? CS10) was added in a 1:1 ratio to the cell pellet and then frozen at -80°C in a controlled rate freezing container (i.e., Mr. Frosty tm). After freezing the pellets with Cryostor they were transferred to liquid nitrogen cryostorage within 1 week.
Clinical Data
For the purposes of this study, MIS-C was defined as any patient who met the CDC case definition5. Multidisciplinary teams which adjudicated whether a patient met the case definition of MIS-C. COVID-19 was defined as any patient with PCR-confirmed SARS-CoV-2 infection within the preceding 14 days who did not also meet the MIS-C case definition. Clinical data was abstracted from the medical record and entered into a shared REDCap66,67 database housed at UCSF.
Cell-free RNA Sample Processing and RNA Extraction
Plasma samples were received on dry ice and stored at -80°C until processed. Prior to extraction, plasma was thawed at room temperature and spun at 1300xg for 10min at 4°C. The supernatant was taken and cfRNA was isolated from plasma (115-1000 µl) using the Norgen Plasma/Serum Circulating and Exosomal RNA Purification Mini Kit (51000, Norgen). Extracted RNA was DNase treated with 14 µl of 10 µl DNase Turbo Buffer (AM2238, Invitrogen), 3 µl DNase Turbo (AM2238, Invitrogen), 1µl Baseline Zero DNase (DB0715K, Lucigen-Epicenter) for 30 min at 37°C and then concentrated into 12 µl using the Zymo RNA Clean and Concentrate Kit (R1015, Zymo).
Cell-free RNA Library Preparation and Sequencing
Sequencing libraries were constructed from 8 µl of concentrated RNA using the Takara SMARTer® Stranded Total RNA-Seq Kit v2 – Pico Input Mammalian (634418, Takara). Briefly, extracted RNA was reverse transcribed using random priming, barcoded using the SMARTer RNA Unique Dual Index Kit (634451, Takara), rRNA depleted, and further amplified. Library concentration was quantified using a Qubit(tm) 3.0 Fluorometer (Q33216, Invitrogen) with the dsDNA HS Assay Kit (Q32854, Invitrogen). Libraries were quality-controlled using an Agilent Fragment Analyzer 5200 (M5310AA, Agilent) with the HS NGS Fragment kit (DNF-474-0500, Agilent). Libraries were pooled to equal concentrations and sent to the Cornell Genomics core for 150-base pair, paired-end sequencing on an Illumina NextSeq550 machine for an average of 10 million reads per sample.
Cell-free DNA Sample Processing and Extraction
Plasma samples were received on dry ice and stored at -80°C until processed. Prior to extraction, plasma samples (75-650 µl) were thawed at room temperature and spun at 1300xg for 10min at 4°C. The supernatant was taken and cfDNA was isolated from plasma using the Qiagen Circulating Nucleic Acid Kit (55114, Qiagen) and eluted to 45 µl.
Cell-free DNA Library Preparation and Sequencing
Sequencing libraries were constructed from 20 µl of extracted DNA as previously described32. Library concentration was quantified using a Qubit(tm) 3.0 Fluorometer (Q33216, Invitrogen) with the dsDNA HS Assay Kit (Q32854, Invitrogen). Libraries were quality-controlled using an Agilent Fragment Analyzer 5200 (M5310AA, Agilent) with the HS NGS Fragment kit (DNF-474-0500, Agilent). Libraries were pooled to equal concentrations and sent to the Cornell Genomics core for 150-base pair, paired-end sequencing on an Illumina NextSeq550 machine for an average of 33 million reads per sample.
Whole Blood RNA Sample Processing and RNA Extraction
Whole blood samples were received on dry ice and stored at -80°C until processed. Before extraction, all samples were thawed and pretreated with a 1:1 ratio of 2X RNA/DNA Shield (R1200, Zymo Research) if this was not added prior to freezing. RNA was extracted from whole blood samples (400 µl) using the Quick-RNA Whole Blood kit (R1201, Zymo Research) following manufacturer’s instructions. Ribosomal depletion was not performed. RNA was eluted in 15 µl of RNase-free water and stored at -80 °C until use. The concentration of eluted RNA was measured using a Qubit(tm) Flex Fluorometer (Q33326, Invitrogen) with the RNA HS Assay Kit (Q32852, Invitrogen). RNA Integrity was assessed on a subset of samples using the Agilent Bioanalyzer RNA 6000 Nano/Pico Chip (5067-4626) to determine the RNA Integrity Number (RIN). All samples analyzed were partially degraded with RIN values between 2 and 5.
Whole Blood RNA Library Preparation and Sequencing
Extracted RNA (7 µl or 10-200 ng) was processed using the NEBNext rRNA Depletion Kit (Human/Mouse/Rat) (E6310, NEB) and the NEBNext Ultra II Directional RNA Library Prep Kit (E7760 & E7765, NEB) following manufacturer’s specifications. Briefly, samples were first treated for ribosomal RNA depletion and DNase digestion, fragmented for 8 min, and reverse transcribed. Adapters were ligated to the purified cDNA (1:25 diluted adaptor), followed by library amplification and barcoding using NEBNext Multiplex Oligos (E6609L, NEB) sets 1 to 4. Libraries were purified using NEBNext Sample Purification Beads (E7103, NEB).
Libraries were quantified using the Qubit(tm) Flex (Q33327, Invitrogen) with the dsDNA HS Assay Kit (Q32854, Invitrogen). Libraries were pooled and sent to the UCSF Center of Advanced Technology (CAT) for sequencing on an Illumina NovaSeq 6000 Sequencing System using 150-base pair paired-end sequencing. Negative controls (nuclease-free water) were included in every run to monitor for contamination.
RNA Bioinformatic Processing
Sequencing data was processed using a custom bioinformatics pipeline utilizing the Snakemake workflow management system (v7.7.0) for the cfRNA samples and a bash script for the wbRNA samples. Samples were quality filtered and trimmed using BBDUK (v38.90), aligned to the Gencode GRCh38 human reference genome (v38, primary assembly) using STAR (v2.7.0f) default parameters, and features quantified using featureCount (v2.0.0). cfRNA samples were also deduplicated using Picard MarkDuplicates prior to feature quantification (v2.19.2). Mitochondrial, ribosomal, X, and Y chromosome genes were removed prior to analysis.
RNA Sample Quality Filtering
wbRNA and cfRNA samples were filtered using different QC metrics due to differences in RNA concentration and quality. wbRNA samples with less than 10% of reads aligning to the transcriptome were removed from all analyses. cfRNA samples were filtered on the basis of DNA contamination, rRNA contamination, number of counts, and RNA degradation. DNA contamination was estimated by calculating the ratio of reads mapping to introns and exons. Samples with an intron to exon ratio above three were removed. rRNA contamination was measured using Samtools (v1.14). Total counts were calculated using featureCounts. Degradation was estimated by calculating the 5-3’ bias as calculated by Qualimap (v2.2.1). Samples with rRNA contamination, total counts, or 5-3’ bias greater than three standard deviations from the mean were removed. Also, samples with less than 75,000 total counts were removed.
RNA Cell Deconvolution and Diversity
Cell type deconvolution was performed using BayesPrism (v1.1) with the Tabula Sapiens single cell RNA-seq atlas (Release 1) as a reference. Cells from the Tabula Sapiens atlas were grouped as previously described in Vorperian et al. Cell types with more than 100,000 unique molecular identifiers (UMIs) were included in the reference and subsampled to 300 cells using ScanPy (v1.8.1). Providing an equal number of cells to the deconvolution ensured an unbiased prior for the Bayesian algorithm used in BayesPrism. Cell-type contribution diversity metrics were calculated using the vegan R package (v2.5.7).
RNA Differential Abundance/Expression Analysis
Comparative analysis of DEGs was performed using a negative binomial model as implemented in the DESeq2 package (wbRNA: v1.28.1, cfRNA: v1.34.0) using a Benjamini-Hochberg corrected p-value cutoff <0.01, unless otherwise stated. Heatmaps were constructed using the pheatmap package in R (v1.0.12), samples and genes were clustered using correlation based hierarchical clustering. Gene ontology analysis was performed using the topGO R package (v2.46.0)24 using a Benjamini-Hochberg-corrected p-value cutoff of <0.05. Cumulative CPM values for gene ontology terms were calculated by taking the sum of normalized counts from all the significant genes in each gene ontology module. Canonical pathways, diseases and functions were analyzed using QIAGEN Ingenuity Pathway Analysis (IPA) software (v73620684). Pathway analysis was run twice: (1) with all available samples and (2) with samples that have both whole blood and cell-free data available.
RNA Paired Sample Correlation Analysis
Log CPM normalized RNA counts were correlated between paired cfRNA and wbRNA samples. Genes with a low average expression (<10 CPM) in either cfRNA or wbRNA were removed. Samples with extreme cfRNA or wbRNA counts (CPM z-score >3) were removed to eliminate outlier bias. A Pearson correlation was calculated for each gene and a Benjamini-Hochberg– corrected p-value cutoff <0.05 was used to determine significance.
We performed two permutation tests (n=1000 permutations) to determine the reliability of the number of genes observed to be correlated. First, we randomly shuffled the sample labels and recalculated the total number of significant genes. Second, we randomly shuffled gene labels and recalculated the total number of significant correlations.
Entanglement Analysis
Differential abundance/expression analysis was performed using only paired cfRNA and wbRNA samples as previously described, using a Benjamini-Hochberg–corrected p-value cutoff <0.05. Samples were clustered using correlation based hierarchical clustering. Using the dendextend package in R (v1.15.2), dendrograms were plotted, paired samples were connected by lines, and Baker’s gamma correlation coefficient was calculated. Non-exact, two-sided p-values were calculated using a monte carlo permutation test.
DNA Concentration
Eluted DNA was quantified using a Qubit(tm) 3.0 Fluorometer (Q33216, Invitrogen) with the dsDNA HS Assay Kit (Q32854, Invitrogen). Total cfDNA concentration was estimated using the following formula:
DNA Bioinformatic Processing
Sequencing data was processed using a custom bioinformatics pipeline utilizing the Snakemake workflow management system (v7.7.0). Samples were quality filtered and trimmed using BBDUK (v38.46), aligned to the Gencode GRCh38 human reference genome (v38, primary assembly) and deduplicated using Bismark (v0.22.1) with default parameters, and quality filtered using samtools (v1.14). Prior to any analysis, X and Y chromosome mapped reads were removed.
DNA Deconvolution
DNA tissues of origin deconvolution were performed as previously described32. Briefly, a custom bioinformatic pipeline utilizing the Snakemake workflow management system (v7.7.0) was used to process publicly available methylation references, convert to a standard data format, and normalize across samples. Metline (v0.2-7) was used to discover differentially methylated regions and a quadratic programming algorithm estimated relative contributions of the tissues represented in the methylation references.
Quantification and Statistical Analysis
All statistical analyses were performed using R (cfDNA: v4.1.0, cfRNA: v4.1.0, wbRNA: v4.0.3). Data wrangling and visualization was performed using Python (3.9.1), Pandas (1.3.0) matplotlib-venn (0.11.6), R (v4.1.0), Tidyverse (v1.3.1), and ggplot2 (v3.3.5). Statistical significance was tested using Wilcoxon signed-rank tests and Mann-Whitney U tests in a two-sided manner, unless otherwise stated. All sequencing data was aligned to the GRCh38 Gencode v38 Primary Assembly and features counted using the GRCh38 Gencode v38 Primary Assembly Annotation.
Data Visualization
Figures were created using Qiagen IPA, Adobe Illustrator, Affinity Designer, and BioRender.com software.
Data Availability
Data and Code Availability All code will be made available on Github. Processed sequencing data will be deposited in the National Institutes of Health (NIH) / National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) and Gene Expression Omnibus (GEO) repositories under restricted access via Database for Genotypes and Phenotypes (dbGAP).
Author Contributions
C.J.L., R.D., C.A.R., I.D.V., and C.Y.C. conceived and designed the study. C.J.L., A,S.-G., V.S., J.L., A.P.C., and A.B. performed sequencing experiments. J.N., J.L., M.E.W., P.S., N.B., J.S., W.S., C.H., N.W., A.F., A.G. K.B., M.A.P., L.H., E.J.A., A.C., M.D., R.D., C.A.R., and C.Y.C. identified and collected patient samples and clinical metadata. C.J.L., A.S.-G., V.S., B.B.,M.D.,R.D.,C.A.R., I.D.V., and C.Y.C. analyzed sequencing data. M.D., A.B., R.D., C.A.R., I.D.V., and C.Y.C. supervised the study. C.J.L., V.S., S.B., I.D.V., and C.Y.C. wrote the manuscript and prepared the figures. All authors read and edited the manuscript and agree to its contents.
Conflicts of Interest
I.D.V. is a member of the Scientific Advisory Board of Karius Inc., Kanvas Biosciences and GenDX. C.Y.C. is a founder and a member of the Scientific Advisory Board of Delve Bio. A.P.C. is listed as an inventor on submitted patents pertaining to cell-free DNA (US patent applications 63/237,367, 63/056,249, 63/015,095, 16/500,929) and receives consulting fees from Eurofins Viracor. C.A.R.’s institution has received funding to conduct clinical research unrelated to this manuscript from BioFire Inc., GSK, MedImmune, Micron, Merck, Novavax, PaxVax, Regeneron, Pfizer, and Sanofi-Pasteur. She is co-inventor of patented RSV vaccine technology (International PCT Application No. PCT/US2016/058976, filed 12/28/2016 by Emory University), which has been licensed to Meissa Vaccines, Inc. with royalties received. Her institution has received funding from NIH to conduct clinical trials of Moderna and Janssen COVID-19 vaccines. E.J.A has consulted for Pfizer, Sanofi Pasteur, GSK, Janssen, and Medscape, and his institution receives funds to conduct clinical research unrelated to this manuscript from MedImmune, Regeneron, PaxVax, Pfizer, GSK, Merck, Sanofi-Pasteur, Janssen, and Micron. He also serves on a safety monitoring board for Kentucky BioProcessing, Inc. and Sanofi Pasteur. He serves on a data adjudication board for WCG and ACI Clinical. His institution has also received funding from NIH to conduct clinical trials of Moderna and Janssen COVID-19 vaccines. Atul Butte is a co-founder and consultant to Personalis and NuMedii; consultant to Mango Tree Corporation, and in the recent past, Samsung, 10x Genomics, Helix, Pathway Genomics, and Verinata (Illumina); has served on paid advisory panels or boards for Geisinger Health, Regenstrief Institute, Gerson Lehman Group, AlphaSights, Covance, Novartis, Genentech, and Merck, and Roche; is a shareholder in Personalis and NuMedii; is a minor shareholder in Apple, Meta (Facebook), Alphabet (Google), Microsoft, Amazon, Snap, 10x Genomics, Illumina, Regeneron, Sanofi, Pfizer, Royalty Pharma, Moderna, Sutro, Doximity, BioNtech, Invitae, Pacific Biosciences, Editas Medicine, Nuna Health, Assay Depot, and Vet24seven, and several other non-health related companies and mutual funds; and has received honoraria and travel reimbursement for invited talks from Johnson and Johnson, Roche, Genentech, Pfizer, Merck, Lilly, Takeda, Varian, Mars, Siemens, Optum, Abbott, Celgene, AstraZeneca, AbbVie, Westat, and many academic institutions, medical or disease specific foundations and associations, and health systems. Atul Butte receives royalty payments through Stanford University, for several patents and other disclosures licensed to NuMedii and Personalis. Atul Butte’s research has been funded by NIH, Peraton (as the prime on an NIH contract), Genentech, Johnson and Johnson, FDA, Robert Wood Johnson Foundation, Leon Lowenstein Foundation, Intervalien Foundation, Priscilla Chan and Mark Zuckerberg, the Barbara and Gerson Bakar Foundation, and in the recent past, the March of Dimes, Juvenile Diabetes Research Foundation, California Governor’s Office of Planning and Research, California Institute for Regenerative Medicine, L’Oreal, and Progenity. The authors have declared that none of these companies or competing interests had any role in this work or manuscript.
Data and Code Availability
All code will be made available on Github. Processed sequencing data will be deposited in the National Institutes of Health (NIH) / National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) and Gene Expression Omnibus (GEO) repositories under restricted access via Database for Genotypes and Phenotypes (dbGAP).
Acknowledgements
We would like to acknowledge staff members at the UCSF Clinical Laboratories and the UCSF Clinical Microbiology Laboratories for their help in identifying and retrieving patient whole blood samples. We thank the Cornell Genomics Center and the UCSF Center for Advanced Technology for help with sequencing libraries. At Emory, we thank Christopher Choi, Caroline Ciric, Khalel De Castro, Theda Gibson, Hui-Mien Hsiao, Wensheng Li, Austin Lu, Lisa Macoy, Kathy Stephens, Madeline Taylor, Ashley Tippett and the Children’s Healthcare of Atlanta Research Laboratory for their contributions to specimen and data collection. We thank the patients and their families for contributing their blood to further our understanding of pediatric COVID-19 and MIS-C. This work was supported by the National Institutes of Health (NIH) / National Institute of Child Health and Human Development (NICHD) grant R61HD105618 (R.D., C.A.R., I.D.V., and C.Y.C.). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.