Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the third coronavirus to cause an outbreak in the past 20 years. The outbreak was first reported in December 2019 in Wuhan, China, and rapidly progressed into a pandemic on a scale unprecedented since the 1918 influenza pandemic. In addition to respiratory complications in COVID-19 patients, clinical characterization of severe cases indicated a number of other complications, including multiple organ failure (liver, kidney, and heart) and septic shock. In an attempt to elucidate COVID-19 pathogenesis in different human organs, we examined whether the virus is present in the blood or any of its components, which might offer the virus a route for trafficking or a place to hide. By computationally analyzing high-throughput sequence data from patients with active COVID-19 infection, we found traces of SARS-CoV-2 RNA in peripheral blood mononuclear cells (PBMC), while the viral RNA was abundant in bronchoalveolar lavage specimens from the same patients. To the best of our knowledge, the presence of SARS-CoV-2 RNA in the PBMC of COVID-19 patients has not been reported before. This observation could suggest immune presentation, but it argues against extensive viral infection of lymphocytes or monocytes.
Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the third coronavirus to cause an outbreak in the past 20 years. The first outbreak occurred in Asia in 2002-2003 and caused Severe Acute Respiratory Syndrome (SARS), hence the name SARS-CoV for a virus that, at the time, was not closely related to any known coronavirus (Marra et al. 2003; Rota et al. 2003). Between 2002 and 2003, 8,098 people became sick with SARS, and 774 of them died (i.e., a mortality rate of about 9.6%). Since 2004, there have been no more reports of SARS cases (via NHS, WHO, and CDC).
The second coronavirus-related outbreak started in the Arabian Peninsula in 2012 (Zaki et al. 2012) and caused a more fatal disease, Middle East respiratory syndrome (MERS), with a substantially higher mortality rate of 40% among cases infected with MERS-CoV (Zumla, Hui, and Perlman 2015).
More recently, in December 2019, the third coronavirus-related outbreak was first reported in Wuhan, China, with the emergence of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), initially dubbed “the 2019 novel coronavirus” (2019-nCoV). The spread of the virus led to a pandemic on a scale unprecedented since the 1918 influenza pandemic.
As of May 10, 2020, almost 4 million confirmed COVID-19 cases and more than 270,000 deaths had been reported globally by WHO (WHO Dashboard, continuously updated). In addition to the respiratory complications in COVID-19 patients, clinical characterization of severe cases indicated further complications, including multiple organ failure (liver, kidney, and heart) and septic shock (Poston, Patel, and Davis 2020; Cascella et al. 2020; Hui Li et al. 2020).
The genome sequence of SARS-CoV-2 has been determined and made public (Lu et al. 2020), and since then more than 17,000 genomes have been sequenced from all around the world (Shu and McCauley 2017). The availability of those genomic sequences allows rapid screening for viral RNA in human tissues as well as in environmental samples (e.g., sewage; Bibby and Peccia 2013) using multi-omic wet lab technologies, and it allows in silico screening of publicly available metatranscriptomic samples.
In an attempt to elucidate COVID-19 pathogenesis in different human organs, we conducted this study to examine whether the virus is present in the blood or any of its components, which might offer the virus a route for trafficking or a place to hide. Several observations motivated this question. Some preliminary studies reported that the virus can infect lymphocytes (X. Wang et al. 2020), whereas others went as far as to suggest that the virus exerts its pathogenesis through “attacking hemoglobin,” although this hypothesis has been heavily criticized (Read 2020). Moreover, the virus was reported to be found in the plasma of COVID-19 patients (Huang et al. 2020). Finally, peripheral blood mononuclear cells (PBMC) have been shown to harbor other infectious viruses, such as HIV, HCV, and HBV (W.-K. Wang et al. 2002; Z. Li, Hou, and Cao 2015).
For these reasons, we computationally analyzed high-throughput sequence data from patients with active COVID-19 infection and found traces of SARS-CoV-2 RNA in their PBMCs, while their bronchoalveolar lavage samples contained large amounts of the viral RNA.
Methods
Dataset
Publicly accessible raw RNA-Seq FASTQ sequences published by Xiong et al. (2020) were obtained from the Genome Sequence Archive (Y. Wang et al. 2017) (GSA accession CRA002390).
Comparison of RNA Abundance and Gene Expression Profiles
Filtered RNA-Seq reads were searched against the RefSeq protein database (O’Leary et al. 2016) (release 99) using DIAMOND (Buchfink, Xie, and Huson 2015) in blastx mode (Altschul et al. 1990) with an e-value cutoff of 1e-10. The counts of the matching RefSeq accessions were normalized to the total number of reads per library. Principal Component Analysis (PCA) was performed on the normalized abundance of the RefSeq proteins.
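The normalization step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the input layout (one matched accession per read), and the toy library names and accessions are all assumptions made for the example. The resulting samples-by-accessions matrix is the kind of input a PCA routine would take.

```python
from collections import Counter


def normalize_counts(hits_per_library, total_reads):
    """Convert raw RefSeq hit counts into per-library relative abundances.

    hits_per_library: {library: list of matched RefSeq accessions, one per read}
    total_reads:      {library: total number of reads in that library}
    Both input layouts are hypothetical, chosen for illustration only.
    """
    normalized = {}
    for library, accessions in hits_per_library.items():
        counts = Counter(accessions)   # reads matching each accession
        total = total_reads[library]   # library size used as the denominator
        normalized[library] = {acc: n / total for acc, n in counts.items()}
    return normalized


def abundance_matrix(normalized):
    """Arrange abundances as a samples x accessions matrix (missing hits = 0.0)."""
    libraries = sorted(normalized)
    accessions = sorted({acc for row in normalized.values() for acc in row})
    matrix = [[normalized[lib].get(acc, 0.0) for acc in accessions]
              for lib in libraries]
    return libraries, accessions, matrix


# Toy data with made-up library names and accessions
hits = {"lib_A": ["P1", "P1", "P2"], "lib_B": ["P2"]}
totals = {"lib_A": 10, "lib_B": 4}
libraries, accessions, matrix = abundance_matrix(normalize_counts(hits, totals))
```

Dividing by the library size makes abundances comparable between libraries of different sequencing depth, which matters here because BALF and PBMC libraries need not contain the same number of reads.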
Detection of Viral RNA
Raw sequences were processed for quality control using fastp (Chen et al. 2018). Filtered FASTQ sequences were aligned to the SARS-CoV-2 reference genome (GenBank accession NC_045512) using BWA (Heng Li and Durbin 2009). The resulting BAM files were filtered with Sambamba (Tarasov et al. 2015) to retain mapped sequences with a quality score > 40 and an alignment score > 90. Sequences identified as matching SARS-CoV-2 were manually inspected and searched against the NCBI “nt” database using BLAST (blastn) for verification (Altschul et al. 1990).
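The logic of the alignment filter can be sketched in plain Python operating on SAM-formatted text records. This is not the Sambamba invocation used in the study; it is an illustrative equivalent under two assumptions that the text does not state explicitly: that "quality score" refers to the mapping quality (MAPQ, column 5 of a SAM record) and that "alignment score" refers to the aligner's `AS:i` optional tag. The function names and the toy records are hypothetical.

```python
def parse_sam_record(line):
    """Extract the fields needed for filtering from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    alignment_score = None
    for tag in fields[11:]:            # optional TAG:TYPE:VALUE fields
        if tag.startswith("AS:i:"):
            alignment_score = int(tag[5:])
    return {
        "name": fields[0],
        "flag": int(fields[1]),
        "mapq": int(fields[4]),
        "as": alignment_score,
    }


def passes_filter(line, min_mapq=40, min_as=90):
    """Keep mapped reads with MAPQ > min_mapq and AS > min_as (assumed thresholds)."""
    if line.startswith("@"):           # header line, not an alignment
        return False
    rec = parse_sam_record(line)
    if rec["flag"] & 0x4:              # 0x4 = read is unmapped
        return False
    if rec["as"] is None:              # no alignment score reported
        return False
    return rec["mapq"] > min_mapq and rec["as"] > min_as


# Toy SAM records (made up for illustration)
sam_lines = [
    "@SQ\tSN:NC_045512.2\tLN:29903",
    "read1\t99\tNC_045512.2\t100\t60\t150M\t=\t300\t350\t*\t*\tNM:i:0\tAS:i:150",
    "read2\t0\tNC_045512.2\t500\t20\t150M\t*\t0\t0\t*\t*\tNM:i:9\tAS:i:105",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
kept = [line.split("\t")[0] for line in sam_lines if passes_filter(line)]
```

Requiring both thresholds keeps only reads that map uniquely and align well, which is important when the conclusion rests on a very small number of reads.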
Results
We analyzed the RNA-Seq dataset published by Xiong et al. (2020). Table 1 describes the processed samples and the sequence read statistics of each sample:
Table 1. Nine RNA-Seq samples (Xiong et al. 2020) were analyzed, including 3 PBMC from healthy donors, 3 PBMC from patients, and 2 BALF from patients with an additional replicate each. Read numbers refer to paired-end reads.
Based on global gene expression, PCA showed a strong, expected separation between BALF and PBMC samples, and a further slight separation between the PBMC from healthy controls and those from COVID-19 patients (Figure 1). However, one healthy control was placed within the COVID-19 cluster.
Figure 1. Samples are color-coded: red for healthy donors and green for COVID-19 patients. PC1 explains 63% of the variance and PC2 explains 30% of the variance.
We found viral sequences in all of the BALF samples, with a median abundance of 2.15% of the total reads. We also identified two paired-end viral reads in a PBMC sample from a COVID-19 patient, matching the SARS-CoV-2 polyprotein (pp1ab) (accession NP_828849) and the SARS-CoV-2 surface glycoprotein (accession YP_009724390) (Figures 2, 3).
Figure 2. Viral RNA sequences from the COVID-19 patients were mapped against the SARS-CoV-2 reference genome (GenBank accession NC_045512). The x-axis is the nucleotide position on the virus genome. The y-axis is the coverage (in reads; not normalized) of the genomic position by RNA-Seq viral sequences. Panel “A” represents the BALF samples. Panel “B” represents the PBMC samples.
Figure 3. The two paired-end viral reads from a COVID-19 PBMC sample aligned to the SARS-CoV-2 reference genome (GenBank accession NC_045512). Panel A shows the alignment of the first paired-end read. Panel B shows the alignment of the second paired-end read. Perfect identities between the reference and the two overlapping mates are indicated by “*”.
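The per-position coverage plotted against the reference genome (reads covering each nucleotide position, not normalized) can be computed as in the sketch below. It is an illustration rather than the tool actually used for the figure, and it makes a simplifying assumption: each alignment is given as a 1-based start position plus the number of reference bases it spans, ignoring gaps that a real implementation would derive from the CIGAR string. The function name and the toy coordinates are hypothetical.

```python
def coverage_depth(alignments, genome_length):
    """Return read depth at every reference position (index 0 = position 1).

    alignments: iterable of (start, span) pairs; start is 1-based and span is
    the number of reference bases covered (assumed ungapped for simplicity).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 2)
    for start, span in alignments:
        if span <= 0 or start < 1 or start > genome_length:
            continue                               # skip out-of-range input
        end = min(start + span - 1, genome_length)  # clip at the genome end
        diff[start] += 1
        diff[end + 1] -= 1

    depth, running = [], 0
    for pos in range(1, genome_length + 1):
        running += diff[pos]                       # running sum gives the depth
        depth.append(running)
    return depth


# Toy example on a 10-nt "genome": reads cover 1-4, 3-7, and 8-10
depth = coverage_depth([(1, 4), (3, 5), (8, 3)], genome_length=10)
```

The difference-array approach takes time proportional to the number of reads plus the genome length, so it remains fast even for BALF libraries in which a large share of the reads map to the roughly 30 kb viral genome.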
Discussion
Coronavirus-related infections have been reported to be associated with hematological changes including lymphopenia, thrombocytopenia, and leukopenia through infecting blood cells, bone marrow stromal cells, or inducing autoantibodies (Yang et al. 2003).
In an early study characterizing the clinical features of COVID-19 patients, Huang and coworkers reported that RT-PCR allowed them to detect the coronavirus in plasma samples isolated from the patients (Huang et al. 2020). In their report, they preferred the term “RNAaemia” to “viraemia” (the presence of a virus in the blood) because they did not perform tests to confirm the presence of infectious SARS-CoV-2 virus in the blood of the patients.
On the other hand, there have been no reports of viral RNA being detected in blood cells, notably PBMC. On the contrary, in their preliminary analysis of RNA isolated from PBMC, Corley et al. reported that they did not detect viral sequences (currently a preprint, 10.1101/2020.04.13.039263v1).
High-throughput sequencing has repeatedly been shown to be an effective approach for identifying and quantifying viruses in the blood (Moustafa et al. 2017), following methods similar to those used in viral metagenomics and uncultivated viral genomics, as described by Breitbart et al. (2003), Aziz et al. (2015), and Roux et al.
Therefore, we planned to exhaustively look for SARS-CoV-2 RNA sequences in publicly available RNA-Seq PBMC datasets. At the time of writing, only one such dataset was publicly available, published by Xiong et al. (2020), in which the group profiled global gene expression in BALF and PBMC specimens of COVID-19 patients. Predictably, we detected viral RNA in all BALF samples (2 patients, 2 replicates each) with an average abundance of 2.15% of the total RNA in those samples, which included human RNA. In contrast, we identified only two paired-end reads aligning to the SARS-CoV-2 genome in RNA isolated from PBMC, and these came from only one (accession CRR119891) out of three patients. As expected, no SARS-CoV-2 RNA was detected in the PBMC from healthy controls (3 donors). These RNA traces are rare; however, they can be confidently and specifically assigned to SARS-CoV-2. One viral RNA read translates into the polyprotein (pp1ab, accession NP_828849), which is the largest protein of coronaviruses and is involved in the replication and transcription of the viral genome. The other viral RNA read translates into the surface (spike) glycoprotein (accession YP_009724390), which mediates the entry of the virus into human cells expressing human angiotensin-converting enzyme 2 (hACE2) (Ou et al. 2020; Walls et al. 2020).
Although we cannot rule out cross-contamination or barcode bleeding (Mitra et al. 2015; Kircher, Sawyer, and Meyer 2012) as the source of the viral RNA detected in one PBMC RNA-Seq sample, this possibility is unlikely, given that the control samples had zero hits to SARS-CoV-2 RNA. We also consider the possibility that SARS-CoV-2 was sampled by antigen-presenting cells (most likely dendritic cells) or presented to T lymphocytes, both of which are part of the PBMC population.
One more possibility, which we believe can only be assessed with many more samples, is that SARS-CoV-2 may be specifically or coincidentally internalized by one of the mononuclear cell types, which may suggest a mechanism for the chronicity of SARS-CoV-2 infection. This hypothesis requires further testing. However, in light of our data, it is hard to support the early reports that SARS-CoV-2 targets T lymphocytes in vivo, a suggestion made in a correspondence and based on cell culture experiments with pseudotyped viruses (X. Wang et al. 2020).
As more data become publicly available, it will be possible to revisit this hypothesis and others to improve our understanding of the progression and replication of SARS-CoV-2 in infected individuals.