ABSTRACT
Background Both SARS-CoV-2 reinfection and persistent infection have been reported, but sequence characteristics in these scenarios have not been described. We assessed published cases of SARS-CoV-2 reinfection and persistence, characterizing the hallmarks of reinfecting sequences and the rate of viral evolution in persistent infection.
Methods A systematic review of PubMed was conducted to identify cases of SARS-CoV-2 reinfection and persistence with available sequences. Nucleotide and amino acid changes in the reinfecting sequence were compared to both the initial and contemporaneous community variants. Time-measured phylogenetic reconstruction was performed to compare intra-host viral evolution in persistent SARS-CoV-2 to community-driven evolution.
Results Twenty reinfection and nine persistent infection cases were identified. Reinfection cases spanned a broad distribution of ages, baseline health status, and reinfection severity, and occurred as early as 1.5 months and as late as >8 months after the initial infection. The reinfecting viral sequences had a median of 17.5 nucleotide changes with enrichment in the ORF8 and N genes. The number of changes did not differ by the severity of reinfection, and reinfecting variants were similar to the contemporaneous sequences circulating in the community. Patients with persistent COVID-19 demonstrated more rapid accumulation of sequence changes than seen with community-driven evolution, with continued evolution during convalescent plasma or monoclonal antibody treatment.
Conclusions Reinfecting SARS-CoV-2 viral genomes largely mirror contemporaneous circulating sequences in that geographic region, while persistent COVID-19 has been largely described in immunosuppressed individuals and is associated with accelerated viral evolution.
Funding This study was funded in part by the NIH grant 106701.
Disclosures Dr. Li has consulted for Abbvie.
BACKGROUND
After resolution of coronavirus disease 2019 (COVID-19) following SARS-CoV-2 infection, antibodies against SARS-CoV-2 persist in the majority of patients for 6 months or more [1]. Despite this, there have now been a number of reports of COVID-19 reinfection, spanning a broad range of age groups, time frames, and disease severity [2-7]. There remains a great deal of uncertainty over the viral characteristics of reinfection cases, including the degree of sequence heterogeneity and the location of new mutations between the initial and reinfecting variants, if any. In addition, the diagnosis of COVID-19 reinfection has been complicated by the increasing reports of persistent COVID-19 infection, especially in immunosuppressed individuals. Like reinfection cases, persistent COVID-19 can also span the range of disease severity, from asymptomatic to severe disease, and recurrent symptoms can last for months [8-11]. Differentiating between persistence and reinfection can be challenging, and little is known about differences in the location and quantity of SARS-CoV-2 mutations in these scenarios. We performed an analysis of SARS-CoV-2 sequences from published cases of COVID-19 reinfection and persistence, characterizing the hallmarks of reinfecting sequences and the rate of viral evolution in persistent infection.
METHODS
Data search and selection criteria
We conducted a systematic literature review in PubMed through March 8, 2021 for cases of persistent COVID-19 using the search term “((covid or sars-CoV-2) AND (persistent or persistence or prolonged)) AND (sequence or evolution)”. A search for COVID-19 reinfection reports was made using the terms “(covid or sars-CoV-2) AND (reinfection)”. Both peer-reviewed and preprint results were evaluated. We used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) for reviewing literature and for reporting search results. Additional preprints that appeared through Google search and that met our criteria were also included. For cases of reinfection, papers were included if the authors described it as a case of reinfection diagnosed >30 days after the initial infection and if whole genome SARS-CoV-2 sequences or sites of mutations relative to a reference sequence (e.g., Wuhan-Hu-1) from both infection time-points were available. Of the 291 results from the search, 14 articles met the inclusion criteria and were included in the present report along with 2 additional preprints that were identified (Supplemental Figure 1A).
Persistent cases were included if the authors described it as a case of persistent COVID-19 infection and if longitudinal whole genome SARS-CoV-2 sequences were available. The search returned 129 results, 7 of which met the inclusion criteria and were included in the present report along with one other preprint (Supplemental Figure 1B). Only sequences obtained directly from patient respiratory tract samples were included in our analysis to exclude the possibility of sequence changes during the ex vivo culture process. Three cases were excluded due to uncertainty in their classification as either reinfection or persistent infection cases (Supplemental Methods, Supplemental Table 1, Supplemental Figure 2).
Sequences were analyzed for mutations using NextClade (https://clades.nextstrain.org/) and snp-sites (https://github.com/sanger-pathogens/snp-sites). The degree of reinfection severity, either more or less severe compared to the first infection, was classified based on an explicit determination by the authors of each article or by comparing symptoms, duration of illness, and hospitalization status between both episodes.
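As a rough illustration of what counting nucleotide changes between two sequences involves, the sketch below compares two sequences taken from the same alignment. It is not the NextClade or snp-sites implementation; the function name `nucleotide_changes`, the toy sequences, and the choice to skip gaps and uncalled bases (so that only substitutions are counted) are assumptions made for this example.

```python
from typing import List, Tuple


def nucleotide_changes(ref: str, query: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, ref base, query base) for each substitution
    between two sequences taken from the same alignment."""
    if len(ref) != len(query):
        raise ValueError("sequences must come from the same alignment")
    skipped = {"N", "-"}  # assumption: ignore uncalled bases and gaps
    changes = []
    for pos, (a, b) in enumerate(zip(ref.upper(), query.upper()), start=1):
        if a in skipped or b in skipped:
            continue
        if a != b:
            changes.append((pos, a, b))
    return changes


# Toy aligned fragments (hypothetical, not real SARS-CoV-2 sequence).
first_infection = "ATGGCTAACGTTNACG"
reinfection = "ATGACTAACGTTGACA"
for pos, a, b in nucleotide_changes(first_infection, reinfection):
    print(f"{a}{pos}{b}")
```

Deletions would need separate handling under this assumption, since gap positions are skipped rather than counted.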
Sequencing dataset compilation and phylogenetic tree construction
The sequencing dataset contained a total of 262 globally representative SARS-CoV-2 genomes selected from GISAID and sequences from the reinfection and persistence cases (Supplemental Methods; Supplemental Data 1). The sampled sequences were chosen to be representative of global sequence diversity throughout the time course of the pandemic. Sequences of variants of concern B.1.1.7 and B.1.351 were also included. Nucleotide sequence alignment was performed using MAFFT (Multiple Alignment using Fast Fourier Transform) [12]. The best-fit nucleotide substitution model was determined by model selection, followed by maximum likelihood (ML) phylogenetic tree construction using IQ-Tree with 1000 bootstrap replicates [12].
Mutation analysis
For reinfection cases, mutations were determined in two ways. First, nucleotide and amino acid changes were identified for the reinfection sequences relative to the first infection sequence. The frequency of nucleotide or amino acid changes within each gene was compared to the frequency of changes in the remainder of the genome by Fisher’s exact tests with a Bonferroni correction for multiple comparisons. The relationship between disease severity and number of nucleotide or amino acid changes in the genome was assessed using a Mann-Whitney test. Second, to identify unique characteristics of reinfecting viruses, the first-infection and reinfection sequences were each compared to circulating sequences in the community, defined as sequences from the same NextStrain clade and the same geographic location that were sampled within one month and uploaded to GISAID (https://www.gisaid.org/; Supplemental Table 2, Supplemental Methods, Supplemental Data 2). Rare mutations were defined as polymorphisms that were present only in the reinfecting sequence (not the initial variant) and found in less than 1% of contemporaneous community sequences. Mutation locations are graphically represented in Circos plots [13].
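The per-gene enrichment test described above can be sketched as a 2x2 Fisher's exact test (changed vs. unchanged sites, inside vs. outside the gene) followed by a Bonferroni adjustment. This is a minimal standard-library illustration, not the authors' analysis: treating nucleotide sites as the unit of comparison, the approximate gene lengths, and all change counts below are assumptions chosen for the example.

```python
from math import comb
from typing import Dict, List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(a + b + c + d, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def bonferroni(p_values: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


GENOME_LENGTH = 29_903  # approximate Wuhan-Hu-1 length; illustrative
TOTAL_CHANGES = 150     # hypothetical pooled count of changed sites
# gene -> (approximate length in nt, hypothetical number of changed sites)
genes: Dict[str, tuple] = {"S": (3822, 20), "ORF8": (366, 12), "N": (1260, 18)}

raw = []
for name, (length, changed) in genes.items():
    outside_changed = TOTAL_CHANGES - changed
    raw.append(fisher_exact_two_sided(
        changed, length - changed,
        outside_changed, (GENOME_LENGTH - length) - outside_changed,
    ))

for name, p_adj in zip(genes, bonferroni(raw)):
    print(f"{name}: Bonferroni-adjusted P = {p_adj:.3g}")
```

In practice the Bonferroni factor would be the number of genes actually tested, not the three shown here.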
For persistent infections, sequence changes were assessed at two time intervals: before or after convalescent plasma or monoclonal antibody treatment. Sequences sampled before convalescent plasma or antibody treatment were compared to the first sequence sampled. For sequences sampled after convalescent plasma or antibody treatment, sequence changes (both nucleotide and amino acid) were determined relative to the last pre-treatment sequence. Linear regression was used to estimate the rate of viral changes between two intervals. The slope of the trendline was compared to the latest global clock rate (March 29, 2021) as estimated by NextStrain (https://nextstrain.org/ncov/global/).
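The rate estimate described above amounts to fitting a least-squares line to the number of sequence changes against time and reading off the slope. The sketch below shows that calculation under stated assumptions: the patient data are invented, and `GLOBAL_CLOCK_PER_YEAR` is a placeholder value, not the NextStrain estimate used in the study.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical patient: days since the first sequenced sample and the
# number of nucleotide changes relative to that first sequence.
days = [0, 25, 70, 105, 140]
changes = [0, 3, 7, 10, 15]

slope_per_day, _ = fit_line(days, changes)
rate_per_year = slope_per_day * 365.25

GLOBAL_CLOCK_PER_YEAR = 24.0  # placeholder, not a value from the source
ratio = rate_per_year / GLOBAL_CLOCK_PER_YEAR
print(f"intra-host rate: {rate_per_year:.1f} changes/year ({ratio:.2f}x placeholder global rate)")
```

With only a handful of time points per patient, the slope carries wide uncertainty, so a comparison of this kind is descriptive rather than a formal test.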
Time-measured phylogenetic analysis
The temporal signal of the ML tree was examined in TempEst [14] by regressing root-to-tip divergence on sampling date, and outliers were inspected in the distribution of residuals. A high degree of clock-like behavior was observed in the whole dataset (R² = 0.721) in root-to-tip regression analysis, with a slope (rate) of 8.26E-4 and an estimated ancestral time of approximately 2019.84. This suggests that the whole dataset has a realistic temporal signal and is appropriate for estimation of temporal parameters. No outliers were found in this sample. Separate root-to-tip regression analyses of the sequences from persistent patients (especially those with >2 sequences) also supported a temporal signal sufficient for a time-measured phylogeny. To compare the evolutionary rates between the reported persistent infections and the general population infections, time-measured phylogenetic reconstruction was conducted in Bayesian Evolutionary Analysis Sampling Trees (BEAST) v1.10.4 [15]. Nine partitions, including eight persistent patients and the global sequences, were used as separate groups of taxa to estimate separate evolutionary rates. Due to large uncertainties with small samples, persistent patients with only two viral sequences were excluded from this analysis. A general time reversible (GTR) model was applied with gamma-distributed rate variation among sites. A lognormal relaxed molecular clock was used with an initial mean of 0.0008 and a uniform prior ranging from 0.0 to 1.0. A logistic growth tree prior was applied. Four independent Bayesian Markov Chain Monte Carlo (MCMC) chains of 100 million generations were performed with a sampling step every 10,000 generations to yield 10,000 trees per run. To ensure a sufficient effective sample size (ESS > 200), the convergence of three runs was diagnosed in Tracer v1.7.1 (http://tree.bio.ed.ac.uk/software/tracer/) for all parameters.
LogCombiner v1.10.4, part of the BEAST software package, was used to combine the multiple runs to generate log and tree files after appropriate removal of the burn-in from each MCMC chain. The evolutionary rates from the combined log file were compared and visualized in R v4.0.2 (https://www.r-project.org/).
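The root-to-tip check reported above can be illustrated with a short regression of divergence on sampling date: the slope estimates the clock rate, the x-intercept estimates the date of the root, and R² summarizes how clock-like the data are. This is a sketch of the general idea and not TempEst itself; the function name `root_to_tip` and the synthetic tips (placed exactly on a line with the reported slope and ancestral time) are assumptions for the example.

```python
from typing import Sequence, Tuple


def root_to_tip(dates: Sequence[float], divergences: Sequence[float]) -> Tuple[float, float, float]:
    """Regress root-to-tip divergence on sampling date (decimal years).

    Returns (rate, estimated root date, R squared).
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in divergences)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, divergences))
    rate = sxy / sxx
    intercept = mean_y - rate * mean_x
    root_date = -intercept / rate  # date at which divergence is zero
    r_squared = sxy ** 2 / (sxx * syy)
    return rate, root_date, r_squared


# Synthetic tips lying exactly on a strict clock with the slope and
# ancestral time quoted in the text; real data scatter around the line.
sample_dates = [2020.0, 2020.5, 2021.0, 2021.2]
tip_divergence = [8.26e-4 * (t - 2019.84) for t in sample_dates]

rate, root_date, r2 = root_to_tip(sample_dates, tip_divergence)
print(f"rate={rate:.3g} subs/site/year, root={root_date:.2f}, R2={r2:.3f}")
```

Because the synthetic points sit exactly on the line, this example only checks that the regression recovers the parameters it was built from; it says nothing about the fit of the actual dataset.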
Statistical analysis
Nonparametric Wilcoxon rank sum or matched pairs signed rank tests were used to compare the number of amino acid changes between sequences. Statistical analyses were performed using GraphPad Prism 9 (GraphPad Software, San Diego, CA).
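For readers who want to see what the rank sum (Mann-Whitney) comparison computes, the sketch below derives the U statistic and an exact two-sided P value by enumerating every reassignment of the pooled values to the two groups, which is feasible for the small group sizes in this kind of case series. It is a standard-library illustration and not the GraphPad Prism procedure; the function names and the example counts are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic for x: pairs (xi, yj) with xi > yj count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Exact two-sided P value from all splits of the pooled data."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2
    observed = abs(mann_whitney_u(x, y) - mean_u)
    indices = range(len(pooled))
    extreme = total = 0
    for chosen in combinations(indices, n1):
        chosen_set = set(chosen)
        group_x = [pooled[i] for i in chosen]
        group_y = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Count splits at least as far from the null mean as the observed one.
        if abs(mann_whitney_u(group_x, group_y) - mean_u) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical nucleotide-change counts by reinfection severity.
more_severe = [12, 19, 23, 30, 17]
less_severe = [10, 15, 18, 21, 26, 14]
print(f"U = {mann_whitney_u(more_severe, less_severe)}, "
      f"P = {exact_two_sided_p(more_severe, less_severe):.3f}")
```

Full enumeration grows combinatorially, so larger samples would normally use a normal approximation or a statistics package instead.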
RESULTS
Sequence analysis of reinfection cases
A total of twenty cases from sixteen reports were included in this analysis (Table 1) [2-7, 16-25]. A broad range of age groups were represented and 90% were under the age of 70 years. Most (80%) of the cases had no reported comorbidities and while one patient had diabetes and end-stage renal disease, none had high-level immunosuppression. The interval between diagnosis of the first infection and the second infection ranged from 44 days to 282 days with a median of 113.5 days. Five patients had more severe illness during the second infection, while six had less severe symptoms on reinfection, including two who were asymptomatic on reinfection. Two cases were asymptomatic in both infections, five cases reported the same severity for both infections and no information on infection severity was available for two cases (Table 1). Six cases reported reinfection with a virus from the same clade.
Phylogenetic analysis demonstrated distinct branching for the two sequences in each of the reinfection cases, corroborating results discussed in the original reports (Figure 1). We compared the reinfecting viral sequence to the initial sequence and found a median of 17.5 nucleotide changes (range 9-37) and 9 amino acid changes (range 6-24) (Figure 2A). The nucleotide changes between the initial and reinfecting sequences were distributed across the SARS-CoV-2 genome, with significantly higher frequencies of changes in ORF8 (P<0.001) and N (P=0.001) (Figure 2B). A similar pattern was observed with amino acid changes (Supplemental Figure 3A). All but two reinfection cases had at least one substitution or deletion in the S gene (Supplemental Table 3). Next, we assessed whether reinfection with a more divergent second virus resulted in more severe disease. We found no significant differences in the number of nucleotide or amino acid changes in the reinfecting virus compared to the original viral variant when categorized by the severity of the reinfection (Figure 2C; Supplemental Figure 3B). Both the initial and reinfecting SARS-CoV-2 variants were similar to the sequences circulating in the community at the time of reinfection. The initial infecting variant harbored a median of only 2 rare nucleotide mutations compared to contemporaneous circulating variants in the community, and the reinfecting variant contained a median of only 1 rare nucleotide mutation (Figure 2D-E; Supplemental Figure 3C).
Sequence analysis of persistent COVID-19 cases
A total of nine cases from seven reports describing persistent infection were retrieved from our literature search. Of these nine cases, all but one had B cell immunodeficiency [8-10, 26-29]. Four were treated with B cell-depleting therapy for lymphoma or autoimmune disorders, while four had B cell lymphomas treated with chemotherapy (Table 2). One patient had advanced HIV infection with a CD4+ count of 0 cells/mm3 and diminished CD19+ cell counts. The median length of infection was 154 days and 33% of the cases ended in death. One patient had asymptomatic disease throughout [9]. Four patients were treated with convalescent plasma at least once during their illness [9, 10, 26, 28], and one patient was treated with the monoclonal antibodies casirivimab and imdevimab [8].
Phylogenetic analysis revealed that, for each of the nine patients, sequences formed a distinct cluster, confirming what was found in the original reports (Figure 1). New mutations emerging over time were detected in all of the persistent COVID-19 patients, with further changes identified after treatment with convalescent plasma or monoclonal antibodies (Supplemental Figure 4). Mutations occurred with significantly higher frequency in S (P<0.001) and ORF7a (P=0.02) and lower frequency in ORF1a (P=0.02) (Figure 3A; Supplemental Figure 5A). The rate of viral evolution was plotted for each patient for the intervals before and after convalescent plasma/antibody treatment. Before convalescent plasma or antibody treatment, the rate of sequence changes over time appeared faster than the NextStrain estimate for the global rate of SARS-CoV-2 evolution (dotted purple line) (Figure 3B; Supplemental Figure 5B). Treatment with convalescent plasma or an antibody cocktail was insufficient to halt intra-host viral evolution (Figure 3C; Supplemental Figure 5C).
We also performed time-measured phylogenetic reconstruction with the pre-treatment persistent sequences to compare the rate of intra-host viral evolution in persistent COVID-19 to the rate of community-driven evolution. This analysis provided further evidence that SARS-CoV-2 evolution appeared faster in these persistently infected individuals than in the general population, though these estimates carry substantial uncertainty given the limited sequence sampling in each patient (Figure 3D; Supplemental Table 4).
DISCUSSION
We conducted a systematic review and pooled analysis of sequences from reports of COVID-19 reinfection and persistent infection. Reports of reinfection cases demonstrate a wide range of situations, spanning a broad distribution of ages, baseline health status, and reinfection severity compared to the initial infection. Reinfection occurred as early as 1.5 months and as late as >8 months after the initial infection. Common explanations for reinfection involve either waning SARS-CoV-2 antibodies or the presence of viral escape mutations [30, 31]. While most cases of SARS-CoV-2 reinfection did involve infection with a different clade (including the variants of concern B.1.1.7 and P.1), it is noteworthy that mutations were identified throughout the genomes and the frequency of mutations within the S gene was not elevated relative to the rest of the genome. In addition, individuals with more severe reinfections did not have a significantly greater frequency of S gene mutations. Interestingly, the genes with the highest frequency of mutations were ORF8 and N. ORF8 is a rapidly evolving accessory protein that may antagonize host immune function [32], while the nucleocapsid is a vital structural protein that also serves as a target for both humoral and cell-mediated immune responses [33]. Finally, rare mutations were uncommon in the reinfecting virus, which largely mirrored the contemporaneously circulating variants in the region of infection. However, the reinfecting variants generally contained a substantial number of mutations compared to the initial variant, including frequent changes in the S gene, and additional studies are needed to assess whether these changes may have contributed to the risk of repeat infection.
While the number of immunosuppressed individuals with available sequences remains limited, the results suggest that the rate of viral evolution (measuring both synonymous and non-synonymous changes) is accelerated within immunosuppressed individuals. In addition, treatment with convalescent plasma or monoclonal antibody cocktails was insufficient to fully halt viral evolution, and the emergence of viral escape with treatment has been documented [26, 34]. Mutations associated with immune escape and/or more efficient replication kinetics, including E484K, S494P, N501Y and N-terminal spike deletions, have been observed in both immunosuppressed individuals and the novel variants of concern [35, 36]. The results raise the possibility that novel variants, including those harboring escape mutations against current treatments, could arise from immunosuppressed individuals and suggest that immunosuppressed individuals should be a focus of public health efforts. Amongst the current reports of persistent COVID-19, B-cell dysfunction appears to be a common thread, including in reports that were not included in this analysis due to a lack of available full-length sequences [37-41]. It is important to note, though, that T cell function may also play a role in protection against SARS-CoV-2 [42] and that a subset of these patients also had concurrent suppression of other aspects of the immune response. Additional studies are needed to fully define the type and intensity of immunosuppression that would place patients at greatest risk of persistent COVID-19.
Two factors generally differentiated reinfection from persistent infection. First, reinfections have so far been largely described in immunocompetent individuals, while the majority of persistent COVID-19 cases have been in immunosuppressed patients. Second, phylogenetic analysis can generally differentiate between reinfection and persistent infection, especially in cases where persistent infection allowed the longitudinal collection of >2 sequences. However, given the slow rate of SARS-CoV-2 evolution and limited viral diversity [43], it can be challenging to differentiate between reinfection and persistent infection, especially in situations with limited sampling and/or a short duration between samples.
A limitation of this work is that it relies on case reports, which can be influenced by publication bias and limit our statistical power. However, to date, there have been no systematic, large-scale sequence-based studies of COVID-19 reinfection or persistent infection. This is partly due to the rarity of these types of cases and to the fact that initial infecting sequences are frequently unavailable for comparison with reinfecting or persistently infecting variants. Overall, our results demonstrate the need to further explore factors that increase the risk of breakthrough reinfections and persistent COVID-19. This line of investigation will have important implications for the durability of currently available vaccines and for preventing the rise of novel variants.
Data Availability
NA
Acknowledgements
We thank Jeremy Luban and Ronald Bosch for their feedback and discussion.
Footnotes
Figure 1, 2 & 3 revised; discussion updated.