ABSTRACT
Since the first report of SARS-CoV-2 in December 2019, genetic variants have continued to emerge, complicating strategies for mitigating the disease burden of COVID-19. Positive SARS-CoV-2 nasopharyngeal swabs (n=8,735) were collected from Missouri, USA, from March-October 2020, and viral genomes (n=178) were sequenced. Hospitalization status and length of stay were extracted from medical charts of 1,335 patients and integrated with emerging genetic variants and viral shedding analyses for assessment of clinical impacts. Multiple introductions of SARS-CoV-2 into Missouri, primarily from Australia, Europe, and domestic states, were observed. Four local lineages rapidly emerged and spread across urban and rural regions in Missouri. While the majority of Missouri viruses harbored Spike-D614G mutations, a large number of unreported mutations were identified among Missouri viruses, including seven in the RNA-dependent RNA polymerase complex and Spike protein that were positively selected. A 15.6-fold increase in viral RNA levels in swab samples occurred from March to May and remained elevated. Accounting for other comorbidities, individuals test-positive for COVID-19 with high viral loads were less likely to be hospitalized (odds ratio=0.39, 95% confidence interval [CI]=0.20, 0.77) and had shorter hospital stays (hazard ratio=0.34, p=0.003) than those with low viral loads. Overall, the first eight months of the pandemic in Missouri saw multiple locally acquired mutants emerge and dominate in urban and rural locations. Although we were unable to find associations between specific variants and greater disease severity, Missouri COVID-positive individuals that presented with increased viral shedding had less severe disease by several measures.
INTRODUCTION
Since the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged in late 2019, the virus has undergone genetic evolution. As of January 28, 2021, the United States had detected 25,301,166 cases and 423,519 deaths due to coronavirus disease 2019 (COVID-19)1. Emerging genetic SARS-CoV-2 variants are expected to cause prolongation of the pandemic and continued disease burdens2.
The massive transmission of SARS-CoV-2 and its accumulation of mutations suggests that the virus is continuing to adapt to its human host3,4. Over 198 sites on the viral genome containing recurrent mutations and 80 viral lineages were identified by May5,6. Sites on the SARS-CoV-2 genome are still undergoing positive selection7,8. Variants of the D614G and N501Y strains (both located on the S gene) have generated global concern for increased transmissibility with little evidence of association with disease severity9-14.
Despite the large number of cases, the power to detect SARS-CoV-2 variants in the United States has been insufficient. As of January 16, 2021, Missouri had experienced 452,937 confirmed cases and 6,709 deaths15. In this study, we investigate the emergence and spread of SARS-CoV-2 genetic variants in Central and Southwest Missouri, examine viral shedding over time, and analyze the associations among emerging genetic variants, viral shedding, and disease severity.
METHODS
Ethical Approval
Approved by University of Missouri Institutional Review Board (#2025449).
Sample selection
Data from 8,735 COVID-19 positive nasopharyngeal swabs from March 7-October 31, 2020 were compiled from the Paternity Testing Corporation (PTC) Laboratories in Columbia, Missouri. Samples were collected and processed using the same protocol throughout this study, and consistently tested using the Centers for Disease Control (CDC) 2019 Novel Coronavirus Real-Time Reverse Transcriptase (RT)-PCR Diagnostic Panel16. Positive tests were defined as those with a threshold cycle (Ct) <45, positive nucleocapsid (N)1 and N2, and an amplified anti-human positive control. The first positive test from each individual at CoxHealth (Springfield, Missouri) or University of Missouri Health Care (UMHC; Columbia, Missouri) was used in analyses.
Viral genomes (n=184) were sequenced from swab samples and virus isolates. Available samples (92 samples from 85 patients) collected from UMHC between March 18-May 14, 2020 were sequenced. The second batch (92 samples, 90 patients) was collected between June 28-July 12, 2020 from CoxHealth after viral loads in swabs were observed to be markedly elevated. Of these samples, 136 complete viral sequences were generated with >70% coverage (eTable1). Viruses from remaining samples yielded partial genomes or had inadequate RNA quantity for sequencing.
Chart review
Demographics, comorbidities, and hospitalization information were extracted from electronic medical records for 1,335 patients (eMethods). Ct-values were provided by PTC Laboratories. Age groups were defined in alignment with CDC reports1.
Viral isolation and growth kinetics
SARS-CoV-2 viruses were recovered in Vero E6 (CRL1586™, ATCC) or Vero (NR-10385, BEI Resources) cells a maximum of three times until cytopathic effect was observed. To determine virus replication kinetics, Vero E6 cells were infected at a starting multiplicity of infection (MOI) of 0.01, and supernatants were harvested and titrated using plaque assays (eMethods).
Genetic sequencing and assembly
SARS-CoV-2 whole genome RT-PCR amplification and next-generation sequencing was conducted using the Access Array (AA) microfluidic system (Fluidigm Corporation) and MiSeq system (Illumina)17. Genome assembly was constructed using Qiagen CLC Genomics Workbench 20.0.4. Amino acid variants were identified with a variant probability threshold >80% and minimum two counts, then curated manually (eMethods).
Phylogenetic and phylogeographic analyses
To identify likely seeding viruses for Missouri outbreaks, all 110,901 SARS-CoV-2 complete genomes available on the Global Initiative on Sharing Avian Influenza Data (GISAID) consortium18(September 20, 2020) were downloaded. The 297 genetically closest sequences to Missouri samples were selected using an alignment-free complete composition vector algorithm19-24. Sequence alignments were performed using MUSCLE25, phylogenetic analyses using BEAST226,27, positive selection analyses through PAML28, and sequence conservation visualization with SimPlot29(eMethods).
Lineages were identified by PANGOLIN v2.0.8 (github.com/cov-lineages/pangolin). Unique Missouri sub-lineages were identified when they contained at least five samples with unpublished mutations, posterior probability >0.99, and sequence identity >99%.
Statistical analyses
Kruskal-Wallis tests were used to analyze continuous variables and Fisher exact tests for categorical variables. Logistic regression models were used to assess the effect of viral load (Ct≤20, high; Ct>20, low) on hospitalization or length of hospital stay, controlling for demographics and comorbidities. The effect of viral load on length of hospitalization was tested with a Cox Proportional Hazards survival analysis accounting for censoring due to death. In all analyses, significance was defined at alpha=0.05. Analyses were performed using SAS Studio v3.8 (Cary, Indiana)
RESULTS
SARS-CoV-2 introductions and outbreaks of COVID-19 in Missouri
The first known introduction of SARS-CoV-2 into Missouri was publicly announced on March 7th. Based on state data30, Missouri had an average of 153 (standard deviation [SD]=96) cases per week between March and May increasing to an average of 280 (SD=159) cases in June and climbed with sporadic spikes throughout the study period (Figure 1A). Meanwhile, the weekly case fatality rate peaked at 12% in early May, then progressively decreased and remained below 0.02% through October.
To study whether changes in viral shedding correlates with increased positivity rates, we analyzed the cycle threshold (Ct)-values derived from quantitative RT-PCR (qRT-PCR) on nasopharyngeal swabs for 8,735 patients in Central and Southwestern Missouri between March and October (Figure 1B-D, eFigure 1). Results showed that Ct-values changed over time (P<0.0001). Viral RNA loads from March through May ranged from 8.97 to 42.33 (median=24.85). The proportion of viral load cases above moderate levels (Ct≤20) began rising in May and remained elevated through October.
Characteristics of Study Population
We conducted chart reviews for 1,335 COVID-19 individuals from March-October 2020 (Table 1). The largest age group was 18-29 years (n=562 of 1335, 42.13%). Our dataset consisted of 55.10% female (n=735) and 44.7% male (n=596). Of hospitalized patients, there were 22 males and 22 females. The population studied consisted primarily of Caucasians (n=899, 67.39%) with 15.74% Black or African American patients (n=210). Primary comorbidities were obesity (n=599, 64.90%), diabetes (n=29, 2.17%), and hypertension (n=64, 4.80%). Overall, 44 (3.30%) patients were hospitalized with an average stay of 9.90 (SD=12.67) days. Only three of the hospitalized patients were under age 30. Four patients died.
Rapid evolution of COVID-19 identified through phylogenetic analyses
We collected nasopharyngeal swab samples from 175 patients for sequencing based on sample availability. 136 samples had adequate viral load for complete genome resolution (132 from clinical swabs, four from isolates) with 14 samples having only partial sequences and 38 samples where RNA quality was too low for sequencing.
Phylogenetic analysis showed that Missouri viruses encompassed eight major PANGOLIN lineages, each of which was associated with an independent introduction (Figure 2, eTable 2). The virus evolved after each initial introduction, and formed four unique Missouri sub-lineages, at least one (MO-B.1.1.b) of which circulated from the May through July collection periods. All sub-lineages were supported by posterior probabilities >0.99 and sequence identities >99.99% (Figure 2A). At least five lineages, including the four Missouri sub-lineages, co-circulated in Missouri during the week of July 2. All five lineages originated from lineage B.1 (containing D614G in the spike [S] protein), which has been predominant in Europe, Australia, and multiple states of United States (eFigure 2-3).
Molecular characterization identified mutations across multiple regions of the viral genome (Figure 2B). The previously reported S-D614G and nonstructural protein [NSP]12-P314L mutations appeared in most of the Missouri samples. Compared to their precursor viruses, the Missouri viruses had 126 new mutations (eTable 3). NSP3 contained the most mutations (28 of 126 distinct mutations), followed by S (13), Nucleocapsid (N) (11), and NSP2 (11). The most common mutations include NSP12-C22F (n=18), NSP4-M366I (18), open reading frame [ORF]8-S47F (12), NSP12-A2V (11), and N-V270L (10).
Six unique mutations (Figure 2B, eFigure 2) were detected in four sub-lineages that appear to have emerged and spread in the Southwestern region of the state. MO-B.1.1.a (n=11 sequences) had mutation NSP12-A2V; 10 of these viruses were from Springfield, Missouri collected between July 2-9; MO-B.1.1.b (n=18) contained NSP4-M366I and NSP12-C22F from patients living within 60 miles of Springfield between May 14-July 9, 2020; MO-B.1.c (n=6) with NSP3-N1178T and NSP3-A1179T includes six samples with all but one arising from Monett, an urban center in Southwestern Missouri (50 miles southwest from Springfield), between July 2-6, 2020; MO-B.1.2.d (n=6) with NSP15-P262L were from Springfield or Brighton, Missouri (20 miles north of Springfield) between July 5-7, 2020.
To explore how SARS-CoV-2 viruses were adapting in Missouri, we determined variant sites undergoing selective pressure (Figure 2B-C and eTable 4). Multiple sites along the S protein, the protein mediating host receptor binding and viral entry, and the RNA-dependent RNA polymerase (RdRp) complex (NSP7, NSP8, NSF12, and NSP13) had evidence of positive selection (Figure 3B). Seven of these positively selective sites were unique to Missouri isolates (i.e., D1163Y in Spike, K36Q and T145I in NSP8, and T172I, T431I, K460R, and S468L in NSP13). NSP8 is a cofactor to NSP12 (RdRp) and is necessary for RNA synthesis, NSP12 functions in viral replication and transcription, and NSP13 works with NSP12 in replication and mRNA capping31.
Re-infection with different viruses
Of the selected samples, we collected multiple samples at different time points for four patients and found one case of reinfection (eFigure 4). This patient was a female in her 20s with asthma, obesity, anxiety, and depression, who reported chills, sore throat, dizziness, rhinorrhea, and fever during her initial positive COVID-19 test in March 2020. She was discharged and instructed to self-isolate. After two weeks, her symptoms had waned to encompass only cough and fatigue, and she was tested again due to her return-to-work requirements with another positive result. Interestingly, the two samples were from two distinct SARS-CoV-2 lineages; the first sample belonged to PANGOLIN A.3 lineage, whereas the second belonged to PANGOLIN B.1.1 lineage, suggesting that the patient was reinfected by a different virus within a two-week period. Compared with those in A.3 lineage, the B.1.1. lineage virus had four new mutations ORF1ab-P4715L, S-D614G, N-R203L, and N-G204R.
Increased viral load in clinical samples is correlated with elevated viral replication ability
Our observations of a temporal pattern to viral load and mutations in receptor binding domains of the S protein and RdRp complex led us to hypothesize that differences may have existed in virus replication properties. Eighteen isolates selected as representative sample strains from each viral load category and lineage were recovered from swab samples with Ct-values ranging from 9.09-37.76 (median=16.32) (Figure 3A). We determined growth kinetics for these isolates (Figure 3B) and observed a large diversity in viral growth patterns, especially during the initial 24 hours. Linear regression of viral proliferation (measured by Ct-values) at 24 hours showed that an increase of 1 cycle correlated with a decrease of 8.83 log10 (plaque forming units/mL) (Pearson correlation coefficient=-0.59, p=0.01) (Figure 3C). Taken together, the growth kinetics analyses revealed that strains with higher viral loads in clinical samples proliferated more efficiently than those with lower viral loads.
High viral load is associated with decreased hospitalizations and shorter hospital lengths of stay
To understand potential clinical implications of the identified mutations, we analyzed associations among genetic variants in the Missouri lineages with disease severity (hospitalization and length of hospitalization) and viral loads in clinical samples. We found no significant associations between the Missouri viruses and viral loads (Chi-Square=3.35, p-value [p]=0.76) (Figure 4A) or hospitalizations (Chi-Square=0.33, p=1.00).
We next examined potential correlations between viral load in clinical swabs and outcomes. A univariable analysis was performed to test the effect of viral load, demographics, comorbidities, and lineages on the risk for hospitalization (Figure 4D). High viral load (Adjusted log odds ratio [ALOR]=-0.34, p=0.025), age >65 years (ALOR=0.79, p<0.0001), obesity (ALOR=0.65, p=0.001), heart disease (ALOR=1.31, p=0.005), diabetes (ALOR=1.56, p<0.0001), and hypertension (ALOR=0.52, p=0.02) were independently associated with hospitalization. Using a multivariable logistic regression model, we found that patients of older age (odds ratio [OR]=1.06, p<0.0001), earlier month of diagnosis (OR=0.69, p=0.0002), higher body mass index (BMI) (OR=1.037, p=0.04), diabetes (OR=0.06, p=0.001), and low viral load (OR=0.39, p=0.03) were more likely to be hospitalized (eTable 5, Figure 4C).
Importantly, patients with high viral loads at sampling had fewer hospitalizations (OR=0.39, p=0.01). Of hospitalized patients, those with high viral load were discharged sooner (hazard ratio = 2.9, p=0.03) (Figure 4B) compared to patients with low viral loads. There was no difference in time from symptom onset to COVID-19 test between low and high viral load groups (5.38±10.48 days and 4.25±4.43 days, respectively) or in time from symptom onset to hospital admission (10.03±10.48 days and 8.75±4.09 days, respectively).
DISCUSSION
Studies assessing SARS-CoV-2 viral load as a marker for COVID-19 disease severity have been inconclusive. Early studies found that viral loads were correlated with age, disease stage, severity, progression, and mortality32-37. Most of these studies observed patients during the early stages of the pandemic and often investigated already hospitalized patients. Our study expanded these observations to thousands of patients over an eight-month period, March-October 2020, which was powered to detect that the average viral load increased over the study period. Furthermore, patients with high viral loads were less likely to become hospitalized than patients with low viral loads even after adjusting for month of diagnosis, age, obesity, and diabetes. Thus, although advances in treatment have improved patient outcomes, this was unlikely the only cause for reduced hospitalizations. Our results did suggest that heart disease and hypertension were confounding variables, not associated with hospitalization after accounting for other variables in the model. Additionally, patients with high viral loads were more likely to become discharged sooner than those with low viral loads. Because viral load in nasopharyngeal swabs typically declines after the first week of infection38,39, one confounding factor may have been delayed testing, especially at the beginning of the pandemic from limited access to testing. We did not, however, find differences in time from symptom onset to initial COVID-19 swab between the high and low viral load groups among hospitalized patients.
The clinical findings and demographics from our study are reflective of national data, lending confidence towards the generalizability of our patient population1. Exceptions include the distribution of cases within the 18-29-year age category where we noted 50% higher proportional incidence than CDC data; correspondingly, our >50 age categories were slightly less than national data. This reflects the catchment area of our study which includes the University of Missouri with a high proportion of college-aged students. Additionally, our population had slightly lower Asian, higher Black or African American and White, and over 50% lower Hispanic or Latino individuals than national data, reflecting the overall racial and ethnic distribution in Missouri. We also found that other risk factors for hospitalization included older age, elevated BMI, and diabetes mellitus, consistent with published studies40,41.
Genomic and phylogenetic evidence revealed multiple viral introductions into and across the state and numerous mutations during the first four months of the pandemic. We identified novel Missouri variants and lineages among both urban and rural communities. Four unique, well-supported sub-lineages were identified in Missouri, of which MO-B.1.1.b appeared to have persisted for at least two months. All four lineages emerged during the July sampling period in Southwestern Missouri and appeared more likely to arise in the urban Springfield area, then spread to neighboring urban and rural communities. Further studies are needed to evaluate whether any of these remained locally prominent.
Over the past year, variants containing the D614G and N501Y mutations dominated global outbreaks. D614G (B.1 lineage), first identified in January 2020, became predominant worldwide by June 202042. The N501Y strain (B.1.1.7 lineage) was detected first in the United Kingdom10and then appeared in other countries9including the United States11. D614G mutations enhanced viral replication and transmission but not pathogenesis in laboratory settings43,44. However, disease transmission and severity of these novel variants in humans remain unclear45. Prior studies suggested that D614G is associated with higher viral load14,46, but the mutation was already predominant in both high and low viral load Missouri strains. In our findings, viral load increased over time (Figure 1). We explored whether a particular genetic variant was associated with the increased viral loads and were unable to find a clear association. Thus, we speculate that throughout the pandemic, all emerging variants of the virus were adapting to the human populations with greater viral replication efficiency. Of interest, multiple sites, especially at the Spike protein and RdRp complex, across multiple Missouri sub-lineages were under positive selection (Figure 3). Selection at the RdRp complex may affect viral replication and transcription47,48, while selection at the S protein may affect host receptor binding and viral entry47. Further examination of mutations from this and other studies will elucidate the phenotypic effects of these mutations.
Increasing case studies of reinfection and studies involving waning neutralizing antibody (Nab) titers raise concerns for herd immunity and long-lasting efficacy of vaccines49-51. CDC criteria for SARS-CoV-2 reinfection include persons with paired respiratory specimens at least 90 days apart and symptomatic persons 45-89 days after initial illness with respective respiratory specimens showing differing lineages52. Recent studies show Nab titers to SARS-CoV-2 decline as early as 23 days following initial infection53. In this study, a young female patient was identified with two genetically distinct SARS-CoV-2 strains within two weeks, indicating that re-infection can occur within a much shorter period than expected.
There are several limitations to this study. Analyses with viral loads are limited by variability in nasopharyngeal swabbing techniques, which may cause inconsistencies in Ct-values, although the same sampling and processing protocol was used throughout this study. Additionally, during the initial phases of the pandemic, testing was generally limited to patients with more severe symptoms, potentially skewing viral load findings. Despite these limitations, we analyzed a large, representative sample, and adjusted for these confounders.
In summary, multiple novel lineages were identified, and locally acquired mutations, present at both the urban and rural levels, remained predominant in the community. Although we were unable to find associations between specific variants and greater disease severity, Missouri COVID-positive individuals that presented with increased viral shedding had less severe disease by several measures. Continued monitoring of the impacts of these novel variants, particularly of those in the regions of vaccine targets, will be essential to the management of this pandemic.
Data Availability
Genome data may be accessed using GenBank Accession Numbers: MW004168, MW521383-MW521516, MW525282.
Funding Support
Cynthia Tang is supported by NIH 5T32LM012410. Global Emerging Infections Surveillance Branch of the Armed Forces Health Surveillance Division (ProMIS ID P0140_20_WR_01.Global).
Data Sharing Statement
Genome data may be accessed using GenBank Accession Numbers: MW004168, MW521383-MW521516, MW525282.
Acknowledgments
Author contributions: XFW conceived this study, XFW, JAM, and CYT designed the analysis, CYT, YW, KS, TL, TH, and CS collected the data, DRS, RH, DR, CS, RT, GML, JH, and XFW contributed data or analysis tools, CYT, YW, CG, DRS, and XFW performed the analysis, CYT and XFW wrote the paper, RW, DRS, JAM, TH, CYT, and XFW revised the paper. XFW and CYT had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
The authors would like to thank Paige Beauparlant (UMHC), CoxHealth research coordinators, and Eren Ufuktepe and Katie Wilkinson from the UMHC Cerner Team for help with chart review, Simone Camp (UMHC) and Michelle Beckwith (PTC Labs) for help with sample collection and serving as the third-party honest brokers, and Varun Goel and Michael Emch (University of North Carolina) for feedback on geographical visualization methods. Additionally, we would like to thank the administrators and curators of the GISAID database, and research groups across the globe for supporting the rapid and transparent sharing of genomic data during the COVID-19 pandemic. The opinions expressed are the private views of the authors and are not to be conveyed as official or signifying the views of the Department of the Army or the Department of Defense. Authors have no conflicts of interest to report.