Abstract
The viral kinetics of documented SARS-CoV-2 infections exhibit a high degree of inter-individual variability. We identified six distinct viral shedding patterns, which differed according to peak viral load, duration, expansion rate and clearance rate, by clustering data from 768 infections in the National Basketball Association cohort. Omicron variant infections in previously vaccinated individuals generally led to lower cumulative shedding levels of SARS-CoV-2 than other scenarios. We then developed a mechanistic mathematical model that recapitulated 1510 observed viral trajectories, including viral rebound and cases of reinfection. Lower peak viral loads were explained by a more rapid and sustained transition of susceptible cells to a refractory state during infection, as well as an earlier and more potent late, cytolytic immune response. Our results suggest that viral elimination occurs more rapidly during omicron infection, following vaccination, and following re-infection due to enhanced innate and acquired immune responses. Because viral load has been linked with COVID-19 severity and transmission risk, our model provides a framework for understanding the wide range of observed SARS-CoV-2 infection outcomes.
Introduction
COVID-19 public health emergency status has lapsed in the United States, but community levels of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) remain significant (https://covid.cdc.gov/covid-data-tracker/#datatracker-home). SARS-CoV-2 immunity in the population is now highly heterogeneous due to varying degrees of prior infection and vaccination1. Also, successive circulating SARS-CoV-2 variants of concern (VOC) with different immune evasion and infectivity properties continue to emerge. This has resulted in a wider variability of viral shedding patterns than those observed during infection with the ancestral strain in the early months of 20202, 3. Understanding the heterogeneous upper respiratory tract (URT) kinetics of SARS-CoV-2 enables informed design of health interventions such as testing, isolation, quarantine, and drug therapies.
Mathematical models are a vital tool for understanding mechanisms underlying observed patterns of viral expansion and clearance4–9. To date, studies fitting SARS-CoV-2 dynamic models to viral load trajectories have estimated the timing of innate and acquired immune responses and predicted transmission parameters, including super-spreader events10–22. These models facilitated estimates of key quantities such as expected duration of the infectious period and the timing of peak viral load relative to symptom onset20, 23–25. They also provided a theoretical means for testing treatment regimens and predicted that treatment within 5 days of symptom onset would likely be associated with higher efficacy11, 22, 24, 26, 27, an outcome that has since been verified in multiple clinical trials28–30. These models were also the first to suggest that viral rebound may occur in the context of early antiviral treatments11.
However, early modeling studies only considered data from a small number of infected individuals11, 20, 22–27, 31–35, and often drew either entirely from previously uninfected and/or unvaccinated cohorts13. Another consistent limitation was that most available data did not capture early timepoints during the pre-symptomatic phase of infection. Model results are therefore, not easily generalized to current SARS-CoV-2 conditions.
The National Basketball Association’s (NBA) daily testing program occurred regardless of symptoms and identified 2,875 infections between June 2020 and January 2022, spanning the alpha, delta, and early omicron VOC waves, as well as the roll-out of vaccines and boosters. Hay et al. used a statistical approach to quantify the impact of immune history and variant on SARS-CoV-2 viral kinetics and infection rebound in this data set36. However, a more mechanistic modeling approach is required to understand observed kinetic variability in this cohort.
Here, we identify six distinct shedding patterns in the NBA cohort data. We then compare how well candidate models which extend the classical target-cell limited model previously published by Goyal et al.11, 22 and Ke at al.20, 23 recapitulate the longitudinal upper respiratory tract (URT) viral load data from 1510 sufficiently documented infections. After obtaining data-validated parameter estimates for each individual infection, we identify the factors underlying differing rates of viral expansion and clearance, peak viral loads, and duration of infection observed in the data. We use the model to identify differences between the timing and intensity of the immune response during initial and re-infections and identify a potential explanation for viral rebound observed in the cohort.
Results
Viral shedding kinetics according to SARS-CoV-2 VOC
We first analyzed viral kinetics observed in the cohort according to VOC. For pre-VOC, alpha, delta, and omicron variants, we observed variable kinetics among cohort participants. Median values differed between variants, with omicron variant having slightly lower peak viral loads and earlier clearance, while delta had the highest peak viral loads and pre-VOC had the longest time to clearance (Fig 1a-d). A high proportion of the infections caused by omicron variants occurred in participants who had received either two or three vaccine doses, whereas pre-delta infections mostly occurred in unvaccinated individuals (Fig 1e).
The age structure of the NBA cohort differs significantly from the general population. Of the cases documented, 46% occurred in individuals under the age of 30, 42% occurred in individuals between the ages of 30 and 50, and only 12% occurred in individuals over the age of 50 (Fig 1f). Symptom status was noted for 59% of infections, of which 71% were symptomatic (Fig 1f). The level of post-vaccination, pre-infection SARS-CoV-2 IgG was measured in 60% of infections. When stratifying patients into tertiles, Hay et al. identified low antibody titers as less than 125 (arbitrary units [AU]/ml), mid-range titers as greater than 125 AU but less than 250 AU, and high titers as greater than 250 AU with the most infections occurring in the highest tertile (Fig 1f). 17% of observed infections were reinfections of individuals followed longitudinally (Fig 1f).
Six distinct SARS-CoV-2 shedding patterns
We identified a subset of infections in the NBA cohort as “well-documented” if they had at least 4 quantitative positive viral load measurements starting within 5 days of detection, and if infection was documented for 3 weeks, or viral elimination was confirmed with 2 sequential negative test results. This reduced the data set to 810 well-documented infections. To eliminate intra-individual variability from this data set, we retained one infection from individuals with multiple documented infections further narrowing our focus to 768 infections. We then applied k-means clustering to the viral load data, clustering infections into 6 distinct viral shedding patterns (Fig. 2a-c) which differed according to time to viral elimination (Fig. 2c,d), area under the viral curve (Fig. 2c,e), peak viral load (Fig. 2c,f) and time to peak (Fig. 2c,g).
The first group had low peak viral loads and early median time to clearance (Fig. 2a-g). The second group had a slightly earlier and significantly higher peak than group 1, but similarly short duration (Fig 2a-g). The third group had a similar peak viral load compared to group 2, but with a longer time to peak viral load and later clearance (Fig 2a-g). The fourth group had the fastest expansion rate, reaching a high, early peak viral load, but maintaining similar median time to clearance as group 3 (Fig 2a-g). The fifth group had the slowest expansion rate, taking the longest time to reach the second lowest peak viral load and had the longest median time to clearance among the groups (Fig 2a-g). In contrast with the prolonged low-level shedding of group 5, the sixth group had high, somewhat early peak and a long shedding duration (Fig 2a-g).
The proportion of cases that fell into each dynamic group varied when we stratified by characteristics included in the data set. The dynamic groups with highest AUC, groups 5 and 6, made up 39% of the infections in the 50 plus age group, whereas 21% of infections in the under 30 group were in the high AUC groups (Fig. 2h). Among confirmed asymptomatic infections, 29% of cases fell into group 1, defined by low peak and early time to clearance, relative to only 14% of confirmed symptomatic cases; the slowly expanding group 3 cases were also less likely to be symptomatic while high, early peak group 4 cases were more often symptomatic (Fig. 2i). High AUC shedding patterns were also more prevalent among infections with SARS-CoV-2 variants from earlier in the pandemic, making up 62% of pre-VOC infections, 27% of delta infections, and only 8% of omicron infections (Fig. 2j). Amongst unvaccinated individuals, high AUC infection patterns were much more frequent—63% of infections in unvaccinated individuals fell into groups 5 and 6, compared with 11% and 9% of infections for those whose most recent SARS-CoV-2 vaccine was their second dose or booster respectively (Fig. 2k).
Mathematical model fit to viral loads from 1510 SARS-CoV-2 infections
To identify factors underlying the varied viral shedding patterns in the NBA cohort, we developed competing mechanistic mathematical models of viral and immune dynamics and selected the best model according to data-fitting criteria. The most complex model tested adapts previously published ordinary differential equations models for within-host SARS-CoV-2 infections by combining elements introduced by Goyal et al.11, 22 and Ke et al.20, 23. For this model we made mechanistic assumptions inherent to many pre-existing viral dynamic models including a viral load dependent infectivity, viral production by infected cells, a limited number of susceptible cells, and a pre-production eclipse phase for infected cells. The possible immune mechanisms included in the model were conversion of susceptible cells to an infection-refractory state dependent on the number of infected cells (presumably representing innate responses to infection), density-dependent death of infected cells as a proxy for an intensifying cytolytic innate response to a higher burden of infection, and a delayed cytolytic acquired immune response (Fig. 3a; Materials and Methods).
We used a nonlinear, mixed-effects framework to estimate model parameters for the 1510 infections documented in 1442 individuals in the NBA cohort that had at least 4 quantitative viral load measurements (Materials and Methods). We first used a representative subsample of these infections to compare model fits for the full model, illustrated in Fig. 3a and written out in equation (1), and reasonable simplifications, in which one or more immune mechanism was removed (Materials and Methods, Table S1). Under model selection criteria that balance simplicity with accuracy, the best model to explain the NBA data was the full model except for the density-dependent death of infected cells. This model has been previously studied by Ke et al.20 We then refit the best model to all infections. It is possible that models outside of the collection tested here could describe the data better; however, the fits that we achieve with this model were highly accurate for most members of the cohort from all 6 shedding clusters (Fig. 3b, Fig. S6).
Differences in timing and intensity of immune response as an explanation for heterogeneous shedding patterns
We next sought to explore possible virologic and immunologic explanations for different observed viral shedding patterns. For relevant model quantities, we calculated the mean within each dynamic group at each time point and a 95% confidence interval, assuming normally distributed values. Mean viral loads projected by the model for each group (Fig. 4a) resembled those from the actual data (Fig 2c). Quantitative kinetic features extracted from model simulation output including peak viral load (Fig S1a), time to peak (Fig S1b), viral area under the curve (Fig S1c), and shedding kinetic group also agreed well with those extracted from the cohort data (Fig S1d-g). Projections for susceptible cells and infected cells suggest dynamics which track closely to viral load that differ accordingly among shedding subgroups (Fig. 4b,c).
To delineate mechanistic drivers of shedding variability, we calculated the Pearson correlation coefficient between individual estimates for each model parameter and 4 viral kinetic quantities predicted by the mathematical model: log of peak viral load, time to peak viral load, shedding duration, and area under the log viral load curve (Fig. S2a-d). Peak viral load correlated strongly with viral production rate, π, and had a strong inverse correlation with the rate of conversion of susceptible cells to a refractory state, ϕ (Fig. S2a). A linear model mapping log(π/ϕ) to log peak viral load predicted by the model explained a large amount of variability (Fig. S2e). The timing of peak viral load inversely correlated strongly with π, β, and ϕ (Fig. S2b). We fit an exponential model for time to peak viral load relative to infection as a function of log10(βπ), which again explained a large amount of observed variability, R2 = 0.76 (Fig. S2f).
Shedding duration correlated most strongly with the time of onset of acquired immunity in the model, τ (Fig. S2c). Overall, the value of τ did not predict the time of clearance very well. This is because for a significant number of individuals particularly in groups 1-4, acquired immunity was established after the virus was already cleared (Fig S2g). In groups 5 and 6, timing of acquired immunity onset was more predictive of shedding duration (R2 = 0.61) because acquired immunity was usually established before the virus was cleared. Numerous model parameters influenced viral AUC though τ and ϕ were most important (Fig S2d).
The viral shedding pattern for group 1 was notable for a low peak and early clearance of infection (Fig S2h). These mild virologic outcomes occurred due to rapid generation of refractory cells. Early onset of acquired immune pressure was only occasionally necessary for viral clearance (Fig. 4d-f, Fig S2j). The higher viral peak in group 2 was driven by relatively higher values of viral production and viral infectivity and low conversion to a refractory state (Fig. S2j-l). In group 2 infections, innate and acquired immunity both play a role in the clearance of infected cells (Fig. 4e). Infections classified as group 3 were distinguished by a slower upslope, resulting from low average values of both viral production and infectivity (Fig. S2k,l). Group 4 infections had a rapid, high peak viral load due to high viral production and infectivity, as well as a relatively low average values for ϕ, the conversion to refractory state. Higher values of viral production and viral infectivity and low conversion to a refractory state mean that these infections rapidly burn through susceptible cells and target cell limitation slows the infection (Fig. 4b, Fig.S2 j-l). Only when the viral load was already decreasing did acquired immunity typically initiate to help clear the infection (Fig. 4e). Infections in group 5 had a late, low peak and a long duration. Similar to group 3, the late peak was due to low rates of viral production and low infectivity (Fig. S2k-l). However, unlike group 3, these individuals also had low values of infected cell clearance, δ, and a very late onset of acquired immunity τ, allowing infection to persist (Fig. S2n-o). Finally, group 6 consisted of long infections with a high peak viral load. These infections were distinguished by high viral production and infectivity (Fig. S2k-l), and globally weak immune responses including refractory cell conversion (Fig. S2j) and time-independent infected cell clearance rates (Fig. S2n). Thus, late-acting acquired immunity was often required to clear the infection (Fig. 4e and Fig. S2o,q), Overall, these results suggest that a complex interplay of viral and immune features dictate how individual infections differ according to peak viral load, viral expansion rate, viral clearance rate, and duration of shedding.
We next examined correlations between estimated model parameters and found several significant patterns (Fig S3). Viral infectivity, β, had a strong positive correlation with viral production rate, π. The viral production rate also had a positive correlation with the intensity of time-independent cytolytic immune pressure, δ. There was a strong negative correlation between the rate of reversion from refractory back to susceptible cells, ρ, and time of onset of late cytolytic immune pressure, τ. These results may suggest that viral fitness properties are related or that the durability of early innate responses is inversely correlated with the onset of delayed acquired immunity, but we cannot disentangle true biological correlations from potential identifiability challenges in the model structure.
Lower peak viral load and earlier clearance during re-infection with omicron due to more effective early immune responses and more rapid late responses
The NBA cohort data set documented initial infection and reinfection in 67 individuals (Fig. 5a, S4). Of the first infections, 52 were caused by a pre-delta variant and 15 by delta. For all individuals, the second infection was caused by an omicron variant. The mean peak viral load documented in the URT for a re-infection was 0.5 log10 viral RNA copies/ml lower than the mean for first infections. Though there was a slight negative correlation between peak viral load during the first infection and that of the second infection (Fig. 5b), the relationship was not statistically significant (rpearson = − 0.18, p = 0.15). The median time to clearance for reinfections was 7.5 days after detection compared with 11 days after detection for first infections (Fig. 5c). Using a slightly different data from the NBA cohort, Kissler et al. found evidence that an individual’s relative clearance speed is roughly preserved across infections37, prompting us to investigate whether this relationship, or others, appear in our model fits. We tested whether relative viral peak, time to peak, infection duration, or area under the curve were conserved by looking at the pearson correlation and did not find any significant relationships (p<0.05). We also checked whether estimated parameter values were conserved across sequential infections in the same individual, and again found no significant correlations across infections in the same individual.
Two model parameter values were significantly different between first versus second infections (Fig. 5d). Reinfections had higher values of ϕ, indicating a faster conversion of susceptible cells to a refractory state. This more potent early immune responses contributed to lower peak viral loads. The timing of the acquired immune response, τ, was also earlier, during reinfection suggesting more rapid activation of immune memory. We found that parameter values estimated during first infection were not predictive of parameter values estimated for a sequential infection in the same individual. Mean model projections recapitulated viral load patterns observed in the data (Fig 5e). We plotted cell populations from the model simulations for the two groups, and reinfection appeared to result in more refractory cells (Fig 5f) and a smaller decrease in susceptible cells (Fig 5g). The acquired immune response initiated sooner and at a higher magnitude during re-infection (Fig 5h).
Waning early immune response and strong initial clearance of infected cells as a cause of off-therapy viral rebound
Recent studies have shown that viral rebound during the natural course of untreated SARS-CoV-2 infection is relatively common, occurring in over 10% of cases by some estimates38 (https://www.fda.gov/media/166197/download), though rates vary according to definition. In their analysis of the NBA cohort, Hay et al. flagged 40 out of 1334 cases (3%) as rebound, defined by a non-monotonic sequence of test results36. As their most inclusive definition of rebound, they identified cases that achieved an initial clearance of at least 2 days with cycle threshold greater than or equal to 30, followed by at least 2 days with cycle threshold < 30.
We defined simulated infections as rebound if there were 2 or more peaks with height > 3 log10 RNA copies/ml and prominence > 0.5 log10 RNA copies/ml. We defined prominence as the height above the preceding local minima, as illustrated in (Fig. 6a). With these criteria, we identified 7.0% of the 1510 cases as rebound. These cases are marked with an “R” and included first in Fig S6. Note that we were unable to connect viral rebound to recrudescence of COVID-19 symptoms because we do not have daily reports of symptom status.
We observed several differences between rebound and non-rebound infection. In cases of rebound, susceptible cells were lost more rapidly initially but also replenished earlier from the refractory compartment (Fig. 6b). On average, rebound trajectories had both an earlier peak in infected cells and an earlier, higher peak in refractory cells (Fig. 6c-d). The large number of refractory cells drives a more rapid replenishment of susceptible cells. The persistence of infected cells with re-emrgence of susceptible cells allowed for a second surge of viral production, which had been reduced by fast early clearance of infected cells (Fig 6e). The delayed onset of the late acquired immune response also allowed sufficient time for this to occur before the infection was ultimately cleared (Fig 6e-f). Cases with rebound had higher viral production rates, π, and higher viral infectivity, β, which combined to allow for growth of the viral population even with a reduced number of target cells. Rebound cases also had a higher clearance rate, δ, and a faster conversion of susceptible cells to a refractory state. Together, these forces preserve susceptible cells through the rapid initial clearance of infected cells and protection in a refractory state. Notably, rebound cases did not have a significantly higher reversion rate, ρ, to account for replenishment of susceptible cells after the first viral peak. Rather, the faster replenishment of susceptible cells occurs due to the high number of refractory cells. The viral rebound group also had a delayed onset of late immune killing, τ, which allowed time for two peaks to occur before pressure from the acquired immune system cleared the infection (Fig 6g).
Discussion
Viral kinetics are vital to understanding the pathogenesis of infection and, ultimately, to optimizing therapies. Here, we use a remarkable cohort from the NBA, which is unique both for its size and because it captures early pre-symptomatic timepoints during infection, to describe the increasing variability in viral load patterns observed in SARS-CoV-2 infected people. We observe that with a general increase in population level immunity due to prior infection and vaccination, peak viral load is often lower and earlier with more rapid elimination of virus.
Our mathematical model identifies testable mechanistic hypotheses for these observed differences. We first predict that low peak viral loads are associated with lower viral production within infected cells and lower viral infectivity. Moreover, for viral loads that also peak early (observed in group 1), the model predicts a rapid conversion of susceptible cells to a refractory state. Both effects are compatible with data observed in animal models and in vitro models describing effects of interferon which limit the extent of viral replication and protect uninfected cells from viral entry39–43. Appropriate follow up experiments to validate this prediction would include local sampling of nasal cytokines and other mediators of local immunity during critical early timepoints of infection as has been done in humans for other respiratory viral infections44.
Different viral shedding patterns are also driven by varying balances between the magnitude of the early cytolytic immune response, which wanes as the number of infected cells and viral load declines following peak, and the late sustained immune response. In our model, we assumed this response does not dissipate with decrease in virus, so we hypothesize that most of the late response is acquired and due to either expanding T cell or antibody levels. Prior work suggested that during primary infection, plasma SARS-CoV-2 IgG levels rise too late to explain reduction in viral load45. However, this study was performed in an immunologically naïve cohort and needs to be reassessed in the current infection environment46, 47. T cell mediated killing of infected cells may also assist in elimination of infected cells during infection46, 48. The results from our study suggest that at the time of the NBA cohort, substantial differences in timing and intensity of acquired immune responses were still present and contributed to variability in viral kinetics.
Our results suggest that upon reinfection, the early/innate response and the late acquired response are both more effective. The mechanisms underlying this observation are unclear. One possibility is that a higher density of tissue resident NK cells, B cells and T cells may exist after first infection and vaccination. In other viral infections, it has been observed that an increase in pre-infection tissue resident T cells predicts earlier initiation of a local innate and acquired response due to early antigen recognition49, 50. These model predictions merit experimental follow up.
Unfortunately, we are not able to link the heterogenous virologic patterns observed in the NBA cohort with severity of symptoms or future development of post-acute sequelae of SARS-CoV-2 infection as this data was not available. For multiple other viruses, viral loads have been identified as relevant correlates of disease51–54, and late SARS-CoV-2 viral loads have been linked with severity of infection among hospitalized people55, 56. During clinical trials, reductions in nasal viral load due to monoclonal antibodies, nirmatrelvir / ritonavir, and molnupiravir correlated with very large reductions in the incidence of hospitalization and death28, 29. Yet, early remdesivir which had a large clinical benefit was associated with no viral reduction in nasal passage several days after treatment30, highlighting that key viral load surrogates may be in the lung rather than nasal passages or that early viral loads are more predictive of outcome22. Because early and peak viral load measurements are so rarely obtained during COVID-19 infection, the clinical importance of these values remain unknown.
Several further limitations of this work are important to highlight. An issue that is universal to the field is that our model does not capture anatomic compartmentalization of viral shedding. Our previous model demonstrated in non-human primates that SARS-CoV-2 kinetics in the lung differ in subtle but important ways from those in the upper airways, and that these differences are particularly significant in the context of antiviral therapy22. It is likely that our subgroups of shedding may cluster differently if we had access to serial whole lung viral loads. The re-seeding of infection in the nose from the lungs or vice versa may also provide alternative explanations for the dynamics observed in this data set, particularly viral rebound. Unfortunately, such detailed studies are not available in any human cohort. Studies using saliva do suggest slightly different kinetics than those from nasal swabs57, but it is doubtful that saliva captures total viral load in the lung.
Another issue shared by all mathematical models in the field is the lack of sufficiently granular, tissue-based immune data to precisely model the innate and acquired immune responses. Rather, our model uses several terms to capture the timing and intensity of what is likely to be a complex, multi-component response. As with multiple other respiratory virus models and based on experimental data showing that interferon-alpha protects cells from infection, we assume that infection temporarily makes susceptible cells refractory to viral entry20–22, 40, 43. Finally, we assume a late, sustained immune response that varies by intensity and timing, compatible with an acquired memory response.
A final limitation shared by all intra-host SARS-CoV-2 models in humans is that we are not able to measure potentially important initial conditions of infection, including viral inoculum and the number of immune cells within a relevant spatial microenvironment of infection. Thus, the model may over ascribe observed differences in observed viral load trajectories to differences in immune responses rather than exposure viral load.
In summary, we identify distinct shedding patterns in adults with SARS-CoV-2 infection, with shorter, lower viral load infection more commonly observed in persons with omicron infection, prior vaccination, and recent prior infection. The mechanistic predictors of rapidly contained infection are more rapid conversion of susceptible cells to a refractory state along with more rapid and intense late cytolytic immune responses.
Materials and Methods
Study Overview
We analyzed SARS-CoV-2 viral load data collected during untreated infections in the NBA cohort. We clustered this data into 6 dynamic groups, which were statistically different in terms of peak viral load, time to peak viral load, area under the viral load curve, and time to clearance. Drawing on previous models in the field, we developed a set of candidate ordinary differential equation (ODE) mathematical models. We then used model selection theory to determine which version the data supported most strongly. With a validated model of SARS-CoV-2 infection, we examined which parameter values differ to explain the variable viral shedding patterns observed in the six dynamic groups. We also used this approach to explain the differing dynamics of first and second infections captured in the NBA cohort, and to explain the mechanisms underlying viral rebound.
Data Pre-processing
We used data from the NBA cohort previously published by Hay et al.36. The group documented 2875 individual SARS-CoV-2 infections in 2678 people through frequent quantitative PCR testing. First, we filtered this data to include only infections with at least 4 positive quantitative samples to provide adequate viral dynamics data for model fitting. This yielded 1510 infections in 1442 individuals, of which 177 were caused by a pre-VOC variant, 46 by alpha, 163 by delta, and 1124 by omicron (Fig. 1a). We further identified a “well-documented” subset of these infections by filtering for infections that had their first quantitative test within 5 days of detection and included test results through 20 days after detection or confirmed elimination of virus prior to day 20 (two consecutive negative tests). This well-documented group consisted of 810 individual infections in 768 people. For clustering, we randomly chose one infection to retain from each individual with multiple documented infections, resulting in a group of 768 infections in 768 individuals. We also filtered the well-documented group for infections with a negative test result within 2 days prior to detection, yielding 266 cases with both early detection and 3 weeks of documentation. We refer to this subset as “fully documented.”
Quantitative Features of Viral Dynamics
To convert cycle threshold (Ct) values to viral genome equivalents, we averaged Ct1 and Ct2 for each individual and applied equation S2 from Kissler et al.58 That is, where the concentration of viral RNA is in copies/ml.
We calculated the peak viral load for a given infection as the maximum measured log10 viral load over all quantitative data points and the time to peak viral load was the day of this measurement. We calculated the area under the log10 viral load curve from the date of detection through the last quantitative measure of viral load, linearly imputing missing values between data points. Note that this quantity is an underestimate for individuals without confirmed clearance. We calculated the median time to clearance by identifying when the cumulative incidence curve for clearance of the virus crossed 50%. The cumulative incidence curve is the inverse of the Kaplan-Meier curve for survival of the virus. The Kaplan-Meier curve, KM, and confidence interval was computed using the Python package scikit-survival 0.21.0 (https://scikit-survival.readthedocs.io/en/stable/). The cumulative incidence curve is then 1-KM.
Data Clustering
We clustered well-documented infections into 6 dynamic groups using k-means clustering as implemented in the Python package scikit-learn 1.2.2 (https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html). As input features, we used these 21 daily test results. These came from the day infection was detected through 20 days after detection. If any daily measurements were missing between recorded test values, we imputed the missing measurements linearly. If the last test date for an individual was prior to day 20, so there were missing daily measurements after the last test, we appended negative test values to reach 20 days (Fig S5a). This occurred only for infections for which clearance was confirmed with 2 consecutive negative tests, since we clustered well-documented infections.
To select these hyperparameters for the k-means clustering, we tested values of k from 2 to 20 for three possible interpolation methods, linear, quadratic, or cubic spline, and two possible surveillance periods, 13 or 20 days post detection (2 or 3 weeks surveillance). Comparing these scenarios, linear interpolation up to 20 days post detection had the lowest within cluster sum of squares (Fig. S5b). Based on the location of the “elbow” in the plots, we chose to proceed with k = 6 clusters. Using k < 6 results in less distinctive behaviors between the groups, while using more clusters resulted in some non-interpretable cluster centers (Fig. S5c-g).
Mathematical Model of SARS-CoV-2 Dynamics
We considered several ordinary differential equations models for SARS-CoV-2 infection dynamics. The full model tracks the number of target cells that are susceptible to infection (S), target cells that are refractory to infection (R), infected cells in an eclipse phase (IE), infected cells actively producing virus (IP), and SARS-CoV-2 virions (V). Susceptible cells are infected at rate βSV, and become refractory at rate ϕIpS. Refractory cells revert to a susceptible state at rate ρR. When cells are first infected, they enter an eclipse phase, from which they transition to a state of producing virus at rate k. Productively infected cells are cleared at rate , where the dependence on infected cells reflects an innate immune response with no memory. When the duration of infection surpasses time τ, the clearance rate of infected cells increases by mIP, capturing the delayed onset of a cytolytic acquired immune response with memory. Productively infected cells produce virus at rate π, and free virions are cleared at rate γV. Under these assumptions, the model has the form:
To ensure that the model did not predict spurious oscillations in viral dynamics, we enforced that viral production was zero when IP was less than 1. In the optimal model, the parameter h = 0 for all individuals, so the early per-cell clearance rate of infected cells is not density dependent.
As initial conditions, we set (S0, R0, IE0, IP0, V0) = (1 × 107, 0, 0, 0, V0). Previous models of SARS-CoV-2 infection in the nasal compartment have used an initial value of 107 − 108 susceptible cells, based on estimates that 2-20% of epithelial cells in the upper respiratory tract display the ACE 2 receptor59–61. We assumed that the initial number of refractory cells is zero, because the early immune response is inactive prior to infection. We initiated simulations with zero infected cells, so IE0= IP0 = 0, and a small viral inoculum to reflect the tight bottleneck that transmission places on viral replication. The number of virions present at the outset of infection was assumed to be below the limit of detection, but the precise inoculum was initially allowed to vary for individuals. During model fitting, we estimated the onset of infection relative to detection (date of first positive test), noting this difference as t0. Among individuals in the NBA cohort for whom symptom onset was known, the mean time of symptom onset was the date of detection, so t0 is correlated with the incubation period of SARS-CoV-2. With this in mind, we restricted estimates of t0 to fall between 0 and 20 days based on a 2022 review by Wu et al., which reported that across 142 studies of SARS-CoV-2 infection, the incubation period ranged from 1.80 to 18.87 days62.
To maintain identifiability, we fixed two parameter values, setting the rate of viral production onset to be k = 4 in accordance with Ke at al.23 and the rate of clearance of free virions to be γ = 15 in accordance with Goyal et al.11
Model Fitting and Selection
We fit the model in Eq. 1, as well as simpler versions that eliminate one or more immune components and/or the eclipse phase, to data from the NBA cohort using a non-linear mixed effect approach63. With this approach, a viral load measurement from individual i at time point k is modeled as log10(yik) = fV(tik, θi) + ϵ, where fV represents the solution of the ODE model for the state variable describing the virus, θi is the parameter vector for individual i, and ϵ ∼ N(0, σ2) is the measurement error for the log10-transformed viral load data. Furthermore, in the population model, each individual’s parameters can be written as the sum of the average population value, θpop, and a random effect encompassing their deviation from the average, ηi. That is, the parameters for individual i are given by θi = θpop + ηi. We fixed σ = 0.5 log10 viral RNA copies/ml when comparing model fits, so that any differences in likelihood of the full model occur due to a change in agreement between model simulations and data rather than a drastic increase in the estimated magnitude of the measurement error.
For model selection, we first worked with the 266 fully documented infections (early detection and at least 3 weeks of follow-up or confirmed clearance). In addition to the raw data, for individuals without confirmed elimination we imputed 5 “assumed negative” test results at 2-day intervals starting at 40 days post-detection. Out of the 1510 infections considered in model fitting, 629 had regular measurements past 40 days and 99.5% of tests collected past day 40 were negative. For viral load observations below the lower limit of quantification or marked as “assumed negative”, we used the probabilistic model that Monolix software provides for left-censored data (https://monolix.lixoft.com/censoreddata/).
The candidate models that we considered are listed in the supplementary material (Table S1). For each candidate model, we used the Stochastic Approximation of the Expectation Maximization (SAEM) algorithm embedded in the Monolix software to obtain the Maximum Likelihood Estimation (MLE) of the vector of fixed effects, θpop, and the MLE of the vector of standard deviations of the random effects, σθ, for the model parameters β, π, ϕ, ρ, δ, ℎ, τ, m, the delay between infection and date of detection, t0, and the initial viral inoculum, V0 (https://monolix.lixoft.com/tasks/population-parameter-estimation-using-saem/). We assumed a lognormal distribution for parameter values, and a logit distribution for initial conditions. The delay between infection and detection, t0, was assumed to fall between 0 and 20 days. The viral inoculum was assumed to fall between 0 and 250 log10 viral RNA copies/ml.
We ran the SAEM algorithm six times for each model using randomly selected initial values for the estimated parameters. Using the parameter set with the highest likelihood, we computed the Akaike Information Criterion (AIC) for each model. Recall that AIC = −2 max(log ℒ) + 2m where ℒ is the likelihood that the data was generated by this model with these parameter values and m is the number of model parameters. Hence smaller AIC scores indicate that a model is statistically more likely to explain the data. The model with the smallest AIC score in the initial model selection phase included an eclipse phase, a refractory cell compartment and time-dependent clearance of infected cells, but not density dependent clearance. All AIC scores are recorded in Table S1.
The best fit run for the optimal model estimated very little variation in the viral inoculum between individuals. The population average was, V0_pop = 97.3 while the standard deviation of the distribution of random effects was only , suggesting that fixing this parameter at the same value for all individuals may still allow for reasonable fits. We fixed V0near the estimated population mean, V0 = 97 for all individuals and re-ran the SAEM algorithm. This yielded very similar fits to the best fit from Table S1 with a slightly lower AIC score of 13731 compared to 13738. While we expect that the actual viral exposure initiating individual infections in the NBA cohort varied, this suggests that estimating V0, π, and t0 simulataneously for each individual does not lend additional flexibility. For further model fitting, we kept V0 fixed at 97.
To test whether variability in viral dynamics can be attributed to differences in prior exposure, age, or infecting lineage, we performed one-way ANOVA for the random effects of each of the estimated model parameters against these covariates (implemented in monolix). In this case, the null-hypothesis is that the mean of the random effects (calculated from the individual parameters sampled from the conditional distribution) is the same for each category of the categorical covariate. Ranking all possible covariates by their p-value, the most likely covariate was between the onset of acquired immunity and vaccination status. We tried adding this as a covariate to the model, which allows for a perturbation of the population mean, τpop, by some value βτ_j for each possible vaccination status j. Including this covariate improved the model fit according to AIC score, improving from 13731 to 13627. We next checked whether this was a meaningful addition to the model with the Wald test, which tests the null hypothesis that βτ_j = 0 for each possible vaccination status j. Infections in unvaccinated individuals were significantly different from infections in individuals who had been boosted (βτ _0 doses = 0.73, p <2.2e-16) and the group that had no record(βτ _ no record = 0.44, p=6.77e-7). However, individuals who had received 1, 2 or 3+ vaccinations were not significantly different from each other (βτ _ 1dose = -.36, p=3.6e-1; βτ _ 2 dose = -.076, p=3.18e-1). This prompted us to regroup vaccination status into a new categorical covariate, indicating unvaccinated, at least one dose, or no record. With this model, the onset of acquired immunity differed significantly for infections in unvaccinated individuals vs. those who received at least one dose of the vaccine (βτ _ >1dose = -0.8, p<2.2e-16), but the difference between infections in unvaccinated individuals and those with no record was not significant (βτ _ no record= -0.2, p = 2.03e-2). Then we regrouped vaccination status into just two categories, one being individuals who are unvaccinated or have no record and the second being individuals with a record of 1 or more SARS-CoV-2 vaccinations. We repeated this process of choosing one new covariate to add according to the lowest significant p-value resulting from the ANOVA, testing its utility using the AIC and Wald test, and coarsening the categorization if indicated, until no further significant p-values resulted from the ANOVA. This resulted in three covariates, unvaccinated/ no record versus at least one recorded vaccination modified the onset of acquired immunity τ and the infecting lineage being pre-VOC/delta versus omicron versus unknown modified both the onset of acquired immunity τ and the rate at which susceptible cells enter a refractory state, ϕ. Including these covariates reduced the AIC score by 149 to 13589. The models tested along the way are reported in Table S2.
For the best fitting model, there were significant correlations between the random effects of model parameters β, π and δ, as well as ρ and τ. We started with the best model from Table S2 and allowed for linear correlations between these parameters in the final model (https://monolix.lixoft.com/statistical-model/individual-model/individualdistribution/). This further improved the AIC score by 111 points (Table S3). The correlation structure of the final set of parameter estimates is shown in Figure S3 and the correlations between the random effects can be found on the github page at https://github.com/lacyk3/SARS-CoV-2Kinetics. Once the final model was selected, we further restricted the standard deviation of the measurement error to σ = 0.4 log10 viral RNA copies/ml to capture examples of viral reboundin the data and ran the SAEM algorithm in Monolix to estimate parameters for all 1510 infections. Population parameter values are included in Table S4 and individual model fits are shown in Figure S6. Estimated individual parameter values are accessible at https://github.com/lacyk3/SARS-CoV-2Kinetics.
Statistics
When comparing quantitative features and parameter values across different groups, we used a two-sided Mann-Whitney U-test. When assessing significance of the results, we adjusted p values using the Bonferroni correction for the number of comparisons before comparing against a significance threshold of p > 0.05.
Data and Code availability
The data analyzed in this work was previously published by Hay et al. and is available on github at https://github.com/gradlab/SC2-kinetics-immune-history. The code for generating all analysis and figures included in this manuscript is available at https://github.com/lacyk3/SARS-CoV-2Kinetics.
Author contributions
K.O. and J.T.S. conceptualized the study and developed mathematical models. K.O. performed experiments and statistical analysis. K.O., S.E., and J.T.S. interpreted results and wrote the manuscript.
Acknowledgements
We thank Yonatan Grad, Stephen Kissler, and the members of the NBA cohort for making this data available. We also thank Dan Reeves and Liz Duke for their helpful advice regarding figures. This work was supported by National Institutes of Health (NIH) grants R01AI169427 & R01AI121129.
Footnotes
Improved convergence of SAEM algorithm during model selection resulted in alternate best model fit to data--now the best model does not have a density-dependent infected cell clearance term. Figures 3-6 and supplement have been updated using the new model fits.