Abstract
The relationships between viral load, severity of illness, and transmissibility of virus, have been the subject of intense interest since the start of the COVID-19 pandemic. They are fundamental to understanding pathogenesis and devising better therapeutic and prevention strategies. In this report we present within-host modelling to examine the viral load dynamics observed in the upper respiratory tract, drawing upon 2172 serial measurements from 605 subjects, collected from 17 different studies. We developed a mechanistic within-host model to describe viral load dynamics and host response, and also contrasted simpler mixed-effects regression analysis of peak viral load and its subsequent decline. The inclusion of age, sex, or disease severity of the subjects did not appreciably improve the fit of either the mechanistic model or the regression. In future work, this model will be used to connect viral load dynamics to underlying host traits, to better understand these complex interactions.
Introduction
COVID-19 exhibits a wide range of severity, from asymptomatic infection to severe disease leading to hospitalisation and death. Age and sex have emerged as important risk factors for poor outcome [1,2]. Viral load in the respiratory tract has also been reported as an additional determinant of severity of illness [3,4] and also a determinant of likelihood of transmission [5]. However, viral load varies over the course of illness due to dynamic interaction with the host immune response, and measurements at single points in time provide limited insight into this dynamic process. Within-host models of viral load can help to distinguish the sequence of events by tracking both viral dynamics and host response over time, accounting for the effect of multiple factors simultaneously [4,6,7].
Studies tracking viral load over time in COVID-19 are beginning to establish viral dynamics and explore correlates of protection in the host response, although findings to-date are somewhat contradictory and these relationships are still not well characterised [8]. Viral load in the upper respiratory tract peaks early in infection, usually before or within a few days of symptom onset [8–12]. Some studies suggest that viral load in the lower respiratory tract may peak later, in the second week after symptoms [9], but this is much harder to measure in a serial manner. Viral load at a given time after diagnosis or detection tends to be similar between asymptomatic and symptomatic cases [13], but evidence tends towards a longer duration of viral shedding in more severe cases and older individuals [14,15]. Despite detection of viral material in samples from some individuals several weeks after onset of symptoms, live virus is not usually present beyond 8-14 days [16–19].
The host response during COVID-19 has many features typical of an anti-viral immune response, including the generation of antibodies and T cell populations [20]. Antibodies are chiefly considered to contribute to viral clearance through pathogen neutralisation, though other effector functions may be of significance to COVID-19 [21]. A diverse array of T cell populations are active during COVID-19, including cytotoxic and helper populations. While these cells will contribute to viral clearance, emerging data indicates that T cells may also hold immunopathogenic functions in severe cases of COVID-19 [22]. Additionally, COVID-19 is associated with lymphopenia in peripheral blood [23], possibly reflecting migration of cells to the site(s) of viral infection. In addition to these adaptive immune processes, innate immune processes are considered to play a major part in the pro-inflammatory state that scales with COVID-19 severity [24].
Within-host models are being developed to characterise viral kinetics in relation to host response and disease outcomes and to guide therapeutic development. For example, Néant et al. [25] found an association between higher viral load late in infection and mortality. Goyal et al. [26] inferred different stages of host response from observing three stages of viral decline: a rapid drop following peak viral load, a period of slower decline, then rapid elimination of the virus. Benefield et al estimated that viral load peaks prior to symptoms, suggesting substantial pre-symptomatic transmission [27]. Other within host models have been used to explore the potential effects of antivirals, immunotherapies, and prophylactic treatment [28,29]. More detailed models have simulated viral load in different parts of the body and detailed components of the innate and cellular immune response [30,31].
Many modelling studies to date have been calibrated to limited longitudinal data on viral load and particularly on host responses, which reduces parameter identifiability and the ability to infer pathways of pathogenesis and protection. Here, we first used linear regression models to assess the associations of age, sex, and severity of disease with viral load. Then we developed a model of viral dynamics, in which we pragmatically represented the complex host response to the virus in two phases: an early phase which restricts the initial rate of viral replication, and a later phase which acts to clear virus. Fitting this model to longitudinal viral load measurements from previously published studies allowed us to make individualised estimates of key metrics such as peak viral load and rate of decline in viral load after the peak, which could also be related to age, sex, and severity. This model represents the first step towards a well-validated, flexible, and open-source framework which can be utilised by different research groups to better interpret immune responses in COVID-19 in the context of viral load, and to understand how different treatments given at different stages of illness might influence viral load, host response, outcome, and transmission potential.
Results
Our literature search revealed 53 studies reporting longitudinal viral load measurements, from which we obtained individual-level viral load measurements from 19 studies (either from the supplementary materials of the publication or preprint, or by emailing the corresponding authors). The analysis presented here utilises data from 17 of these studies (summarised in Table 1). Studies that were identified but did not provide data are shown in Supplementary Table 1. We excluded two studies that contained little or no longitudinal data for upper respiratory tract (URT) samples [32,33]. In all studies, a description of disease severity for each patient was available, although the level of detail varied between studies. In six studies viral loads were fully quantified from concurrent standard curves, whilst in the remaining 11 studies cycle-threshold values were reported. To combine data from different studies we generated an “average” standard curve, using data from 7 previously reported standard curves to convert to viral load per ml (Supplementary Figure 1). Details on antiviral or immunomodulatory treatments were available from some studies (Supplementary Table 2).
In total, 2172 upper respiratory tract samples from 605 patients were used for analysis. Subject-level data on age and sex were available for 576 subjects and missing for 29 subjects. The majority (492 out of 576; 85%) of subjects were under 60, compared to 84 (15%) aged 60 or over; 321 (55%) patients were female. Disease severity was categorised per the WHO scale [34], where 501 (83%) patients (contributing 1698 (78%) samples) experienced mild COVID-19 illness, 65 (11%) patients (contributing 359 (17%) samples) had moderate severity illness, and 39 (6%) patients (contributing 115 (5%) samples) had severe illness. Supplementary Figure 2 shows the viral load data for all patients, presented separately for each study. The vast majority (2163, or 99.6%) of samples were taken after the onset of symptoms, although samples were collected from a minority (8 patients across all 17 studies) prior to symptom onset e.g. if they were identified as a close contact of another patient. The median timing of swabs was 12 days after symptom onset (interquartile range: 6-19; range: 2 days before symptom onset to 54 days afterwards). Pooling the data from all studies, we found that the median viral load peaked one day after symptom onset (Figure 1a), although less data was available on the day of symptom onset compared to subsequent days (Figure 1c). 421 patients had more than one URT swab recorded: for 60.1% of these patients, the first sample had the highest recorded viral load. Viral load estimates at corresponding times after onset of symptoms did not differ systematically between studies in which viral load was calculated by the authors of the original study or inferred from our averaged standard curve (Figure 1c), although less data was contributed by studies which used concurrent standard curve quantification (Figure 1d). The timing of the first sample obtained relative to onset of symptoms varied with severity of illness. Subjects with moderate or severe disease had first samples collected later in their illness than those with mild disease (Supplementary Figure 3a). Accordingly, first viral load and maximum viral load measurements for subjects with moderate or severe disease in these studies were lower than in those with mild disease (Supplementary Figures 3b and 3c).
Fitting a regression model to the viral load data
We fitted two types of models to the viral load data from the first 15 days from onset of symptoms (see Methods). The first was a linear regression model, fitted to log-transformed viral loads. This model included patient- and study-specific random effects, which captured variation from the average behaviour observed across the 17 datasets. There was variation in the peak and slope of the viral loads across different studies (Figure 2, Supplementary Table 3). The variation in the (log-transformed) peak viral load varied over several orders of magnitude. Some of this variation was explained by study-specific differences: the standard deviation of the between-study variation in the peak viral load was 1.01 i.e. approximately one order of magnitude (Supplementary Table 3). This could be due to a number of factors, such as the method of sample collection, or quantification method. Furthermore, viral loads in several studies were estimated using an averaged standard curve, which introduces some uncertainty into the magnitude of the viral load. However, the inclusion of study-specific random effects allows such data to be appraised alongside data from other studies.
Within this regression framework, we incorporated information on age, disease severity and sex (Methods) to see if the goodness of fit could be improved. We added fixed effects for these three variables, both separately and in combination. We examined including age as a continuous variable or a dichotomous one (stipulating whether patients were 60 years of age or over). The goodness of fit to the data did not vary appreciably: here we report results for age as a dichotomous variable. The model that included both age and sex provided the best fit to the data (Figure 3), suggesting that subjects over 60 years old have a slightly lower peak viral load, which then declines more slowly over time. However, the difference in fit between this model and the model with no fixed effects was marginal (Table 2). Due to the relatively small amount of data from subjects over 60, this result should be interpreted with caution. The inclusion of severity of disease did not improve the model fit.
Fitting a mechanistic model to the viral load data
In addition to modelling viral load decline using regression models, we also developed a mechanistic model which we fitted to the dataset. A number of mechanistic models, of varying complexity, have been fitted to viral load data from COVID-19 patients. In this work, we have elected to keep the model relatively simple, representing the multi-faceted immune response to the infection via two components: The exponential growth of viral load in the initial phase of infection is brought under control by an early immune response. Subsequent to this, the infected cells are gradually cleared by a late immune response. In reality the immune response clears both infected cells and virus: in the absence of data to distinguish these two mechanisms, the modelled response only clears infected cells. The effects of this mechanism should be viewed as being representative of the effects of a broader immune response. The early immune response is stimulated by a high load of infected cells and starts to block viral replication and the invasion of susceptible cells. The late response requires a maturation phase before it becomes effective and is therefore, more representative of the adaptive immune response. However, we do not attempt to fully distinguish between innate and adaptive responses in this model, as the interplay between them is complex.
To guide the model fitting (see Methods), we made the assumption that both the peak viral load and the activation of the early immune response should roughly be concurrent with the onset of symptoms. The vast majority of subjects in the dataset were only under observation after the onset of symptoms, which means we are unable to infer the rate of exponential growth during the initial phase of the infection. For each subject, we fitted two parameters in the mechanistic model, holding all other parameters fixed (see Methods). One of these free parameters governs the density of infected cells required to activate the early response, while the other determines the rate at which the late response clears infected cells and, therefore, the rate at which the viral load declines. As we did for the regression modelling, we used a nested random effect structure, with study- and patient-specific random offsets for both of these parameters (Methods). In addition, we incorporated data points that fell below the limit of detection in each study by accounting for censoring in the likelihood (Methods).
This approach allowed us to characterise the average time-course of an infection at the population-level i.e. after removing study- and patient-level offsets (Figure 4), as well as showing the dynamics for the study specific fits to the data. There was no significant relationship between the subject-specific random effects for either of the model parameters and maximum disease severity (Supplementary Figure 4), consistent with the outcome of the preceding regression modelling (Table 2). Similarly, there was no significant relationship between the subject-specific random effects and subject age or sex (not shown). Supplementary Figure 5 shows the 95% credible intervals for the study-specific random effects, showing the between-study variation in both the peak viral load (lower panel) and the rate at which the infection is cleared in the URT (top panel). Supplementary Figure 6 shows modelled viral loads for each of the 519 subjects used to fit the mechanistic model.
Discussion
Understanding the causes and consequences of variation in pathogen load is fundamental to infectious disease research [35]. Increasing pathogen load can drive both pathogenesis and transmission of infection. High viral load in COVID-19 has been associated with severity of illness in some studies [3,4], but not others (see e.g. Ref. [36]), and has been associated with risk of transmission [5]. Pathogen load is dynamic, it varies over time, and is often considered to be the stimulus for the host response, as well as a target of the host response. Therefore, any attempt to establish the determinants of pathogen load and relationships between pathogen load, severity, and transmissibility, should account for these dynamics.
We collated longitudinal URT viral load data from 2172 samples taken from 605 subjects with SARS-CoV-2 infection in 17 studies, to investigate the association of viral load dynamics with age, sex, and severity of illness. We used the WHO clinical progression scale to standardise varied descriptions of disease severity reported in different studies. We used two distinct modelling approaches to characterise viral load dynamics, accounting for systematic differences in viral load estimation between studies. We found little evidence to support the hypotheses that URT viral load dynamics are substantially influenced by age or sex, or that URT viral load dynamics influence severity of illness. We found no association between severity and the latent variables describing early and late immune responses to URT virus in our mechanistic model. Nevertheless, we identified considerable inter-individual variation in URT viral load dynamics. Understanding the biological basis for this variation could help to identify new approaches to reduce transmission of SARS-CoV-2.
The lack of association between URT viral load dynamics and severity of illness is particularly interesting. On one hand, this is not necessarily surprising since severe COVID-19 is a consequence of lower respiratory tract (LRT) and systemic disease processes, and some studies have indicated that viral load in lower respiratory tract samples is more strongly associated with severity of illness [9]. On the other hand, this would imply that distinct processes govern URT and LRT viral load dynamics, or that the extension of infection from URT to LRT and other systemic locations is controlled by different mechanisms to those controlling local viral load. Although it is difficult to obtain serial lower respiratory viral load measurements, evaluating distinct mechanisms controlling local and spatial viral dynamics could be important to understand the pathogenesis of COVID-19 and other respiratory infections.
Our study provides one of the most comprehensive assessments of URT viral load dynamics, but despite collating data from a large number of studies we had a relatively low proportion of patients with very severe illness and very few from those with fatal infection, potentially reducing our ability to distinguish different viral dynamics in these groups. We also had very little data on URT viral loads before the onset of symptoms, which limits our ability to model variation in the rate of increase in viral load early in infection. We lacked data on the interval from exposure to symptoms, forcing us to fix this parameter in our mechanistic model. This means we were unable to properly explore the role that the initial viral dose plays in the dynamics. Furthermore, we did not have sufficient data on ethnicity, or other host characteristics beside age and sex, which may have been of interest to examine associations with viral load dynamics. We extracted data on antiviral treatment in different studies where possible (Supplementary Table 2), but there were too few subjects receiving each treatment to allow meaningful analysis.
We used two modelling approaches to analyse the data. Mixed-effects regression modelling sought to determine if any of the wide variation observed in patients’ upper respiratory viral loads could be explained by age, sex, or disease severity. We observed a marginal improvement in model fit observed when both age and sex were included in the model, but these variables were not strongly associated with viral load. Furthermore, we note that the data analysed here has many more patients under 60 years of age than over 60. This means that the finding that patients over 60 have a slightly lower peak viral load which declines slightly more slowly over time, should be viewed very cautiously and requires further investigation.
In order to utilise as much data as possible, we have included semi-quantitative data (presented as Ct values) alongside fully quantitative viral load data. We have converted the former using an averaged standard curve. We believe that modelling the data with study-specific random effects allows us to analyse all the data collected, without requiring study-specific standard curve data. As this project progresses, however, we will examine whether the inclusion of these data influence the conclusions of our analyses.
It is interesting to compare our mechanistic model to others that have been fitted to viral load data. Goyal et al. [26] developed a more complex model, which was able to capture a changing rate of decline in viral load observed in the patient data available to them. However, one should note that in this work we are fitting to a narrower timeframe (no more than 15 days after symptom onset), so it may be that the changing rate at which viral load declines is not so noticeable during this phase of the infection. Another difference between the two approaches is that we use the time of symptom onset around which to centre the dynamics (i.e. set time equal to zero), whereas Goyal et al. use the time of the first swab. When considering data from lots of different subjects, using symptom onset for temporal alignment helps adjust for the wide variation in the time of sample collection, particularly as we observed a relationship between disease severity and the time of the first sample collection in the studies considered here (Supplementary Figure 3). Among the within-host models already published, one finds wide variation in model complexity. Some models, such as the one presented here, have erred on the side of parsimony, whilst others have sought to capture very intricate within-host processes (see e.g. Ref. [31]). An interesting avenue for further work would be to use goodness-of-fit criteria, such as the one used here for the regression modelling, to explore the extent to which more complicated models explain more of the variation present in the data.
Overall, our analyses indicate considerable variation between individuals in the dynamics of URT viral load, but no association between this variation and severity of illness, or with important biological determinants of severity. This has important implications for investigation of the mechanisms driving both pathogenesis and transmissibility of SARS-CoV-2 infection. We present our mechanistic model as a resource for researchers to relate URT viral load dynamics to biological traits, in order to further unravel mechanistic determinants.
Methods
Data
To collect data on viral load dynamics, we searched PubMed and medRxiv for studies that recorded longitudinal viral load data from individuals with symptomatic COVID-19 infections. In particular, we searched for studies which reported the timing of symptom onset for each patient, which we used to temporally align samples from different patients. We identified 53 studies, data from 5 of which could be extracted from the publication or preprint. We contacted the authors of 48 studies by email to request access to patient-level data, and we received data from 14 of these studies. 2 of the 19 studies were dropped from this analysis, as they contained little or no longitudinal data for upper respiratory tract samples. Studies for which data was obtained are summarised in Table 1: all remaining studies identified by our literature search are summarised in Supplementary Table 1. We extracted as much information as possible on the severity of illness experienced by the patients and extracted demographic information on the patients where available. Two authors independently studied the severity information and matched the descriptions to the WHO clinical progression scale [34]. Some articles contained detailed information on the course of disease for each patient, but this was not available in all the studies. We considered scores of 1-3 to represent mild disease, 4-6 to represent moderate disease and scores above 6 to represent severe disease. This is slightly different to the patient state descriptors of the WHO scale, because the majority of studies did not provide sufficient detail about the method of oxygen delivery to allow us to distinguish between scores of 5 and 6. In some studies, the severity of symptoms was recorded at multiple timepoints, over the course of the infection. Here, we use the term ‘severity’ to describe the maximum severity of disease experienced by each patient.
Although some studies with a long follow-up period demonstrate that some patients can remain PCR-positive for virus for well over a month [37,38], several studies [16,17] have demonstrated that very few people are infectious, based on URT swabs, beyond 10 or 14 days after symptom onset. Data collected after this period is likely to reflect RNA debris that remains in the URT. Therefore, our models of viral load are fitted to data points within the first 15 days of symptoms. This means that we have fewer data points to fit to (Table 1), but we believe that the meaningful dynamics, in terms of how the host controls the infection, are found in this time period.
Viral load Quantification
In all studies real-time polymerase chain reaction (PCR) assays were used to quantify upper respiratory tract virus either as cycle-threshold (Ct) values, or as viral copies /mL (V). As discussed in Ref. [39], calculation of viral copies per ml from Ct values requires the use of a ‘standard curve’, which is calibrated to the experimental set-up in a particular laboratory using reference samples. In general, these curves have the form:
Here a and b are positive numbers that fully specify the viral load for a given Ct value. This relationship indicates that there is a linear relationship between log-transformed viral loads and the raw Ct values, with higher Ct values representing lower viral loads. The units of V vary between studies (e.g. viral copies per ml, per swab, or per 1000 cells), all studies collected here that have quantified viral load have used viral copies per unit of volume. We collected as many different standard curves as possible, from the studies included in this analysis [40,41] and from other papers in the COVID-19 literature [33,42–44], to understand the variation observed. From these, we determined an ‘averaged’ standard curve (Supplementary Figure 1), using the mean observed values for parameters a and b, which we used to estimate the viral loads for studies for which only Ct values were available. This enabled us to pool data from all 17 studies.
Models
We sought to explain variation in viral load (either its peak value or its rate of decline over time) among patients, due to (e.g.) age, sex, and severity of disease. We did this using two types of models, a linear mixed effects regression approach and a mechanistic model, which took the form of a system of first-order differential equations.
Linear regression models
Bayesian regression models were fitted using RStan [45], with some of the analysis carried out using the rethinking package [46]. Linear models were fitted to log-transformed viral loads, with random effects for each patient and study applied to the parameters determining the peak viral load, which we assumed coincides with the onset of symptoms, and rate of its decline over time. Samples for which no virus was detected were treated as being below limit of detection (LOD), rather than truly virus negative. We show the general form of the regression model here, where L is the likelihood of each data point, to illustrate the random-effect structure and the censoring of the data:
This model is relatively simple: the log-transformed viral loads are captured by a linear model, with the intercept describing the viral load at the time of symptom onset, and the slope capturing the rate at which the viral load subsequently declines. The log-transformed viral load for a given patient is normally distributed around the average trajectory, which is described by μ, with a standard deviation given by σ. For data points where no virus was detected, the likelihood takes a different form. We calculate the likelihood as the probability of the viral load being below the LOD, writing NCDF as the cumulative distribution function of the normal distribution. We allow for the fact that the LOD is study-specific (indicated in Figure 2). All remaining terms in Equation 2 define the prior distributions for the parameters. Both the slope and the intercept have a parameter that captures the average behaviour across all patients in all studies (a0 and b0 respectively). Random effects capture patient- and study-specific deviations from the average behaviour. Both the patient- and study-specific random effects are normally distributed with zero mean: here we write N2 to indicate the bivariate normal distribution. We allow correlation between the random effects on the intercept and slope at each level, as indicated by the presence of parameters ρstudy and ρpatient. For these parameters, we use the LKJ distribution [47] as a prior.
Within this framework, fixed effects could then be added to the model, to assess whether (e.g.) severity of disease, or the sex of patient affected the typical viral load trajectory observed. To do this, we simply add terms to % in Equation 1. Here we illustrate this for sex, making female the reference category:
Here variable ‘Male’ is equal to 1 if the patient is male (0 otherwise), and we have also added zero-mean priors for the new parameters. As mentioned above, severity was expressed on the WHO scale, which takes values from 1-10. As 1 is asymptomatic, severity in our dataset is limited from 2 (mild symptoms, not hospitalised) to 10 (dead). We chose to make three groups from these categories: Mild (2 or 3), Moderate (4,5,6) and Severe (7,8,9,10). When severity is included in the model, μ becomes:
Here variable ‘Mod’ is equal to 1 for patients with disease of moderate severity (equal to zero otherwise) and variable ‘Sev’ is equal to 1 for patients with severe disease (0 otherwise). In most studies, patient age is given in years, as an integer value. Some studies expressed age of patients in decades e.g. 10-20 years of age. For these studies, we used the midpoint of the age range given for each patient. We grouped age into two groups: <60 years of age and ≥60 years of age. In the regression modelling, we added a fixed effect for age, making the youngest age group the default. Here, μ takes the form:
Here variable ‘Age’ is equal to 1 for patients aged 60 or over (equal to 0 otherwise). We also looked at including subject age as a continuous variable, but the goodness of fit did not change appreciably. We added the 3 terms (age, severity, sex) to the regression models both separately and in combination. We assessed the goodness of fit using the Watanabe Akaike Information Criterion (WAIC) [47], with the best-fitting model having the lowest WAIC value. As the WAIC for each candidate model is estimated from a finite sample, its standard error was calculated using the rethinking package [46] to appreciate the uncertainty in its value. These standard errors are useful when appraising the differences in WAIC between candidate models. The relative goodness of fit of a given model can also be appraised by calculating its Akaike weight among the set of all considered models. This can be interpreted as the probability that this model, out of the set of models considered, would provide the best fit to new (i.e. out of sample) data [47].
Mechanistic model
We also developed a mechanistic model to describe the viral dynamics. We started modelling the infection 5 days before the onset of symptoms. At this time, an initial dose of virus V0 is introduced. These virons start invading susceptible cells at rate p. Infected cells then produce more virus at rate β, which can then invade subsequent susceptible cells. Meanwhile, free virions are cleared at rate γ. In this way, the viral population grows exponentially. In the model, this growth is brought under control by an early phase of the immune response, and then the infection is gradually cleared by a later phase of the immune response. We do not seek to specify the immunological mechanisms contributing to these phases of the immune response. In each infection, the exponential growth slows as the population of infected cells approaches a certain value (Imax). This reflects a combination of two within-host mechanisms: the depletion of susceptible cells in the URT and the effect of an early phase immune response which is triggered at a high level of infection. Since these mechanisms may be linked, i.e. the immune response may modify the susceptibility of target cells, we do not attempt to distinguish between them, or to explicitly model the population of susceptible cells.
After the exponential growth is brought under control, the infection is cleared by the late immune response. In this model, this is triggered by a certain density of infected cells but requires a maturation stage before it becomes effective at clearing infected cells. We write the system of equations as:
The late immune response, represented here by variables (A1, A2, A3), is stimulated by the presence of infected cells. A Hill function is used to make the adaptive response dimensionless, and to rescale its value (daily input into compartment A1 scales between 0 and 1). Parameter I50 determines the magnitude of the density of infected cells required to stimulate this response. It requires time to mature (governed by rate parameter kc), meaning that only stage A3 is able to clear infected cells. Once mature, the late response does not wane in this model, as we are only interested in its ability to clear an infection, not how long it persists after the infection has been resolved. In this way the late response recapitulates features of adaptive cell-mediated and humoral immunity, without needing to specify their relative contributions. Parameter ka represents the maximum rate at which the late response can clear the infection, and A50 governs the magnitude of the adaptive response required effectively clear infected cells. We note that the magnitude of the late immune response has been set to be of order 1 for convenience, and this determines the magnitude of A50.
For the vast majority of patients for whom we have data, only samples taken after the onset of symptoms are recorded. This means that we are unable to estimate the duration of the incubation period, the initial dose of virus that causes the infection, or the rate at which the virus reproduces before being acted upon by the immune response. We elect to model a five-day incubation period and fix the rate of exponential growth to be the same for all patients, setting β = 0.8 ml day-1virion-1, γ = 13day-1, p = 80day-1. This is a simplification: it is unclear whether, in reality, the time from infection to symptom onset varies by age, sex or disease severity. For each patient the infection starts at Day -5 (in the dataset, Day 0 is the day of symptom onset, not the day of infection onset) with initial conditions given by (I = 0, V = V0, A1 = 0, A2 = 0, A3 = 0) i.e. the infection starts with an initial viral dose, no infected cells, and no prior exposure to the virus (late immune response completely inactive).
The initial viral dose, , for each patient is chosen with the idea in mind that the viral load should peak at or close to the time of symptom onset. Ideally, , should be fitted simultaneously with the free parameters for each patient, but this has proved challenging to date, especially given that peak viral load should coincide approximately with symptom onset but may also not be observed. Briefly, , is chosen by considering the linear subsystem of equations that govern the growth of the virus prior to the activation of the immune response to the infection. As this system of equations is linear, it can be solved exactly (we use a matrix exponential). We require that , be chosen so that an uncontrolled infection would reach the maximum viral load observed just prior to the onset of symptoms. This ensures that the peak viral load modelled for each patient is controlled by the early-stage immune response, as it is too soon for the adaptive response to be effective at clearing infected cells.
When fitting to the data, we allowed the values of ka and Imax to vary and we fixed all the other parameters. As we did for the regression modelling, we included random effects in the model, to account for different behaviour in different patients and across different studies. In the case where we have P patients taken from S studies, we write:
This means that and represents the population-level average value of each parameter. We excluded data recorded more than 15 days after the onset of symptoms, and only fitted to patients with at least one positive swab before this time, leaving 1326 samples from 519 patients for model fitting. The system of equations was solved numerically in R, using the dopri function from the dde package [48]. We write the modelled viral load trajectory at time : for patient ; in study j as Vij (t). Samples from the posterior distribution of the fitted parameter were obtained using Markov Chain Monte Carlo methods, via the R package drjacoby [49]. As with the regression modelling, the form of the likelihood for each data point depends on whether the sample is above the study-specific LOD or is recorded as being virus-negative. We write Dijk to indicate the kth viral load sample, taken tk after the onset of symptoms, from patient i in study j. The likelihood for each data point has the form:
The global likelihood was then obtained by multiplying together the likelihoods for each data point. We provided prior distributions for the parameters to be fitted. These were:
Data Availability
At present, we refer the reader back to the original studies (summarised in Table 1) to access or request the viral load data.
Funding
This work was supported by a research grant (MR/V027409/1) from UKRI (MRC) and DHSC (NIHR). Infrastructure support was provided by the NIHR Imperial Biomedical Research Centre. JDC and LCO acknowledge joint Centre funding from the UK Medical Research Council (MRC)/UK Department for International Development (DFID) under the MRC/DFID Concordat agreement (grant reference MR/R015600/1).
Supplementary Materials
The Supplementary Materials contain Supplementary Tables 1-3 and Supplementary Figures 1-6.
Availability of Data
At present, we refer the reader back to the original studies (summarised in Table 1) to access or request the viral load data.
Availability of Code
As modelling tools are developed, they will be made freely available in the following repository: https://github.com/JDChallenger/Viral_Load_Covid19.
Acknowledgments
We would like to express our gratitude to the study authors who shared their data, as well as all those involved in the data collection. JDC would like to thank Peter Winskill, Robert Verity, and Clare McCormack for useful discussions.