Abstract
Aim The objective of this nationwide study was to investigate the association between SARS-CoV-2 transmissibility, viral load, and age of primary cases in Danish households.
Background Spread in households represents a major mode of transmission of SARS-CoV-2. In order to take proper action against the spread of the disease, it is important to have a better understanding of transmission in the household domain—including the role of viral load of primary cases.
Methods The study was designed as an observational cohort study, using detailed administrative register data. We included the full population of Denmark and all SARS-CoV-2 tests (August 25, 2020 to February 10, 2021) to estimate transmissibility in house-holds comprising 2-6 people. RT-PCR Cycle threshold (Ct) values were used as a proxy for viral load.
Results We identified 63,657 primary cases and 139,882 household members of which 21% tested positive by RT-PCR within a 1-14 day period after the primary case. There was an approximately linear association between Ct value of the sample and transmissibility, implying that cases with samples having a higher viral load were more transmissible than cases with samples having a lower viral load. However, even for primary cases with relatively high sample Ct values, the transmissibility was not negligible, e.g., for primary cases with a sample Ct value of 38, we found that 13% of the primary cases had at least one secondary household case. Moreover, 34% of all secondary cases were found in households with primary cases having sample Ct values >30. An increasing transmissibility with age of the primary cases for adults (≥20 years) and a decreasing transmissibility with age for children (<20 years) were found.
Conclusions Although primary cases with sample high viral loads (low Ct values) were associated with higher SARS-CoV-2 transmissibility, we found no obvious cut-off for sample Ct values to eliminate transmissibility and a substantial amount of household transmission occurred in households where the primary cases had high sample Ct values (low viral load), The study further showed that transmissibility increases with age. These results have important public health implications, as they suggest that contact tracing should prioritize cases according to Ct values and age, and underline the importance of quick identification and isolation of cases. Furthermore, the study highlights that households can serve as a transmission bridge by creating connections between otherwise separate domains.
1 Introduction
The world is in the midst of the pandemic caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). As the pandemic has huge medical as well as economic consequences, it is essential to understand the SARS-CoV-2 transmission dynamics and associated factors in order to improve interventions to control the spread of the disease, such as contact tracing efforts. Close person-to-person contact represents a main mode of transmission and households are one of the most important domains where such close contacts occur. Elucidation of risk factors for transmission within households is therefore of major importance.
Real-time reverse transcription polymerase chain reaction (RT-PCR) from nasopharyngial and oropharyngial swabs is used worldwide to detect SARS-CoV-2 (Corman et al., 2020). Detection of SARS-CoV-2 by RT-PCR is—similar to other viruses—dependent on the viral load and is reversely correlated with the Cycle threshold (Ct) value of the test (Singanayagam et al., 2020). The viral load changes during the course of disease and has been shown to be highest around the time of symptom onset (He et al., 2020).
Recently, studies have shown that higher viral load (low Ct values) was associated with increased transmissibility of SARS-CoV-2, and it has been argued that contact tracing should be prioritized on cases with a high viral load, as they will have a higher risk of generating secondary cases (Marks et al., 2021; Lee et al., 2021a,b). Furthermore, it has also been described that transmissibility depends on the age of the infected individual and the age of the exposed individual (Lyngse et al., 2020, 2021; Lee et al., 2021a,b).
The aim of this nationwide observational study was to investigate the association between the transmissibility of SARS-CoV-2, RT-PCR Ct values, while taking age of infected cases into account.
2 Data and Methods
In Denmark all residents have access to tax-paid universal health insurance and tests for SARS-CoV-2 are free of charge. The testing capacity has increased during the pandemic and is widespread. Furthermore, SARS-CoV-2 sick leave is fully reimbursed by the state. Thus, neither access to tests nor financial reasons were major obstacles to obtaining a test during our study period. RT-PCR tests for SARS-CoV-2 could be obtained from either community testing facilities at TestCenter Denmark (TCDK) or from clinical microbiology laboratories at hospitals. Statens Serum Institut (SSI) analyses all tests from TCDK. Information on Ct values was only available for samples that tested positive at TCDK.
2.1 Register Data
In this study, we used Danish administrative register data comprising the full population. All residents in Denmark have a unique personal identification number that allows a complete linkage of information across different registers at the individual level. All results from RT-PCR tests for SARS-CoV-2 in Denmark are registered in the Danish Microbiology Database (MiBA), from which we obtained data on the individual level for all national tests for SARS-CoV-2 during the period August 25, 2020 (which was the first date with accessible Ct values) to February 10, 2021. Primary cases were only included until January 25, 2021 in order to allow for secondary cases to present within the following 14 days (see definition of cases below). Moreover, we only included primary cases identified by TCDK, as we only had Ct values on those case samples. To identify secondary cases, we included all tests, regardless of whether they tested at TCDK or hospitals.
Information on the reason for being tested (e.g., symptoms, potential contact with infected persons etc.) was not available. From the Danish Civil Registry System, we obtained information about the sex, age, and home address for all individuals living in Denmark. People who tested positive for SARS-CoV-2 using antigen tests were not included, as these test results were not transferred to MiBA at the time of the study.
2.2 Data Linkage
We constructed households by linking all individuals living at the same address, including identification of apartments (e.g., six single apartments in the same building counted as six independent households). Only households with two to six members were included in the study in order to exclude, e.g., residential institutions, such as long term care facilities. Person-level data, including information on the test result, the date of sampling, and the date the result was available, was linked to individuals within each household. For every household, we identified the person with the first positive test for SARS-CoV-2; referred to as the primary case throughout this paper. If there was more than one possible primary case, i.e., if two or more cases tested positive on the same day—and were the first identified cases in the household—one was randomly assigned as the primary case. We considered all subsequent tests from other members in the same household as being tests taken in response to the primary case and defined secondary cases as those who had a positive test with sampling dates 1 to 14 days after the primary case tested positive. In addition, we assumed that all identified secondary cases were infected by the primary case within the same household. Lastly, we refer to the Ct value of the first positive sample for each primary case (i.e., the identifying sample) as Ct value throughout the paper.
2.3 Laboratory Analyses
Analysis of tests at TCDK was performed using a set of primers that target the E-gene on SARS-CoV-2 (Corman et al., 2020), which is recommended by the WHO and the ECDC, and has a high sensitivity and specificity (Vogels et al., 2020). TCDK have used the same methodology and primers throughout the epidemic, making the Ct values comparable across the study period. An RT-PCR test for SARS-CoV-2 was defined as positive if the Ct value was ≤38 (SSI, 2021).
2.4 Statistical Analyses
We utilized two concepts for transmissibility of the primary case: transmission risk and transmission rate, as described in Lyngse et al. (2021). The transmission risk describes the risk of infecting at least one other person within the household, and equals one if any (one or more) secondary cases are identified within the same household, and zero otherwise. The transmission rate is the proportion of potential secondary cases within the same household that tested positive. Both transmissibility measures were weighted on the primary case level, such that each primary case has a weight of one.
In addition, we utilized one concept for the susceptibility of the potential secondary case: attack rate. The (secondary) attack rate is defined as the proportion of potential secondary cases that tested positive. The attack rate was weighted on the potential secondary case level such that each potential secondary case has a weight of one.
We estimated the association between Ct values and transmissibility for each Ct value separately, using a non-parametric linear regression.
We grouped primary cases according to their Ct values, and we counted their associated positive secondary cases. From this, we calculated the proportion of secondary cases that were associated with the identification of primary cases with a Ct value above a certain threshold.
We estimated the association between age and transmissibility stratified by the median Ct value for each five-year age group separately using a non-parametric linear regression.
We estimated the association between Ct values, age, and transmissibility. We grouped primary cases according to their Ct values in bins of two and ten-year age groups, and we estimated the transmissibility separately using a non-parametric linear regression.
We used a logistic regression model to estimate the odds ratio of the transmissibility for Ct value, age, sex, and household size. We used a univariable model to investigate the effect of each explanatory variable separately, and a multivariable model to investigate the combined effect.
2.4.1 Sensitivity analyses
We estimated the age structured transmissibility stratified by sex to investigate whether there were different patterns across men and women. We also estimated the age structured transmissibility stratified by Ct value quartiles to investigate whether the pattern was independent of the dichotomization using the median Ct value. As Ct values were only available for the primary cases that were identified by being tested at TCDK, there is a potential bias from not including primary cases identified at hospitals. We, therefore, estimated the age structured transmissibility stratified by place of testing.
Because people normally live with a partner around their own age and parents live with their children, susceptibility correlation with age could drive an age structured transmissibility. To address this potential bias, we stratified our sample by Ct value quartiles and estimated the transmission rate with the interaction of the age of the primary and potential secondary case.
Lastly, one could think that primary cases with high Ct values (low viral load) were due to sampling late in the infection and cases classified as secondary cases were actually primary or co-primary cases. To address this concern, we investigated whether changing the criteria for including secondary cases in relation to time of identification of the primary case, i.e., only including secondary cases found positive on days 1-14 (as in the main analyses), 2-14, 3-14, and 4-14, respectively, affected the results.
2.5 Ethical statement
This study was conducted using administrative register data. According to Danish law, ethics approval is not needed for such research. All data management and analyses were carried out on the Danish Health Data Authority’s restricted research servers with project number FSEID-00004942.
3 Results
3.1 Descriptive Statistics
From August 25, 2020 to February 10, 2021, TCDK analyzed 9,347,218 samples for SARS-CoV-2 (75% of all tests in Denmark) and identified 63,937 primary cases up to January 25 (73% of all primary cases in Denmark) (Table S1). A Ct value was available for 99.6% of these primary cases.
A total of 63,657 primary cases with a Ct value were registered as living in households comprising of 2-6 people with 139,882 potential secondary cases, of which 29,739 tested positive for SARS-CoV-2 1-14 days after the primary case was tested and were thus included as positive secondary cases in this study. This implies an attack rate of 21% (29,739/139,882). Across primary cases, potential secondary cases and positive secondary cases, the distribution of sex, age, household size, attack rate, and Ct values of the primary cases are shown in Table 1. (See Table S2 for 5-year age distributions and Table S3 for the distribution of Ct values of primary cases, including attack rates.)
The distribution of Ct values in samples from primary cases is shown in Figure S1: 25% of primary cases had a Ct value ≤25, 50% had a Ct value ≤28, and 75% had a Ct value ≤32. The distribution of Ct values was relatively similar across age groups, suggesting that differences in test strategies across age were not driving the results of this study (Figure S2 and Table S4).
3.2 Associations with transmissibility
Figure 1 shows the association between Ct values and transmissibility. There was an approximately linear decreasing relationship between Ct values and transmissibility. Both the transmission rate (blue) and the transmission risk (red) decreased with an increasing Ct value of the primary case. A primary case with a Ct value of 18 had a transmission rate of 47% and a transmission risk of 58%, compared to a transmission rate of 9% and a transmission risk of 13% for a Ct value of 38.
Figure 2 shows the reverse cumulative distribution of secondary cases by the Ct value of the primary cases. E.g., primary cases with a Ct value ≥30 account for 34% of the total positive secondary cases within this study.
Figure 3 shows the association between age and transmissibility stratified by the median Ct value (Ct=28). Primary cases with a Ct value below the median (red) were significantly more transmissible than primary cases with a Ct value above the median (blue), across all age groups.
Primary cases aged 15-20 years had the lowest transmissibility. Younger children had a higher transmissibility compared to older children, whereas older adults had a higher transmissibility compared to younger adults.
Figure 4 shows the association between age of the primary case, Ct value and transmissibility. Within each age group the transmissibility increased with decreasing Ct values. Within Ct value intervals, the transmission rate generally increased with the age of the primary case, except for children aged 0-10 years. Adults aged 30+ and children aged 0-10 years generally had a higher transmission risk compared to persons aged 10-30.
Tables 2 and 3 provide odds ratio estimates for the transmissibility from a univariable and a multivariable regression analysis. Primary cases with a Ct value of 18 were approximately four times more transmissible compared with primary cases with a Ct value of 38. Children aged 0-5 years were approximately two times more transmissible compared with persons aged 15-20 years; whereas adults aged 70-75 years were 4-5 times more transmissible compared with persons ages 15-20 years. The transmission risk increased with the number of potential secondary cases, i.e., household size. The transmission rate was significantly higher for two person households compared to 3-6 person households.
3.3 Sensitivity analyses
The trends of the age structured transmissibility were similar across males and females (Figure S3). Boys (<15 years) had a significantly higher transmission risk than girls. We found the same overall transmission pattern when stratifying by Ct value quartiles instead of the median, underlining the robustness of dichotomization by the median (Figure S4). We found that primary cases tested at hospital test facilities generally had slightly higher transmissibility (Figure S5).
Furthermore, when estimating the transmission rate by the interaction of the age of the primary case and the age of the potential secondary cases, we found the same overall transmission pattern across Ct value quartiles (Figure S6). Generally, primary cases with lower Ct values had a higher transmission rate. Also, adult primary cases (>30 years) generally had a higher transmission rate to persons around their own age.
Lastly, in Appendix C, the results of the sensitivity analyses for varying the inclusion criteria of secondary cases are provided. When we excluded more cases identified temporally close to the primary case, we found that the association between Ct values and transmissibility decreased, although the patterns remained the same and were still significant (Figure S7). That is, when we excluded secondary cases identified in the days right after the primary case was identified, we generally exclude secondary cases associated with primary cases that had low Ct values. This is further illustrated in the sensitivity analysis of Figure 2, where the proportion of secondary cases associated with Ct values ≥30 increased from 34% to 38%, when we only included secondary cases identified on days 4-14 (Figure S8). The differences in the age structured transmissibility across primary cases with a Ct value below and above the median were reduced when we excluded secondary cases identified shortly after the primary case was identified (Figure S11 and S12).
4 Discussion and Conclusion
In this nationwide study, we found an approximately linear association between Ct value and transmissibility. Under the assumption that the sample Ct value reflects the viral load when transmission occurred, this suggests that cases with a higher viral load are more infectious than cases with a lower viral load. This was expected, because a high viral load implies that there are more virus particles in the sample, and hence the possibility of a larger inoculum. This finding corroborates previous studies (Marks et al., 2021; Lee et al., 2021a,b), however, the present study also highlights that there is no obvious cut-off for Ct values to eliminate transmissibility. Importantly, we found a considerable transmissibility for cases with high Ct values, e.g., primary cases with a Ct value of 38 had a transmission risk of 13% and a transmission rate of 9% within the household. This is similar to a previous study where virus could be isolated from 8.3% of samples with Ct >35 (Singanayagam et al., 2020) and does not corroborate previous studies that have argued that cases with a Ct value above a certain cut-off are not contagious. A Ct value cut-off of 30, or even lower, has been suggested (La Scola et al., 2020; Bullard et al., 2020; Brown et al., 2020; Prince-Guerra et al., 2021), as this Ct value corresponds to the limit of detection of virus cultures in Vero cells and antigen tests. Whereas this study does not test whether cases a low viral load are contagious (as discussed below, viral load changes over times and obtained sample Ct values depend on technical issues), it shows that choosing a cut-off of a Ct≤30 for infectiousness instead of Ct≤38 would have missed transmission to 34% of the secondary cases in this study of household transmission.
Another main result from this study is that age was strongly associated with transmissibility. We found that younger children were more transmissible compared with older children—both for transmission rate and transmission risk. For adults (≥20 years), we found an over overall positive association between age and transmissibility. However, the transmission risk plateaued for persons aged 30-60 years old. The two associations with transmissibility—age and viral load—were also found when investigating them together in the same analysis (Figure S6 and Table 2 and 3).
This age-related transmissibility pattern could be driven by the susceptibility of the potential secondary cases, as people tend to live with their partner, who is around their own age, and parents live with their children (Lyngse et al., 2020; Madewell et al., 2020). We investigated this and did find that primary cases tend to infect other people around the same age within the household. Moreover, it is likely that the higher transmission from smaller children compared with teenagers is related to the fact that children below 10 years, to a larger degree, need parental contact and nursing compared with teenagers who are more independent and can self-isolate more efficiently. The age-related transmissibility among adults may also be due to increased viral exhalation by age (Edwards et al., 2021) or that the immunological response decreases with age (Long et al., 2020). Further research is needed to clarify this. Heald-Sargent et al. (2020) studied the viral load in 46 cases aged 0-5 years, 51 cases aged 5-17 years, and 48 cases aged 18-65 years. They found indications of high viral loads in children and speculated that they could be a main driver of the epidemic. Our results, which was based on a large sample, contradict this, as we did not find any association between the distribution of Ct values and age (Table S4). To calculate the distribution of Ct values across age groups, it is necessary to include a sample comprising all age groups, as in the present study.
During an infection, the viral load is low shortly after exposure, increases over the infection, peaks around the onset of symptoms, and decreases later on (He et al., 2020). Hence, the Ct value is sensitive to the timing of the test, e.g., when a case is pre-symptomatic compared to later when a case is symptomatic. Furthermore, there is a large variation in both severity of symptoms and infectivity across persons. One study found an indication of lower viral loads in asymptomatic persons (Zhou et al., 2020), while another found no differences across symptomatic and asymptomatic cases (Long et al., 2020). Additionally, the test results depend on the quality of the sampling as well as the assay, i.e., the chosen primers, probe and other reagents, which determines the accuracy of the test, making comparisons across laboratories difficult. In this study, the same methodology, set of primers, and probe were used throughout the whole study period, making the Ct values comparable. Theoretically, it could be possible that the finding of a considerable transmissibility for high Ct values (low viral load) were due to sampling late in the infection and secondary cases were actually infected earlier, when the primary case had a lower Ct value (high viral load). If this was the case, we would find a shift in the estimates for higher Ct values, when we exclude secondary cases identified in the days shortly after identification of the primary case. However, even in the extreme case where we only include secondary cases that tested positive 4-14 days after the primary case, we still find a 6% transmission rate and 8% transmission risk for cases with a Ct value of 38, and 38% of all secondary cases come from cases with a Ct≥ 30. Thus, our results show that primary cases with low viral loads judged by the sample Ct value had a transmissibility significantly above zero, even when we only included secondary cases identified on days 4-14. This indicates that primary cases identified at a time with a low viral load were able to generate secondary cases (Figure S7). Despite the variations mentioned above, we found a clear association between Ct values and transmission risk, emphasizing the importance of this association.
There are several strengths in the present study. Our estimates benefit from a large sample size and a systematic selection of potential secondary cases. We included all household members as potential secondary cases (unconditional on them being contacted by the official contact tracing system), of which 82% were tested within 1-14 days of the primary case. It can be argued that some of the secondary cases might represent cases infected outside of the household. Lyngse et al. (2021) used genomic data (whole genome sequencing) to validate the same methods as used in the present study. They found an intra-household correlation of lineages between primary and positive secondary cases of 96-99%, indicating that most secondary cases were infected with the same lineage as the primary case.
Some limitations apply to this study. We defined the primary cases as the first positive test within a household and all other people living in the same household as potential secondary cases. We defined all secondary cases as those testing positive 1-14 days after the primary case. However, some of these co-primary and secondary cases may be misclassified, e.g., if they were infected earlier but not diagnosed, because they were pre- or asymptomatic. Including secondary cases found >14 days after the primary case could result in misclassification of secondary cases being either tertiary cases or having somewhere else as the source of secondary infections. For a thorough discussion of the definition of co-primary cases, see Lyngse et al. (2020).
Our results are important for public health. Optimal contact tracing naturally has to prioritize the order of new cases and their contacts. Our results suggest that contact tracing should prioritize cases according to Ct values and age. However, it should be noted that primary cases with high Ct values (low viral loads) had a transmissibility significantly above zero and that there is no obvious cut-off of Ct values to eliminate transmissibility. Furthermore, households can potentially serve as a transmission bridge by creating connections between otherwise separate domains, such as schools and workplaces. Lastly, our results underline the importance of quick identification and isolation of cases, especially since non-pharmaceutical interventions are difficult to apply in the household domain.
In conclusion, although primary cases with high viral loads (low Ct values) were associated with higher SARS-CoV-2 transmissibility, we found no obvious cut-off for Ct values to eliminate transmissibility. Even in households where the samples from primary cases had a high Ct value (low viral load), transmission occurred, and these households accounted for a substantial number of secondary cases. We further found a clear increasing association between age and transmissibility. These results are important for public health, as they suggest that contact tracing should prioritize cases according to Ct values and age, and underline the importance of quick identification and isolation of cases. Furthermore, households can serve as a transmission bridge by creating connections between otherwise separate domains.
Data Availability
Data access can be granted through application to the Danish Health Data Authority.
Appendix A:
Summary statistics
Appendix B:
Appendix C:
Sensitivity analyses for definition of secondary cases
In this appendix, we varied the inclusion criteria of secondary cases. In the main results, we included secondary cases that tested positive 1-14 days following the primary case. In this appendix, we vary the inclusion criteria for secondary cases. Panel a includes secondary cases that tested positive 1-14 days after the primary case, and is the same as the main results. Panel b includes secondary cases that tested positive 2-14 days after the primary case. Panel c includes secondary cases that tested positive 3-14 days after the primary case. Panel d includes secondary cases that tested positive 4-14 days after the primary case. Table S5 provides summary statistics on the number of primary cases, potential secondary cases and positive secondary cases for each of the four inclusion criteria.
Appendix D:
Statistical analyses
To estimate the association between Ct values and transmissibility (yp), we estimated the non-parametric regression equation: where Ctp,1 is the sample Ct value (rounded to the nearest integer) of the primary case. β measures the transmission risk for each Ct value. εp denotes the error term, clustered on the household (event) level. The regression was weighted on the primary case level, so each primary case has a weight of one.
To estimate the association between age and transmissibility, stratified by the median Ct value (28), we estimated the non-parametric regression equation: where Agep,5 is the age (in five-year groups) of the primary case. β measures the transmissibility for each five-year age group of the primary cases. εp denotes the error term, clustered on the household (event) level. The regression was weighted on the primary case level, so each primary case has a weight of one.
To estimate the association between Ct value, age, and transmissibility, we estimated the non-parametric regression equation: where Ctp,2 is the Ct value (in bi-value groups) and Agep,10 is the age (in ten-year groups) of the primary case. β measures the transmission risk of the interaction between Ct value and age of the primary case. Ages is the age of the potential secondary cases (s). εp denotes the error term, clustered on the household (event) level. The regression was weighted on the primary case level, so each primary case has a weight of one.
To quantify the increased transmissibility across different observable characteristics, we estimated a univariable and a multivariable logistic regression. In particular, to estimate the odds ratio, we estimated the logistic regression equation with the following linear predictor: where β measures the non-parametric association with Ct values, γ measures the non-parametric association with age of the primary case, φ measures the association with sex, and δ measures the association with the size of the household. εp denotes the error term, clustered on the household (event) level. The regression was weighted on the primary case level, so each primary case has a weight of one.
Footnotes
∗ We thank Statens Serum Institut and The Danish Health Data Authority for data access and helpful institutional knowledge. We also thank the rest of the Expert Group for Mathematical Modelling of COVID-19 at Statens Serum Institut.