Summary
We report the first local transmission of the SARS-CoV-2 Delta variant in mainland China. All 167 infections could be traced back to the first index case. Daily sequential PCR testing of the quarantined subjects indicated that the viral loads of Delta infections, when they first became PCR+, were on average ∼1000 times greater than those of A/B lineage infections during the initial epidemic wave in China in early 2020, suggesting potentially faster viral replication and greater infectiousness of Delta during early infection. We performed high-quality sequencing on samples from 126 individuals. Reliable epidemiological data meant that, for 111 transmission events, the donor and recipient cases were known. The estimated transmission bottleneck size was 1-3 virions, with most minor intra-host single nucleotide variants (iSNVs) failing to transmit to the recipients. However, transmission heterogeneity of SARS-CoV-2 was also observed. The transmission of minor iSNVs resulted in at least 4 of the 30 substitutions identified in the outbreak, highlighting the contribution of intra-host variants to population-level viral diversity during rapid spread. Disease control activities, such as the frequency of population testing, quarantine during pre-symptomatic infection, and the level of virus genomic surveillance, should be adjusted to account for the increasing prevalence of the Delta variant worldwide.
During the global spread of COVID-19, genetic variants of the SARS-CoV-2 virus have emerged. Some variants have increased transmissibility or may exhibit an increased propensity for escape from host immunity, and therefore pose an increased risk to global public health 1–3. An emerging genetic lineage, B.1.617, has gained global attention and has been dominant in the largest outbreak of COVID-19 in India since March 2021. One descendant lineage, B.1.617.2, which carries the spike protein mutations L452R, T478K and P681R, accounts for ∼28% of sequenced cases in India and has rapidly replaced other lineages to become dominant in multiple regions and countries (https://outbreak.info/) 4. Lineage B.1.617.2 has been labeled a variant of concern (VOC) and given the name Delta (https://www.who.int/activities/tracking-SARS-CoV-2-variants). Data on the virological profile of the Delta VOC are urgently needed.
On May 21, 2021, the first local infection of the Delta variant in Guangzhou, Guangdong, China was identified. Since the early epidemic in China in January 2020 (ref. 5), a suite of comprehensive interventions has been implemented to limit transmission, including population screening, active contact tracing, and centralized quarantine/isolation. However, in contrast to the limited level of onward transmission observed in Guangdong in early 2020 (ref. 5), successive generations of virus transmission were observed in the 2021 outbreak of the Delta variant in the region. Here, we investigated epidemiological and genetic data from the well-traced outbreak in Guangdong in order to characterize the virological and transmission profiles of the Delta variant. We discuss how intervention strategies may need to be adjusted to cope with the virological properties of this emerging variant.
Results
A total of 167 local infections were identified during the outbreak, starting with the first index case identified on May 21, 2021 and ending with the last case reported on June 18, 2021 (Figure 1a). All cases could be epidemiologically or genetically traced back to the first index case (Figure 1b). One notable epidemiologic feature of the Delta variant is a shorter serial interval compared with infections with early Wuhan-like strains or other VOCs 6–8. However, critical parameters before illness onset remain poorly characterized, including when the virus can first be detected in a subject after exposure, and how infectious infected individuals are.
We investigated the data from the quarantined subjects in this outbreak and compared them with data from the early 2020 epidemic caused by A/B genetic clade (Pango nomenclature 9) strains. The centrally quarantined subjects were the close contacts of confirmed cases. Once a new infection was identified, the close contacts of that case were immediately traced, centrally isolated, and tested daily by PCR. The dataset from quarantined subjects allowed us to determine the time interval in infected subjects between exposure and the point at which viral loads were first detectable by PCR. The exact exposure time for intra-family transmissions was difficult to pinpoint, so we removed intra-family transmission pairs from our time interval analysis. Our results revealed that the time interval from exposure to the first PCR+ test in the quarantined population was 6.00 days (IQR 5.00-8.00) during the 2020 epidemic (n=29; peak at 5.61 days) and 4.00 days (IQR 3.00-5.00) in the 2021 Delta epidemic (n=34; peak at 3.71 days; Figure 1c).
We next evaluated viral load measurements at the time when SARS-CoV-2 was first detected by PCR in each subject. The relative viral loads of cases infected with the Delta variant (n=62, Ct = 24.00 for the ORF1ab gene, IQR 19.00∼29.00) were 1260 times higher than those for the 2020 infections with clade 19A/19B viruses (n=63, Ct = 34.31 for the ORF1ab gene, IQR 31.00∼36.00) on the day when viruses were first detected (Figure 1d). We hypothesize that the Delta variant has a higher within-host growth rate, which led to the higher viral loads observed once viral nucleic acid levels exceeded the PCR detection threshold (Figure 1e). Similar to results reported by Roman et al., we found that samples with Ct > 30 (<6×10⁵ copies/mL viruses) did not yield an infectious isolate in vitro. For the Delta variant infections, 80.65% of samples contained >6×10⁵ copies/mL in oropharyngeal swabs when the viruses were first detected, compared to 19.05% of samples from clade 19A/19B infections. These data indicate that the Delta variant could be more infectious during the early stage of the infection (Figure 1e).
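The fold difference quoted above can be roughly reproduced from the two reported Ct values. The following is a minimal sketch, assuming that each PCR cycle doubles the template (100% amplification efficiency); that assumption and the function name are illustrative and are not taken from the study. Under it, the Ct difference of 10.31 cycles corresponds to a fold change close to the reported 1260.

```python
def fold_change_from_ct(ct_reference: float, ct_sample: float,
                        efficiency: float = 1.0) -> float:
    """Relative template abundance of a sample versus a reference.

    Assumes each PCR cycle multiplies the template by (1 + efficiency).
    efficiency = 1.0 (perfect doubling) is an assumption of this sketch,
    not a value reported in the study.
    """
    delta_ct = ct_reference - ct_sample
    return (1.0 + efficiency) ** delta_ct


# Ct values for ORF1ab at first detection, as reported in the text.
ct_2020 = 34.31   # clade 19A/19B infections
ct_delta = 24.00  # Delta infections

fold = fold_change_from_ct(ct_2020, ct_delta)
print(f"Delta Ct = {ct_2020 - ct_delta:.2f} cycles, fold change ~ {fold:.0f}")
```

A lower assumed efficiency gives a smaller fold change, so the result should be read as an order-of-magnitude check rather than an exact conversion.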
Individuals undergo a latent period after infection, during which viral titers are too low to be detected. As viral proliferation continues within the host, the viral load will eventually reach detectable levels and the individual will become infectious. Knowing when an infected person can transmit is essential for designing intervention strategies that break chains of transmission. However, infectiousness is difficult to measure from clinical investigations since >50% of transmission occurs during the pre-symptomatic phase 10. Our investigation of quarantined subjects suggests that, for the Delta variant, the time window from exposure to the detection of virus was ∼3.7 days, and infections presented a higher transmission risk when the virus was first detected compared to earlier circulating viral lineages. Consequently, on June 6 the provincial government required people leaving Guangzhou city via airports, train stations and shuttle bus stations to show proof of a negative COVID-19 test taken within the previous 72 hours, and this window was shortened to 48 hours on June 7. In contrast, the comparable time window implemented in the 2020 epidemic was seven days.
Transmission bottleneck and the association between minor iSNV transmission and viral population diversity
The non-pharmaceutical interventions in Guangdong mainly focused on epidemiological investigation, contact tracing and mass testing. Approximately 30 million PCR tests were performed between May 26, 2021 and June 8, 2021. The intense testing and screening of high-risk populations made cryptic transmission unlikely. Nearly all the infections we identified could be connected epidemiologically, either through evidence of direct contact, or indirectly (staying in or visiting the same area) (Figure 1b). In addition, all sequences could be genetically traced back to the index case. This provided a unique opportunity for us to characterize virus transmission dynamics at a finer scale, particularly the extent to which virus genetic diversity is transmitted among hosts. Whole-genome deep sequencing was performed on all identified infections, and 126 high-quality viral genomes (coverage >95%) were obtained, comprising 75% of identified infections in the outbreak (Figure 1a).
Phylogenetic analysis was performed by combining the virus genomes we obtained from the Delta outbreak with genomes from 346 imported cases; the latter represent travelers to Guangdong between March 2020 and June 2021 who arrived from 66 different source countries. We also included a set of reference sequences, comprising 50 genomes randomly selected from each of 13 defined NextStrain clades (https://nextstrain.org/) and the notified VOCs (Alpha, Beta, Gamma, Delta). The viral lineage distribution of the imported cases was approximately representative of the SARS-CoV-2 genetic lineages that were circulating globally at that time. These importations pose a challenge for disease control and prevention in Guangdong, China (Figure 2a).
Viral phylogenies of the Guangzhou outbreak were inferred using the assembled consensus sequence of each sample, which was generated by choosing the majority-frequency nucleotide (>50%) at each position. All Guangzhou outbreak sequences segregated into a single cluster (Figure 2a). Compared with the index case (5137) of the outbreak, 30 substitutions were identified among 125 cases during the 26-day outbreak (Figure 2b). The most genetically divergent outbreak sequence contained four nucleotide differences from the index case sample. To understand how these variants emerged, grew and finally became fixed during the epidemic (and during the SARS-CoV-2 pandemic more generally), we estimated within-host virus diversity for each sample by mapping polymorphic sites against the consensus genome of the index case (XG5137_GZ_2021/5/21), thereby generating a list of intra-host single-nucleotide variants (iSNVs). Minor iSNVs were called by setting 3% as the threshold for minor allele frequency, in order to exclude potential PCR and sequencing errors 11–13. Across the 126 high-quality sequences, samples harbored a median of 3 iSNVs, which is consistent with other reported levels (Supplemental Figure) 11,12.
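The two calling rules described above (a consensus base requires a frequency above 50%, and a minor variant requires a frequency of at least 3%) can be illustrated with a small sketch. It operates on hypothetical per-site base counts and is not the iVar-based pipeline used in the study. The function name, the input format, and the choice to emit N when no base exceeds 50% are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def call_site(counts: Dict[str, int],
              minor_threshold: float = 0.03) -> Tuple[str, List[Tuple[str, float]]]:
    """Return (consensus base, minor variants) for one genome position.

    counts maps each observed base to its read count at this site.
    The consensus is the base with frequency > 50%; if no base reaches
    that level, 'N' is returned (an assumption of this sketch).
    Minor variants are non-majority bases with frequency >= minor_threshold.
    """
    depth = sum(counts.values())
    if depth == 0:
        return "N", []
    freqs = {base: n / depth for base, n in counts.items()}
    top_base, top_freq = max(freqs.items(), key=lambda item: item[1])
    consensus = top_base if top_freq > 0.5 else "N"
    minors = sorted(
        (base, freq) for base, freq in freqs.items()
        if base != top_base and freq >= minor_threshold
    )
    return consensus, minors


# Hypothetical read counts at three sites.
print(call_site({"C": 530, "T": 470}))  # minor T allele at 47%
print(call_site({"T": 960, "C": 40}))   # minor C allele at 4%
print(call_site({"A": 990, "G": 10}))   # G at 1% falls below the 3% threshold
```

In the study, iSNVs were called relative to the consensus genome of the index case rather than each sample's own majority base, so a real implementation would compare against that reference.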
We calculated the transmission bottleneck size among epidemiologically confirmed transmission pairs. Contact tracing and epidemiological investigation enabled us to assign 111 donor-recipient transmission pairs with a high degree of confidence. Of these, the donor had one or more iSNVs above the variant calling threshold of 3% in 74 transmission pairs (Table S1), enabling estimation of the transmission bottleneck size, Nb, using the beta-binomial method 14. The maximum likelihood estimate for Nb was one for 65 of these 74 transmission pairs, and two or three for the remaining 9 transmission pairs (Figure 2c). Uncertainty in the Nb estimate was large for some transmission pairs, with the 95% confidence interval ranging from 1 to ∼500 or more, suggesting that for some pairs the sequencing data were not sufficiently informative. Our data suggest that the transmission bottleneck of SARS-CoV-2 is very narrow in general, consistent with previous household transmission studies 11,15. The transmission bottleneck size influences the extent to which within-host diversity contributes to viral diversity at the population scale. The stringent transmission bottleneck of SARS-CoV-2 suggests that the substitutions we observed in the Guangdong outbreak (and in the SARS-CoV-2 pandemic more generally) largely resulted from de novo mutations appearing within individuals.
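To make the estimation step concrete, the following is a simplified sketch of a maximum likelihood bottleneck estimate in the spirit of the approximate beta-binomial model. It is not the published implementation cited above. In this sketch, the number of founding virions carrying a variant is binomial given the donor frequency, and the recipient frequency is beta-distributed given that number. The function names, the handling of variants that are absent or fixed in the recipient, the search range `max_nb`, and the omission of read-depth sampling noise are all assumptions made for illustration.

```python
import math
from typing import List, Tuple


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) at x for non-negative integers a, b (a + b >= 1).

    a == 0 is treated as a point mass at 0 and b == 0 as a point mass at 1.
    Otherwise uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    if a == 0:
        return 1.0
    if b == 0:
        return 1.0 if x >= 1.0 else 0.0
    n = a + b - 1
    return sum(math.comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))


def beta_pdf_int(x: float, a: int, b: int) -> float:
    """Density of Beta(a, b) at 0 < x < 1; zero for the degenerate cases."""
    if a == 0 or b == 0:
        return 0.0
    norm = (a + b - 1) * math.comb(a + b - 2, a - 1)
    return norm * x**(a - 1) * (1 - x)**(b - 1)


def pair_log_likelihood(nb: int, variants: List[Tuple[float, float]],
                        threshold: float = 0.03) -> float:
    """Log-likelihood of bottleneck size nb for one donor-recipient pair.

    variants holds (donor_frequency, recipient_frequency) per donor iSNV.
    """
    total = 0.0
    for donor_freq, recipient_freq in variants:
        lik = 0.0
        for k in range(nb + 1):  # k = founding virions carrying the variant
            p_k = math.comb(nb, k) * donor_freq**k * (1 - donor_freq)**(nb - k)
            if recipient_freq < threshold:        # not called in the recipient
                lik += p_k * beta_cdf_int(threshold, k, nb - k)
            elif recipient_freq > 1 - threshold:  # (nearly) fixed in the recipient
                lik += p_k * (1 - beta_cdf_int(1 - threshold, k, nb - k))
            else:                                 # polymorphic in the recipient
                lik += p_k * beta_pdf_int(recipient_freq, k, nb - k)
        if lik <= 0.0:
            return float("-inf")
        total += math.log(lik)
    return total


def estimate_bottleneck(variants: List[Tuple[float, float]],
                        max_nb: int = 50, threshold: float = 0.03) -> int:
    """Smallest bottleneck size in 1..max_nb that maximizes the likelihood."""
    return max(range(1, max_nb + 1),
               key=lambda nb: pair_log_likelihood(nb, variants, threshold))


# Hypothetical pair in which every donor iSNV is lost in the recipient.
all_lost = [(0.05, 0.0), (0.10, 0.0), (0.30, 0.0)]
print(estimate_bottleneck(all_lost))
```

With hypothetical inputs in which all donor iSNVs are lost, the likelihood is highest at a bottleneck of one, which mirrors the pattern reported for most pairs. A variant that remains polymorphic in the recipient cannot be explained by a single founding virion under this model, so such a pair yields an estimate greater than one.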
Although the transmission bottleneck of SARS-CoV-2 is narrow in general, it may not be constant and could be affected by both viral and host factors. To investigate the contribution of minor iSNV transmission to population-level diversity, we identified the sequences with minor iSNVs and the sequences in which the derived nucleotide state was fixed. Notably, sequences exhibited minor iSNVs at 10 of the 30 variant sites (positions that varied from the sequence of the first index case) (Figure 2b). Direct (transmission pairs 61, 1, 2, 3) and indirect (from case 6190 to case 6486) epidemiological links were observed between the hosts with the minor iSNVs and their potential recipients in whom these iSNVs were fixed (Figure 2c). Therefore, at least three fixed substitutions in this outbreak could be traced to the direct transmission of minor iSNVs, and one further substitution arose along a suspected transmission chain. It is also noteworthy that the transmission pairs with case 5137 as the donor had a relatively higher estimated Nb, suggesting heterogeneity in iSNV transmission (Figure 2c). The differences in bottleneck size are possibly due to different transmission routes or exposure doses, as has been observed for influenza 16. Case 5137 presented a high viral load (Ct value of 17.6, approximately 2×10⁹ copies/mL in oropharyngeal swabs) 2 days after direct contact with cases 5645 and 5571. The high viral load, direct contact and relatively high frequency of the iSNVs (4% for T21673C and 47% for C27086T) may have enabled the successful transmission of iSNVs to the recipients (Figure 2c). Taken together, our observations suggest that the transmission bottleneck of SARS-CoV-2 is stringent in general, with most donor iSNVs not found in the recipients. However, transmission of minor iSNVs, with their fixation in the recipient host, resulted in at least some of the substitutions that accumulated during the outbreak.
In this study, we characterized a large transmission chain that originated from the first local infection of the SARS-CoV-2 Delta variant in mainland China. We find evidence for a potentially higher viral replication rate of the Delta variant, as viral loads in Delta infections are ∼1000 times higher than those for clade 19A/19B infections on the day of the first PCR+ test. This suggests that the infectiousness of the Delta variant during the early stage of infection is likely to be higher. Consequently, the frequency of population screening should be optimized 17. If Delta infections are indeed more infectious during the pre-symptomatic phase, then timely quarantine (before clinical onset or PCR screening) for suspected cases or for close contacts becomes more important. Although the transmission bottleneck of SARS-CoV-2 is narrow in general, heterogeneity of minor iSNV transmission is observed and explains some of the fixed substitutions observed in the virus population during the outbreak. In some settings, advantageous iSNVs that are present at a low frequency could rise and become fixed within one generation of transmission, and could go on to predominate in the virus population if the epidemic is not well contained.
Methods
Ethics
This study was approved by the institutional ethics committee of the Guangdong Provincial Center for Disease Control and Prevention (GDCDC). Written consent was obtained from patients or their guardian(s) when samples were collected. Patients were informed about the surveillance before providing written consent, and data directly related to disease control were collected and anonymized for analysis.
Sample collection, clinical surveillance and epidemiological data
After the first local SARS-CoV-2 infection was reported on May 21 in the capital city of Guangdong, enhanced surveillance was performed by Guangdong CDC and local CDCs to detect suspected infections. Epidemiological investigations were carried out on all confirmed cases. Population screening was performed by third-party detection institutions. Once virus-positive samples were confirmed by local CDCs or other institutions, the samples were required to be sent to Guangdong CDC within 24 hours. To make the results comparable, real-time reverse transcription PCR (RT-PCR) was performed at Guangdong CDC using the same commercial kit (DaAn Gene) and RT-PCR machine (CFX96) as in the previous studies 5,18. The exposure histories of positive cases and their close contacts were obtained through interviews, public video monitoring systems, cell phone apps, etc. Information regarding the demographic and geographic distribution of SARS-CoV-2 cases can be found on the website of the Health Commission of Guangdong Province (http://wsjkw.gd.gov.cn/xxgzbdfk/yqtb/). The surge in population screening helped ensure that all possible infections were identified, and 111 donor-recipient transmission pairs were assigned with very high confidence. All transmission pairs met the following rules: 1. The recipient was a close contact of the donor and had a clear and direct epidemiological link to the donor; 2. The recipient did not have any contact with other identified cases.
Virus amplification and sequencing
Total RNA was extracted from oropharyngeal swab samples using the QIAamp Viral RNA Mini Kit (Qiagen, Cat. No. 52904). Virus genomes were generated by two different approaches: (i) using a commercial BGI sequencing kit (ATOPlex 1000021625) and sequencing on the BGI MGISEQ-2000 (n=25), and (ii) using version 3 of the ARTIC COVID-19 multiplex PCR primers (https://artic.network/ncov-2019) for genome amplification, followed by library construction with the Illumina Nextera XT DNA Library Preparation Kit and sequencing with PE150 (n=63) or SE100 (n=38) on the Illumina MiniSeq. We report only high-quality genome sequences for which we were able to generate >95% genome coverage.
Sequence analysis
The bioinformatics pipeline for the BGI platform (https://github.com/MGI-tech-bioinformatics/SARS-CoV-2_Multi-PCR_v1.0) was used to generate consensus sequences and call single nucleotide variants relative to the reference sequence. For sequence data from the MiniSeq, the raw data were first quality controlled (QC) using fastp 19 to trim artificial sequences (adapters) and to cut low-quality bases (quality scores <20). PCR primers were trimmed using cutadapt version 3.1 (ref. 20) or another published method 21. Since all infections could be traced back to the first index case, the cleaned reads of each sample were mapped against the genome of the first index case (XG5137_GZ_2021/5/21) using BWA 0.7.17 (ref. 22). The consensus sequences were determined with iVar 1.2.1 (ref. 23), taking the most common base as the consensus (allele frequency >50%). An N was placed at positions along the reference where the sequencing depth was ≤10. The surge in population screening helped ensure that all possible infections were identified, and through contact tracing the donor-recipient transmission pairs could be assigned with high confidence. To characterize viral transmission in these pairs, we identified iSNVs relative to the reference genome (XG5137_GZ_2021/5/21) for each sequence with iVar 1.2.1 using the following parameters: alternate allele frequency at an SNV site ≥ 3%; total sequencing depth at the SNV site ≥ 100; sequencing depth for the variant allele ≥ 10; iVar PASS=TRUE. We excluded the head and tail sequences of the viral genome (corresponding to positions 1 to 100 and 29803 to 29903 in the Wuhan-Hu-1 reference genome) because of the lower sequencing coverage for most samples in the analysis, as well as the 7 “highly shared” iSNV sites (1959, 4091, 21987, 24404, 28448, 28389, 29681), which possibly result from contamination by primer sequences or mapping errors 11. To infer iSNV transmission in the 74 donor-recipient pairs, all sites with ≥3% minor allele frequency in the assumed donor were used in the analysis.
In the recipient, all reads at these sites were considered, with a variant calling threshold of 3%, using the beta-binomial method of Sobel Leonard et al. 14. The nextstrain pipeline 24 was used to analyze and visualize the genetic distribution of SARS-CoV-2 infections and its dynamic change in Guangdong between January 2020 and June 2021. A maximum likelihood (ML) tree was estimated with PhyML 25 using the HKY+Γ4 substitution model with gamma-distributed rate variation 26. The branch length was recalculated as the number of mutations relative to the reference sequence of the first index case. The tree was visualized with the R package ggtree 27.
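The iSNV filtering criteria listed in this section can be summarized in a short sketch. It is illustrative only and is not the authors' pipeline. The `VariantCall` record and its field names are hypothetical, and positions are assumed to have already been converted to Wuhan-Hu-1 coordinates.

```python
from dataclasses import dataclass

# The 7 "highly shared" iSNV sites excluded in the Methods.
EXCLUDED_SITES = {1959, 4091, 21987, 24404, 28448, 28389, 29681}


@dataclass
class VariantCall:
    """Hypothetical record for one variant call; field names are illustrative."""
    position: int      # assumed to be a Wuhan-Hu-1 coordinate
    alt_freq: float    # alternate allele frequency (0 to 1)
    total_depth: int   # total sequencing depth at the site
    alt_depth: int     # reads supporting the variant allele
    passed: bool       # iVar PASS flag


def keep_isnv(call: VariantCall) -> bool:
    """Apply the thresholds stated in the Methods to a single call."""
    in_genome_ends = call.position <= 100 or call.position >= 29803
    return (
        call.passed
        and not in_genome_ends
        and call.position not in EXCLUDED_SITES
        and call.alt_freq >= 0.03
        and call.total_depth >= 100
        and call.alt_depth >= 10
    )


# Hypothetical calls: only the first satisfies every criterion.
calls = [
    VariantCall(21673, 0.04, 1000, 40, True),
    VariantCall(50, 0.20, 500, 100, True),    # inside the excluded 5' end
    VariantCall(28389, 0.10, 800, 80, True),  # "highly shared" site
    VariantCall(27086, 0.47, 15, 7, True),    # insufficient depth
]
print([c.position for c in calls if keep_isnv(c)])
```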
Data Availability
All sequencing reads, after primer trimming and mapping to the reference sequence (the sequence of the first index case, XG5137_GZ_2021/5/21), have been submitted to the National Genomics Data Center (https://bigd.big.ac.cn/) with submission number CRA004571. The generated consensus sequences were submitted with accession numbers GWHBDIM01000000-GWHBDNH01000000.
Code availability
The pipeline for sequencing data analysis has been deposited at https://github.com/Jinglu1982/Delta-variant-outbreak-in-GZ. Code to implement the beta-binomial method is publicly available 14.
Competing interests
The views expressed in this article are those of the authors and not necessarily those of the Guangdong Provincial Center for Disease Control and Prevention, or the Guangdong Provincial Institute of Public Health.
Acknowledgements
We gratefully acknowledge the efforts of China national CDCs, Guangdong local CDCs, hospitals, and the third-party detection institutions in epidemiological investigations, sample collection, and detection. This work was supported by grants from Science and Technology Planning Project of Guangdong (2018B020207006), the Key Research and Development Program of Guangdong Province (2019B111103001), and Guangdong Workstation for Emerging infectious Disease Control and Prevention, Chinese Academy of Medical Sciences (2020-PT330-004).
Footnotes
# Joint first authors.