ABSTRACT
Using high-throughput sequencing of the SARS-CoV-2 genome, we have analyzed 2405 PCR-positive swab samples from 2339 individuals and identified the Omicron BA.2.3.7 variant as a major lineage of the recent community outbreak in Taiwan. Since April 2022, a series of new waves of Omicron cases have surfaced in Taiwan and spread throughout the island. We conducted genomic surveillance with a high success rate and have submitted 1966 full-length Omicron sequences to GISAID. This has permitted identification of signature amino acid changes (ORF1a:L631F, S:K97E, N:M322I) in 1584 (80.6%) of the translated SARS-CoV-2 sequences. The newly established BA.2.3.7 lineage, which is relatively common in Asian countries, is now persistently present in Taiwan. By June 2022 this dominant strain had established a substantial existence in the sequence pool, resulting in additional mutations. The rapid spread and expansion of the Omicron BA.2.3.7 lineage in Asia has had an important socioeconomic impact on health policy.
INTRODUCTION
The coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), originated in the People’s Republic of China (PRC) in late 2019, probably in the city of Wuhan(1–3). The outbreak of this unusual respiratory disease led to a wide variety of different responses by various countries across the world(4–7). The response of Taiwan was rapid and based on both its proximity to the PRC and its experiences during the SARS pandemic almost two decades earlier(6, 8, 9). The introduction of almost complete travel restrictions on incoming air and sea passengers, long compulsory quarantine periods for the few residents who were allowed to enter Taiwan and a very wide acceptance by the population of social distancing, temperature checks, mask wearing, etc., in all social situations, meant that, unlike many other countries, the entry of the COVID-19 pandemic was delayed significantly(6, 10, 11). Until April 2022, there were only limited breakthroughs, all of which were quickly contained; these were associated with cases detected among the very limited number of travelers entering the country who were hospitalized on being identified as being infected and airline crews. Hence, Taiwan provides a unique opportunity to explore what happened when the Omicron variant evaded the controls put in place by the Taiwan government and began to spread though the country’s population. The Taiwanese population had not been exposed on a large scale to any of the virus variants before Omicron and by the time the virus began to spread widely in March/April 2022, a significant percentage of the population had been vaccinated. As a result, an in-depth exploration of the virus using high-throughput sequencing as it spread provides new insights into how a novel pandemic virus enters and spreads in a mostly naïve population. In the current study, we have collected a large numbers of Omicron sequences from domestic cases and applied phylogenetic analysis to gain insights about the introduction of the new Omicron variants into Taiwan and the evolution of genomic sequences during rapid expansion of the viral pool over a time period of 15 weeks (from February 22nd to June 6th).
METHODS
We applied Illumina COVIDSeq amplicon sequencing technology to the whole- genome analysis of SARS-CoV-2(12). RNA was extracted from virus-inactivated swab samples. Complementary DNA synthesis was carried out using the reagents provided by the manufacturer, by which the SARS-CoV-2 genome underwent reverse transcription and was then amplified in 98 overlapping amplicons, together with appropriate human controls. In the finally optimized procedure, we processed 384 samples at a time, and analyzed the pooled libraries on a single lane of the S4 chip using the NovaSeq6000 instrument.
Patients and RNA extraction
Cases were recruited from National Taiwan University Hospital Hsinchu Branch (NTU-HCH). Samples for RNA extraction were collected in 3 mL sterile viral transport medium (VTM) tubes and consisted of 2405 nasopharyngeal swabs belonging to 2339 patients. RNA was prepared by automated extraction using the TANBead Nucleic Acid Extraction Kit REF M665A46 (Taiwan Advanced Nanotech Inc.) and the QIAsymphony SP protocol (QIAGEN). This study was reviewed and approved by the Research Ethics Committee (110-110-E) of NTU-HCH.
COVIDseq
We carried out the sequencing using Illumina COVIDSeq Test kits (RUO version) according to the manufacturer’s instructions. The workflow consists the following steps: cDNA synthesis, then virus target amplification using V3 nCov-2019 primers, followed by library preparation and library pooling. Subsequently, 98 targets on SARS-CoV-2 and 11 human fragments, which acted as controls, were analyzed on a NovaSeq 6000 instrument by using 2x151-bp paired-end reads. Next, we used Illumina DRAGEN COVID Lineage app version 3.5.9 (base on pangolin 4.1.2 pangolin-data 1.12 and NextClade 1.11.0) in the BaseSpace Sequence Hub for rapid analysis.
Phylogenetic analysis
A set of 1966 NTU-HCH sequences were deposited to the Global Initiative on Sharing All Influenza Data (GISAID)(13) with the following epi accession numbers: EPI_ISL_14192849 to EPI_ISL_14192840, EPI_ISL_14191496 to EPI_ISL_14191488, EPI_ISL_14191320 to EPI_ISL_14190364, and EPI_ISL_14190353 to EPI_ISL_14189364. To investigate the global transmission, 249 BA.2.3.7, 282 global, and 403 additional sequences from Taiwanese were retrieved from GISAID as of July 2022. A total of 2847 sequences including 1966 from NTU-HCH and 881 reference sequences (Supplementary Data: S Table 1) that had met the quality standard were used for the analysis. The phylogenetic tree was constructed using Nextstrain CLI (command-line interface) (version 3.2.5)(14) and visualized with Auspice (https://auspice.us/).
RESULTS
As shown in Figure 1, very few COVID-19 cases occurred in Taiwan during 2020 and 2021. Clustered infections were reported in May and June 2021, mainly in northern Taiwan. Even at their peak, only hundreds of positive cases were recorded by Taiwan’s Centers for Disease Control (CDC) (Insert of Figure 1). At the beginning of 2022, a number of Omicron infection clusters were noted first in northern Taiwan, and very soon new cases began to exceed fifty thousand per day with the outbreaks affected the entire country (https://nidss.cdc.gov.tw/nndss/disease?id=19CoV). To gain insights about the community transmission of the infection and to monitor viral evolution, we established a genomic surveillance program at National Taiwan University Hospital Hsinchu Branch (NTU-HCH) and conducted whole-genome sequencing using left-over nasal swab samples that had been detected by PCR to be positive.
This figure was constructed using the publicly available data of Taiwan CDC (https://nidss.cdc.gov.tw/nndss/disease?id=19CoV) to show the relative size of community outbreaks of COVID-19 in 2021 and 2022. Genomic surveillance was conducted in five batches to sequence the SARS-CoV-2 viral samples, and the height reflects the number of cases analyzed for each batch. The 2021 outbreak of Alpha lineage was relatively small and limited to cluster infection within New Taipei City (shown in the insert), while the outbreak of the Omicron lineage in April-June of 2022 exceeded 50,000 per day at peak level.
Pilot phase of genomic surveillance for COVID-19
In order to conduct genomic surveillance of SARS-CoV-2, we have adopted the COVIDSeq procedure (Illumina). Initially we used the ultrashort read (2 x 51 bp) approach in order to obtain real-time information. Later we modified the procedure and applied the libraries created from 384 samples onto a single lane of an Illumina S4 chip, which typically was able to generate over 700 Gb data. In principle, this sequencing depth should give almost 60,000-fold coverage of the virus’ 29,932 bp genome. During the pilot phase when we were establishing the genomic surveillance program, we analyzed 293 samples from 228 patients in three batches (Figure 1, batches 1-3), and most of these cases had been collected during 2021 from three hospitals in northern Taiwan (Supplementary Data, S Table 2). From this limited number of cases that were associated with various small community outbreaks in Taiwan (Insert of Figure 1), we found that these sequences mainly belonged to the Alpha lineage (B.1.1.7), and they included sequences from a cluster infection affecting a factory near our hospital.
Serial Investigation of SARS-CoV-2 in a medical center
Beginning in 2022, we gradually expand the application of the established genomic surveillance protocol to conduct a series of investigation on samples collected at NTU-HCH. From end of February to beginning of June, there was a huge surge in cases due to infection with Omicron variants of SARS-CoV-2 (Figure 1). These samples became the major source providing the genome sequences for the current study. There were in total 2112 PCR-positive swab lysates analyzed and these consisted of one set of 192 samples (batch 4) and five sets of 384 samples (batch 5) (Figure 1 and Supplementary Data, S Table 2). Except for one case, all subjects provided one PCR+ sample for the viral genome analysis. Clinical information on the 1920 subjects recruited for batch 5 at NTU-HCH is readily available, and the general characteristics of these COVID-19 infected individuals are shown in Table 1. Almost an equal number of males and females were recruited, and, among them, 103 and 430 individuals had an age below 2 and an age between 2 and 20, respectively. Six hundred and eight (31.7%) of the patients had been admitted to NTU-HCH for treatment, while 83 (4.3%) of them were symptom free. As expected, the young age group (below 2 years) and the old age group (over 60 years) were found to form larger proportion of the hospitalized patients.
Characteristics of the COVID-19 cases recruited for the current study
Success rate when calling a variant
The established protocol was highly efficient at generating a large amount of genomic information that allowed the identification of the viral lineage of each individual sample; furthermore, it also allowed reliable genome assembly for subsequent phylogenetic analysis. For practical purposes, we defined a successful call if the DRAGEN COVID Lineage software (v3.5.9) returned identifiable lineage information for a sample. Combining the pilot phase (batches 1-2 of 2021) and the serial investigation phase (batches 3-5 of 2022), in total we have performed 2405 genomic analyses and determined SARS-CoV-2 lineage for 2368 individuals (success rate, 98.5 %). Of the samples that failed to generate reliable sequence, the majority had a Ct value of 29 or greater. Only four samples with a Ct value of 28 or less failed to generate a reliable lineage information. Thus, the overall success rate of calling a lineage was largely determined by the Ct value, and perhaps, to some extent, the storage time and condition of the sample. For the last five batches of 384 samples each, the majority were processed within 4 weeks and had Ct values of 28 or less; these had a success rate of 99.7% (1914/1920).
GISAID submission
Specifically, we have only submitted genomic data to GISAID on those sequences that had a greater than 98% coverage of the 29903 bp SARS-CoV-2 target genome. The same set of high-quality sequences were used for tracking the signature mutations in the viral samples (Figure 2) and for phylogenetic analysis (Figure 3). Overall, 2424 samples of all five batches generated 2024 sequences (84.2% pass rate) that met the above criterion. For the last five batches of 384 samples, 1720 of the 1920 sequences was selected (89.6 % pass rate) for GISAID submission (Supplementary Data, S Table 3). These samples were obtained between April 26th and June 6th, during the peak of the present Omicron outbreak.
2A. Mutations in the coding sequences of the Omicron lineages. The mutations identified in this study are highlighted in red. RBD: the receptor binding domain of the spike (S) protein.
2B. Proportion of the different signature mutations in the period of February 22nd to June 6th, 2022. Note that there was a sharp increase of the Omicron variant 2.3.7 from week 4 onwards, and the proportion of this Omicron variant remained high for at least ten weeks. The three signature mutations (S:K97E, N:M322I, ORF1a:L631F) of BA.2.3.7, denoted by filled diamonds, filled triangles, and open circles, are completely overlapped. A steady increase of sequences positive for S:G1251V was noted from week 6 and reached plateau at week 10 (April 26th-May 2nd). In the period of week 11 (May 3rd-May 9th), week 12 (May 10th-May16th), week 13 (May 17th-May 23rd), week 14 (May 24th-May 30th), and week 15 (May 31st- June 6th), the percentage of ORF1a:T3224A decreased; the percentage of ORF1a:I1091T increased over the same time period.
3A. We used 1966 sequences from NTU-HCH and 881 sequences from GISAID as a framework for constructing the phylogenetic tree.
3B. We analyzed 1584 BA.2.3.7 sequences submitted to GISAID from the current study together with 228 BA.2.3.7 sequences deposited to GISAID by other groups. The position of signature mutations are indicated; three (S:K97E, N:M322I, ORF 1a:L631F) are located at the origin of the B.A.2.3.7 lineages. S:G1251V is mapped at a major branch in the upper trunk of the tree, under which three minor branches can be defined by mutations in ORF1a:T3224A in blue, ORF1a:A591V and ORF1a:V3683I in purple, and ORF1a:I1091T in orange. The origins of the BA.2.3.7 strains and their position in the tree are indicated. An arrow in red denotes the index case collected from Taiwan on March 27th, 2022.
Signature mutations of the BA2.3.7 lineage
We analyzed the assembled viral genome sequences and searched for significant amino acid changes in the Omicron samples collected in 2022 (Supplementary Data, S Table 4). Comparing the later three data sets (batch 3 to batch 5), we discovered that three amino acid changes (ORF1a:L631F, S:K97E, N:M322I) occurred only after the fourth sequencing batch. All the Batch 3 isolates belonged to the BA.1 or BA.2 classification. This suggests that the rapid increase of cases in Taiwan in April and May, from previously zero cases per day, to almost reaching a hundred thousand per day, came from a strain that was evolving in Taiwan and showed either advantageous viral replication or advantageous transmissibility. The percentage of sequences containing the signatures progressed steadily from 62 % in batch 4 to 85% in batch 5.
Of the three signature mutations shown in Figure 2A, the L631F mutation in ORF1a and K97E mutation in the S protein are located within nsp2 and the N- terminal domain of S1, respectively. The third signature mutant of variant BA.2.3.7 is a methionine to isoleucine substitution at amino acid 322 (M322I) of the N protein.
Both methionine and isoleucine are nonpolar amino acids. Previous reports have suggested that M322I could bring about a slight decrease in N protein thermal stability(15) and might affect the host immune response to COVID-19 viral infection(16).
Evolution of the Omicron BA.2.3.7 lineage
Serial investigation of the collected samples by whole-genome sequencing allowed us to monitor the continuous evolution of the Omicron BA.2.3.7 lineage, as it became the dominant strain of Taiwan (Figure 2B). In the 15-week period of February 22nd to June 6th, there has been a sharp increase of the Omicron variant BA.2.3.7 at week 5 (March 23th to March 29th) and the proportion of this variant remained at over 80% level for ten weeks (Figure 2B). Of note, a steady increase of sequences positive for S:G1251V started at week 6 and reached plateau at week 10 (April 26th-May 2nd). By contrast, over the time period week 10 to week 15, the percentage of ORF1a:T3224A decreased, while that of ORF1a:A591V and ORF1a:V3683I increased, suggesting that these sublineages might have different transmission efficiencies. Moreover, a new mutation, ORF1a: I1091T, was detected during week 13 and the incidence had increased substantially by weeks 14-15.
Phylogenetics of the SARS-CoV-2 isolates
To construct the framework of the phylogenetic tree, first we took 1966 genome sequences from this study and analyzed them in the global context of 872 GISAID reference sequences (Figure 3A). On this basis, we then zoomed in and compared the 1584 Omicron sequences of the current study against the 228 Omicron BA.2.3.7 strains from the GISAID database (Figure 3B). These sequences had been reported from 21 countries, including 51 from Taiwan (Supplementary Data, S Table 1 and 5).
We interpreted the phylogenetic analysis in the context of Pango-dynamic nomenclature system(16). On May 16th, an issue was raised regarding a BA.2.3 sublineage with S:K97E (https://github.com/cov-lineages/pango-designation/issues/643). It was suggested that “this new sublineage” might have a very high prevalence in Taiwan, and it might be descended from a mainly Vietnamese branch with ORF1a:L631F and N:M322I. Later, BA.2.3.7 was designated, and as of July 22nd, there have been 222 sequences in the BA.2.3.7 lineage detected since it was identified on February 27th (Lineage Report, https://outbreak.info/situation-reports?pango=BA.2.3.7).
It is evident that this novel lineage BA.2.3.7 with three amino acid changes (ORF1a:L631F, ORF1a:S:K97E, and N:M322I) was circulating dominantly in Taiwan over the study period (Figure 3B, BA.2.3.7 branch, marked in red). Notably, the first BA.2.3.7 strain identified in the epidemic in Taiwan was collected on March 27th, 2022, and, since that point, we have already detected several genomic changes of this new Omicron lineage. For example, a new mutation, G1251V (Figure 3B, marked in green) in Spike protein, was detected from April onwards, and this particular circulating lineage has rapidly spread across Taiwan.
Persistent dominance of Omicron BA.2.3.7 lineage
While this paper was in preparation, we became aware that a number of viral sequences with the same signature mutations had been reported from Taiwan (https://outbreak.info/situation-reports?pango=BA.2.3.7&loc=USA&loc=USA_US-CA&loc=TWN&selected=TWN&overlay=false). Even though the number was relatively small (51 cases) as compared to our serial investigation at a medical center, the four locations in Taiwan of the reported cases are different from our collection point at the hospital, indicating that this new lineage was detectable broadly across Taiwan. Based on this distribution, it is very likely that the BA.2.3.7 lineage is now the dominant variant throughout Taiwan.
Notably, over the same time period, other Asia Pacific countries (Hong Kong/PRC, Indonesia, Singapore, Vietnam, Japan, Thailand, Philippines, New Zealand, and Malaysia) have also reported significant cumulative prevalence of the BA.2.3.7 variant (Supplementary Data, S Table 5). Among the 44 Omicron BA.2.3.7 strains reported from Japan, two of the affected individuals had travel history to Vietnam and 41 to Taiwan, suggesting significant silent outward transmission events from Taiwan. By way of contrast, BA.2.3.7 forms less than 0.5% of the sequences reported in either California, USA, or globally.
DISCUSSION
In the Western world, where there have been less restrictions on travel and people have shown poorer adherence to wearing facial masks, COVID-19 outbreaks have occurred in several waves as SARS-CoV-2 evolved. These seem to have been, originally associated in many cases with persistent infections in immunocompromized patients, they then, have adapted to the human populations in the various countries.
From 2020 to 2022, there have been several changes in the dominant viral lineages, with new more transmissible strains replacing previous prevailing lineages.
The entry of the Omicron variant into Taiwan in early 2022 was unexpected as the track-and-trace system put in place from January 2020 onwards had been very successful with only a limited number of community outbreaks in 2021, each of which was traced to source and controlled. In early 2022, vaccination against SARS- CoV-2 in Taiwan had reached nearly 90 % (for at least one dose) of the 23 million population (covid-19.nchc.org.tw; Taiwan CDC data). This population is highly urbanized with about a quarter of the population living in the Greater Taipei Area. So as Omicron began to spread, it was within a naïve but vaccinated population.
Taiwanese have shown high levels of compliance with social distancing and mask wearing, so the spread of Omicron was in a unique environment, unlike Europe, the USA and Canada, South America, Africa or even New Zealand. The Omicron variants detected in Taiwan are likely to be those that have evolved to escape the specific conditions found in Asia; this means that these “Asian” strains have undergone selection such that they are highly transmissible, produce silent infections without symptoms and have the ability to escape the vaccines used in Taiwan. In conjunction with these phenotypes, another possibility is that the ethnicity of Taiwan’s population may have played a role. There is believed to have been a coronavirus outbreak in Asia more than 20,000 years ago (17) and the selection imposed by this may have changed the genetics of the Taiwan population.
The failure to control the 2022 Omicron outbreak may rest on the fact that there were multiple and almost simultaneous entry of different SARS-CoV-2 variants, unlike earlier breakthrough events, and that many of these infections were silent. The often-silent nature of Omicron would have allowed the development of community outbreaks beyond the available control measures. Notwithstanding the possibility of multiple entry events into Taiwan of the BA.2.3-related variants, there seems to have significant diversity of the viral strains circulating in Taiwan. Importantly, the large numbers of quality sequences we have collected contribute to the identification of some less frequent mutations (ORF1a: A591V, ORF1a: T3224A, ORF1a: V3683I) within the BA.2.3.7 lineage (Supplementary Data, S Table 4). It is likely that further evolution of the BA.2.3.7 lineage in Taiwan has occurred (Figure 3), as new groups of the SARS-CoV-2 sequences are becoming evident. Based on the phylogeny and the signature mutations, we have divided the BA.2.3.7 isolates into at least five genotypic groups, each with clearly defined sequence features (Figure 3).
The emergence of Omicron BA.2.3.7 in Asia is remarkable. While there is no reliable genomic data from the early cases arising from Malaysia and Vietnam, our phylogenic analysis and the metadata from GISAID indicate that travelling between Asian countries have contributed to the rapid spread of this unique Omicron lineage in this region. The majority of Omicron BA.2.3.7 cases reported from Japan in May 2022 had travel history to Taiwan, and the GISAID sequences clearly indicated that they were related to two branches of the phylogenic tree (Figure 3B), one with S:G1251V mutation and one without.
In conclusion, our genomic dataset is uniquely valuable for understanding how a major COVID-19 outbreak occurred in a naïve and vaccinated population in Taiwan with a very limited number of entry events. As Asian countries began to move from zero tolerance to gradual opening policy, continued surveillance of SARS-CoV-2 using NGS is important. Early detection of significant viral evolution in the endemic areas will minimize any disruption due to a new variant.
Data Availability
All data produced in the present study are available upon reasonable request to the authors
AUTHOR CONTRIBUTION
P.L.S. collected clinical specimens and performed clinical data analysis. H.C.T. performed COVIDseq and drafted the manuscript. Y.N.G. performed phylogenetic analyses and drafted the manuscript. H.Y.S. performed mutation signature analyses and manuscript preparation. R.K., Y.L.W., and W.J.C. interpreted data and prepared the manuscript. L.Y.H. and H.Y.Y. collected clinical samples and performed nucleic acid extraction. H.Y.K. and Y.C.H. managed clinical data and analyzed clinical data.
Y.F.L. and H.Y.W analyzed genomic data. C.C.C. collected samples and performed viral analysis. T.W.C. collected samples and analyzed clinical data. K.M.L. and C.G.H. established protocols and processed samples. S.R.S. performed the virology study and functional analysis. C.C.W., C.J.Y., and S.F.T. managed the project and prepared the manuscript.
About the Author
Dr. Shao is an assistant professor in Department of Laboratory Medicine and Pediatrics, National Taiwan University College of Medicine, Taiwan. Her main research interests include infectious disease, clinical virology, clinical microbiology, and vaccines.
Dr. Tu is postdoctoral fellow in Institute of Molecular and Genomic Medicine, National Health Research Institutes, Taiwan. Her research interests include cancer and genomic research and NGS technology.
Dr. Gong is an assistant professor in Research Center for Emerging Viral Infections and International Master Degree Program for Molecular Medicine in Emerging Viral Infections, Chang Gung University, Taiwan. His research interests include bioinformatics, phylogenetics, machine learning, and viral evolution.
Supplementary Data
We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID and GenBank, on which this research is based.
ACKNOWLEDGEMENTS
This work was supported by Ministry of Health and Welfare of Taiwan (MOHW110- TDU-C-222-000010) to S.F.T., by the Research Center for Emerging Viral Infections from The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) of Taiwan, the National Science and Technology Council (NSTC), Taiwan (NSTC 111-2321-B-182- 001, 111-2634-F-182-001, and 111-2221-E-182-053-MY3), and the National Institute Of Allergy And Infectious Diseases of the National Institutes of Health under Award Number 5U01AI151698-02 to Y.N.G. and S.R.S.. The authors wish to thank Y Henry Sun and Hsiao-Hui Tso for critical reading of the manuscript, to Yi-Chun Tsai, Shuo- Peng Chou, Jhen-Rong Huang, and Yuan-Yu You for sample collection and nucleic acid preparation, to Tai-Yun Lin, Yu-Chen Huang and Chiao-Chan Wang for library construction and sequencing operation.
Footnotes
In Taiwan, Omicron BA.2.3.7 Variant has emerged and persisted in dominance as a cause of community outbreaks