Abstract
Coronavirus Disease 2019 (COVID-19) is a global public health threat. Genomic surveillance of SARS-CoV-2 was implemented during March 2020 at a major diagnostic hub in Bangkok, Thailand. Several virus lineages supposedly originated in many countries were found, and a Thai-specific lineage, designated A/Thai-1, has expanded to be predominant in Thailand. A virus sample in the SARS-CoV-2 A/Thai-1 lineage contains a frame-shift deletion at ORF7a, encoding a putative host antagonizing factor of the virus.
Introduction
Coronavirus Disease 2019 (COVID-19) has reached the status of global pandemic. Genomic surveillance of its etiological virus, SARS-CoV-2, plays an important role in epidemiological investigations and transmission control strategies [1]. Genetic variation data of the virus could reveal transmission chains between infected individuals and could even map the connection between outbreak cohorts. Thailand has suffered from the spread of COVID-19 with the total number of confirmed cases over 3,000 and with more than 120,000 individuals screened as of May 2020. Since January 2020, when both imported and locally-transmitted COVID-19 cases were reported in Thailand, the country has implemented several measures to combat COVID-19 at a national scale [2-4].
Genomic surveillance could be a powerful tool in the implementation of the national COVID-19 control strategy in Thailand. ARTIC multiplex tiling PCR allows whole-genome sequencing with minuscule amount of material by generating genome-wide overlapping amplicons, which has led to its success during the Zika virus outbreak investigation in Brazil [5, 6]. Using leftover RNA samples from a standard RT-PCR diagnosis, the genomic information of SARS-CoV-2 can be decoded in less than a week. The data presented here provide an insight into the genetic repertoire, origins and viral lineages of SARS-CoV-2 in Thailand. The information is particularly important given the multiple introduction events into the country and the local expansion of the Thai-specific SARS-CoV-2 lineages.
Genomic surveillance of SARS-CoV-2 populations in Thailand
We sequenced 27 anonymized RT-qPCR positive samples from Ramathibodi Hospital in Bangkok during March 13-28, 2020 (Supplementary Table 1) [EC approval number: MURA2020/676]. The hospital acted as one of the major diagnostic hubs for COVID-19 in Bangkok during the study period. Enrichment and amplification steps were done according to the ARTIC Network protocol with ARTIC primer version 2 [7]. The libraries were prepared using KAPA HyperPrep and KAPA Library Amplification kits and subsequently sequenced using with a MiSeq Reagent Kit v2 according to the manufactures’ protocols. Variant calling was performed using the ncov2019-artic-nf pipeline (https://github.com/connor-lab/ncov2019-artic-nf). Consensus sequences were used to construct the maximum-likelihood and Bayesian phylogenetic trees with recommended representatives from various lineages worldwide utilizing IQ-TREE 2.0 and BEAST v1.10.4, respectively (Supplementary Table 2) [8-10]. Interestingly, Thailand appears to have had multiple introduction events of SARS-CoV-2 into the country, as evidenced by at least six separate clusters in the maximum-likelihood tree (Figure 1 and Supplementary Figure 1) [11]. Based on a Pangolin classification system (Database version 27 April 2020), they are grouped into A, B.1, B.1.5 and B.4 lineages (Figure 1 and Supplementary Figure 1) [9, 12]. Considering the origins and lineage branches, these SARS-CoV-2 lineages are likely to have recent ancestors outside Thailand. For example, six of the samples in the B.1 lineage in our collection are grouped tightly with virus samples commonly found in the United States of America and Europe and collected during the same period (March 2020) as visualized by Nextstrain Timetree (Supplementary Figure 2). The constructed Bayesian tree displayed the similar structure to that of the maximum-likelihood approach (Supplementary Figure 3).
It is worth noting the local expansion of a putative Thai specific lineage (Figure 2). This cluster of viruses has passed the criteria of a novel lineage as follows: (a) exhibits two shared nucleotide differences from the ancestral lineage [cut point ≥ 1], (b) contains 11 genomes with > 95% of the genome sequenced [cut point ≥ 5], (c) exhibits one shared nucleotide change (27,877G→U) among the ongoing transmission groups from March 2020 [cut point ≥ 1], and (d) has 99% bootstrap value for the lineage defining node [cut point > 70%] [9]. This lineage, designated A/Thai-1 (Figure 1 and 2), descended from the original A lineage (based on the maximum-likelihood based classification system), which was first reported in China before expanding into various countries in Asia, Europe, North America, South America and Australia [9]. This A/Thai-1 branch is separated from the rest of the original A lineage and subgroups. Upon visual inspection in Nextstrain, only one Malaysian sample (MKAK-CL-2020-5096) is the closest to A/Thai-1, but only with 63% bootstrap value and one shared lineage-specific nucleotide substitution (4,390G→U) (Supplementary Figure 4). Non-synonymous mutations unique to A/Thai-1 are 20,134G→U (ORF1b) and 24,047G→A (Spike protein) (the full mutation list is shown in Table 1). Among the changes, 20,134G→U mutation has been independently found in two samples in lineage B.1 from the Netherlands and USA. It remains to be determined with a larger sample size whether this is the result of convergent evolution or genetic recombination. This pattern of homoplasy was also hypothetically linked to putative RNA editing [13].
Thailand/Bangkok-0018, a sample in the A/Thai-1 lineage, contains a 4-nt frame-shift deletion at position 27,694-27,697, causing a premature truncation in ORF7a, which now contains five altered amino acid residues and loses the 16 original C-terminal residues (Figure 3). The deletion was confirmed by Sanger sequencing twice using two independent RT-PCR reactions (Supplementary Figure 5). The frame-shift mutation alters approximately one-sixth (21/121 residues) of the ORF7a protein. Based on protein homology to SARS-CoV, the missing region corresponds to a transmembrane helix and an ER retrieval motif, required for antagonizing a host antiviral factor [14, 15]. One sample from Arizona, USA also contains an 81-nt in-frame deletion in the ORF7a gene [16]. So far, only one sample in A/Thai-1 appears to have this frame-shift deletion. It is tempting to speculate on the relationship between ORF7a deletions and virus attenuation. However, further investigations by laboratory-based functional experiments are needed before reaching any conclusion on their biological and clinical significance.
When the analysis was extended to 22 additional genomes, independently deposited in GISAID by the Thai National Influenza Center and the Thai National Institute of Health, samples collected in January 2020 are grouped closely with the B lineage from China including the Wuhan-Hu-1 reference (Supplementary Table 3). The genetic repertoires from this additional collection also support the notion of multiple virus lineages introduced into Thailand. Nine of them also fall into A/Thai-1, making it the largest lineage in Thailand during the period of March 2020 (22/49 genomes).
Implication of the findings
Genomic surveillance is likely to be pivotal in the identification and the elimination of transmission cohorts and chains [17, 18]. The genetic composition presented here suggests the necessity for screening and monitoring international travelers during the period of COVID-19 pandemic. The local expansion of A/Thai-1 has created a new evolutionary branch unique to Thailand, which inevitably requires this lineage to be investigated for its compatibility to diagnosis and vaccine tools under development.
Data Availability
The data that support the findings of this study are openly available at https://www.gisaid.org/ as described in the manuscript.
Acknowledgment
The work here was supported by Ramathibodi Foundation, TCELS, NRCT and Mahidol University. The authors acknowledge NSTDA Supercomputer Center (ThaiSC) for providing computing resources for this work. We are grateful for the comment and suggestion from P. Wilairat and Y. Yuthavong. We appreciate the contribution from K. Joonlasak, A. Huang, A.R. Jones, S. Fernandez and C. Klungthong (Armed Forces Research Institute of Medical Sciences). The computational aspects of this research were supported by the Wellcome Trust Core Award Grant Number 203141/Z/16/Z and the NIHR Oxford BRC. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.