Abstract
The pandemic spread of novel coronavirus, (SARS-CoV-2) causing CoronaVirus Infectious Diseases (COVID-19) emerged into a global threat for human life causing serious death rates and economic crunch all over the globe. As on April 17, 2020 at 2:00am CEST, there include a total of 2,034,802 confirmed cases for Corona and 1,35,163 deaths worldwide have been reported which includes 212 countries, areas or territories reported by World Health Organization (WHO), in which USA tops 6,32,781 confirmed cases (28,221 deaths) followed by Italy 1,65,155 (21,647 deaths), Spain 1,77,633 (18,579 deaths) and China 84,149 (4,642 deaths). This study aims to compare the genomic nature of SARS-CoV-2 genome reported from Wuhan, China with two Indian isolate genome reported by ICMR-NIV, India. Further Phylogenetic studies performed with coronavirus infecting non-human species like Bats, Duck, and sparrow were compared with Indian and other country whole genome sequences of SARS-CoV2 using MegaX and traced out the association between the human coronavirus with the other species viral genome. In addition, epidemiological reports on COVID-19 among Worldwide and India centric data were compared between April 7, 2020 to April 17, 2020 global data and the number of active cases were increased dramatically in this 10 days period studied, highlighted in the current study.
Introduction
Totally 211 countries, territories or area have been affected by the novel pathogenic Human CoronaVirus (HCoV). The novel coronavirus was officially renamed as “SARS-CoV-2” from “2019-nCoV” by 11th February 2020. The disease caused by SARS-CoV-2 was called “CoronaVirus Infectious Disease 2019” (COVID-19) by WHO. COVID-19 has been declared as a Public Health Emergency of International Concern by the WHO [1]. The virus belongs to Coronaviridae family which consists of single-stranded, non-segmented positive-sense RNA genome of size approximately 26-32 kb [2-3]. To date, six known HCoVs have been identified, namely HCoV-229E, HCoV-NL63, HCoV-OC43, HCoV-HKU1, Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV); except SARS and MERS, other four were universally dispersed in the human population and causes common cold infections among one-third of human populations [4-5].
Two serious coronavirus disease outbreaks occurred in the past two decades. First is SARS in 2003, originated from Southern China and spread to more than 30 countries and all major continents, resulted in more than 8000 human infections and 774 deaths [6-7]. The second include MERS in 2012 genetically different from SARS-CoV but infected 2,249 people in 27 countries with 35% case fatality [8-10].
The current COVID-19 entirely made the globe into lockdown status with the highest death rates and infected rates changing dynamically every day. The beginning of SARS-CoV-2 was linked to a cluster of patients with pneumonia of unknown aetiology connected to a local Huanan South China Seafood Market in Wuhan, Hubei Province, China in December 2019 [11]. It is now emerged as pandemic condition and World Health Organization (WHO) report as on April 17, 2020 at 2:00am CEST, there include a total of 2,034,802 confirmed cases for Corona and 1,35,163 deaths worldwide which includes 212 countries, areas or territories with cases [1] WHO region wise data in Fig.1A-B.
The fuss about the source of the virus and its intermediate host is still not yet confirmed. However, so far, there are still controversies about the source of the virus and its intermediate host [3]. The evolutionary analysis shows that the present strain of coronavirus is more similar to Bat coronavirus isolate RaTG13 (GenBank No.: MN996532.1), with 96.2% nucleotide homology in the whole genome [12-14]. Other groups suggest that pangolin, mink, snake, turtle may be potential intermediate hosts for the virus but not conclusively [3, 15-18].
In India, from January 25 to 31, 2020 three cases were shown to be positive for COVID-19 at Kerala state, all the three had travel history from Wuhan, China. Two patient’s samples were analysed by Next Generation sequencing and the data has been published in NCBI under MT012098.1 and MT050493.1 and GISAID database, accession numbers EPI ISL 413522 and EPI ISL 413523 and they reported briefly on various aspects [19]. We initiated our analysis, once the genome sequence was available at the NCBI database, however, Pragya D et al., reported by the end of March 2020. This study was aimed to understand the genomics and epidemiology of the Indian Human coronavirus with other major countries genome sequence, in addition the phylogeny relationship between COVID-19 from Indian patients genome with other Coronavirus genome which infects other species like Bats, Duck, Sparrow and etc.,
Materials and Methods
Sequence recovery and analysis
Genomic sequences hold innumerable information to foresee identification, location and pathogenicity of viral strains. As on April 07, 2020 there stay 547 nucleotide sequences and 5,364 protein sequences available in NCBI Virus database (https://www.nih.gov/coronavirus). Of which 469 sequences contains nearly about ∼28000-29000 base pairs (bps), sequences represents complete genome data and 78 denotes short specific protein encoding gene sequences.
The phylogeny was constructed to compare the evolutionary relationship of human and bat genome sequences. We used the first complete genome sequence submitted by China under the Genbank id: NC_045512.2 for Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [COVID-19] for the study and a BLAST similarity search was performed. The top 100 hits were screened and the aligned sequences were selected for further studies. Genome sequences of novel COVID-19 affecting Human (30) including 2 genome from India-Kerala state MT050493.1 & MT012098.1 and 2 closely related bat corona viral genomes MG772933.1 and MN996532.1 were investigated using MEGAX. The sequences were subjected to analysis by Maximum Likelihood method and Tamura-Nei model for the native and bootstrap tree (50 repeats) construction.
To compare the genomic evolutionary lineage from the other species, we compared the six (6) sequences of SARS-CoV-2 [Genbank id: NC_045512.2, MT126808.1, MT007544.1, MT049951.1, MT012098.1 (India) and MT050493.1 (India)] and 24 corona virus isolates from other species such as bat, sparrow, duck, pig, cow, etc., using MEGA X [20-22] (Table 1).
Results
A phylogenetic tree was constructed with 30 genome sequences of Human coronavirus and 3 sequences of bat coronavirus using maximum likelihood method and Tamura-Nei model with MEGAX program. The algorithm produced a tree with highest log likelihood value (-632944.00) and also formed a bootstrap consensus tree inferred from 50 replicates. The percentage of trees in which the associated taxa clustered together was shown next to the branches. There were a total of 29903 positions in the final dataset. The lengths of the branches represent the amount of change that is estimated to have occurred between a pair of nodes. Alternative evolutionary trees are produced by bootstrapping analysis under maximum likelihood model. This is done through 50 iterations and when we recover the same node after 50 iterations then the node in the tree is well supported.
Upon boostrap, Fig. 2. data showed all the three Bat-SARS-like coronavirus in closed group branching to the China, USA and India strains. Though the first complete genome NC045512.2.1-29903 Wuhan-Hu-1 (blue triangle) was closely branched to MN996528.1 [Wuhan, Hubei, China], MT093571.1 [Swedan], MT039890.1 [Korean travelled from Wuhan, China], MT066156.1 [First case of Italy], MT007544.1 [Australia], the Bat SARS virus, precisely RaTG13 and SL-Covzc45 were completely close and neighborhood with the CoVid-19 isolated from Wuhan-genomes MT198652.2 (Spain), MT123292.2 (Guangzhou, China) MT012098.1 (Kerala, India) and LR757995.1 (Hong Kong, China).
The purpose of this study is to highlight the key role of Wuhan, China trace to Indian patients, Hence, the two India-Kerala state isolates were specifically traced by multiple alignments of the genome sequences using NCBI tool and Multalin, we found MT012098 consist of 29,854 bps and MT050493.1 consist of 29,851bps, three basepairs were missing in the initial sequence of the second isolate. Whereas totally, 9 point mutations were identified in this two genomes of which, 8 were C-T/T-C change and only one mutation at position 22,772 T/G change with 99% identity score, complete details recorded in the Table 2. The second strain from India MT050493.1 was in close association with LR757996.1 (Wuhan, China), MN938384.1 (Guangdong, China) and LC528232.1 (Isolated from a patient in cruise ship, Japan).
Phylogenic study of coronavirus genome with diverse organisms showed, highest log likelihood value of -1064822.43 along with bootstrap consensus tree of 50 replicates. In this analysis a total of 31116 positions in the final dataset were used.
Though two Coronavirus from Indian patient had history from Wuhan travel and returned to India, Kerala state. In Fig.3 human corona virus (MT012098.1) is closely related to White-eye coronavirus (JQ065044.1) and sparrow coronavirus (NC016992.1). The second Indian human coronavirus genome (MT050493.1) was closely related to Duck coronavirus (JF705860), Transmissible gastroenteritis (KX499468.1) and Canine respiratory coronavirus (KX432213.1). Whereas, all the other human affecting coronavirus genome from China, Wuhan, Brazile, Australia were closely related and they were adjacent to Bat coronavirus (MN996532.1, EF065511.1, MN611520.1), civet coronavirus AY572038.1 as a secondary neighborhood close to the human coronavirus next to bat virus. Though it is not conclusive, the coronavirus from sparrow and white-eye are more close to Indian isolates, which indicates they may be associated with the patients from other species of animal sources.
Discussion
The emergence and rapid spread of COVID-19 signifies a perfect epidemiological storm. COVID-19 a member of the genus Betacoronavirus, subgenus - Sarbecovirus as like SARS-CoV, but MERS-CoV belongs to Merbecovirus (subgenus) [12, 23-24]. The RaTG13 is 96% identical to COVID-19 at the nucleotide level, but the receptor binding domain (RBD) is only 85% similar and shared only one of the six critical amino acid residues [25]. Still, the COVID-19 RBD for human ACE2 receptor found to be strong and functional [26] which armor the virus towards human respiratory system.
Research Group of South China Agricultural University analyzed more than 1,000 metagenomic samples, and they found 70% of pangolins were positive for the coronavirus. In addition, virus isolate from pangolin shared 99% sequence similarity with the current infected human strain SARS-CoV-2 [27] and they addressed pangolins may be one of the intermediate hosts for SARS-CoV-2 later they were embrassed to be a miscommunication of data reported [28] Infact, only 90% genome were homology to SARS-CoV-2 genome and Coronavirus from Pangolins. This incidence was a best example, how the scientific and public media were exaggerating the data without solid evident.
Inspite, the 96% identity of SARS-CoV-2 with the Bat coronavirus presumes, they are very closely related to SARSCoV-2, in virtual this likely represents more than 20 years of RNA sequence evolution, whcih may occure if the molecular clock emerges at an uncertain rate if there was strong adaptive evolution of the virus in humans[25]. Available bat coronavirus genome was studied from Yunnan province, over 1,500 km from Wuhan. There are few bat coronaviruses from Hubei province, and those that have been sequenced are relatively distant to SARS-CoV-2 in phylogenetic trees [29]. The available data on bat viruses is strongly biased toward some geographical locations, which needs further study [25].
The two Indian SARS-CoV-2 sequences were found to be non-identical (0.04% nt divergence), and the result of phylogenetic analysis indicated they are not in same clade, carries two different lineage which indicates they are non-identical source of infection [19]. Further multiple genome sequencing of Coronavirus from Indian patients will give us better understanding about the source of infection and mutation rate among the patient to patient and point of contact. The phylogenetic analysis of two Indian isolates elucidates strong associatedion with Bat virus genome, inspite the close association with duck and sparrow coronavirus were double checked with the genome blast given no statistical significance. Inspite, We estimate, if new coronavirus genome sequence from other animals like Bats, Pangolin, Civet or any other animal associated with Huanan South China Seafood Market in Wuhan, Hubei Province, China, may give us solid source of infection and the mode of transmission from animal to Human can be traced. Since, the 3-3.95 % genome difference in a RNA virus between SARS-CoV-2 and RaTG13 will not happen in a year and we are lacking the exact source or primary host of this novel coronavirus.
In India, as on April 17, 2020 18:00 IST, totally 13,815 confirmed cases, 1955 recovered cases, 11,403 active cases and 457 deaths. Currently, Maharastra, Delhi and Tamil Nadu were respectively top three affected States/Union Territory of India. The complete details were represented in the Table 3. Despite, India had cases from Wuhan to India-Kerala state by January 4th week 2020, gradually; the cases were increased and identified from patients with travel history of various other countries and regions Fig.4.. The Government of India made complete Lockdown on March 24, 2020 midnight to April 15, 2020 early morning (21 days) as a first schedule. Second schedule announced by Honourable Prime Minister of India Mr. Narendra Modi on April 14th 10 am, 2020 that the lockdown will be extended further to grab the spread of COVID-19 from local transmission (stage 2) to community transmission (stage 3) from April 15 early morning to May 3, 2020 early morning. This will further reduce the number of new infection rate, and the existing patients may be identified and treated according to the WHO guidelines. Inspite of all the effort taken, the rate of new cases has been steeply increased over the April 3rd week and the list of confirmed, dead and recovered details shown in the Table.3.
Government of India and the health authorities of all the states had taken necessary action to curb spread and made containment plan and enhanced surveillances for the virus and its restricted movement. ICMR-NIV, Pune doing splendid work towards the identification of corona patients through Real-time quantitative PCR and rapid kit based methods. Currently, the diagnoses were carried out for the patients with clinical symptoms and or travel history or their contacts. In near future, the Government has planned to test everyone with symptoms irrespective of travel or contact history. This will curb the case of COVID-19 in India.
The world wide data alarms USA, Italy, Spain, UK, Iran, Netherlands and other countries with high rate of death cases, warrants early diagnosis and treatment, though there is no specific drug against COVID-19. The perfect lockdown, social distancing and safety measures plays vital role in this current scenario to reduce the number of new cases arises in the world Table 4 (7 parts). Fig 5A-C.
Inspite of all the effort taken all over the world, USA has been affected to the highest level in the world, where 6,32,781 confirmed cases and 28,221deaths were reported as on April 17, 2020. Next to USA, the top 9 countries were Italy, Spain, France, The United Kingdom, Iran (Islamic Republic of), China, Belgium, Germany, Netherlands have confirmed cases of 1,65,155, 1,77,633, 1,05,155, 98,480, 77,995, 84,149, 33,573, 1,30,450 and 2,81,53 respectively and the death toll reported to be Italy 21,647, Spain 18,579, France 17,146, The United Kingdom 12,868, Iran (Islamic Republic of) 4,869, China 4,642, Belgium 4,440, Germany 3,569, Netherlands 3,134 as per WHO report (April 17,2020 2:00am CEST). Although Germany position to be 4th as per the confirmed cases, but the number of death reported till date was lower compared to other countries, which made it as 9th position. The complete country wise cumulative confirmed cases and cumulative deaths between April 7th 2020 and April 17th 2020 was calculated and the percentage wise confirmed cases and death rate in 10 days was calculated and shown in the Table. 4.
The percentage of death rate increased in just 10 days alarms the world and it should be recorded in the literature for future study. Hence, we included the whole data in the table, and the top 20 countries with death fold increase in 10 days between April 7, 2020 to April 17, 2020 as follows United States of America 195%, Italy 31%, Spain 42%, France 93%, The United Kingdom 139%, Iran (Islamic Republic of) 26%, China 39%, Belgium 172%, Germany 122%, Netherlands 68%, Brazil 257%, Turkey 134%, Sweden 152%, Canada 258%, Switzerland 52%, Portugal 93%, Indonesia 124%, Mexico 378%, Ireland 155%, India 263%.
Continued testing of people, irrespective of symptoms and travel history is the need of time to scrutinize the asymptomatic carrier people among the group of population, will cut down the number of death rate. This April 2020 plays major role in the steady death toll increase among various countries mentioned above. Early diagnosis, appropriate treatment, containment zone formation, Quarantine the people with symptoms may help us to flat the rate of infection and death rate.
Possible ending of COVID-19
The pandemic COVID-19 outbreak may have handful different endings. If nature or God give us the luck, the most favourable scenario would be COVID-19 unconsciously petering out as was the case with SARS in 2003. The second chance would be like MERS which, continue sporadically pop up over the years. The third one in worst scenario, it may create more deaths over the year as the sinister path like 1918 Spanish influenza over a decade [30]. Let the nature and future determine it.
Conclusion
The dynamic increase of number of confirmed cases and genome sequences from all over the world made us to register the various conditions as on April 1st week, 2020. Every one hour, the world wide data has been updated in various resources like WHO, John Hopkins website, Worldometer. In this context, our study from 2 Indian isolate provided us the trace for infection source and the spread rate increases the point mutation, which may either reduce the virulent or increase the pathogenicity of novel coronavirus, SARS-CoV-2 mediated COVID-19 which needs further study.
Data Availability
Data available from public source- NCBI
Acknowledgement
I thank Dr Giridharan Appaswamy, Chief Scientific Officer, LifeCell International Private Limited, Chennai, for his continuous support and encouragement to do this research work. I thank Management, Lifecell International Pvt Ltd and all the lab members from Lifecell, R& D division for their support during this work.