PT - JOURNAL ARTICLE
AU - Song, Shuhui
AU - Li, Cuiping
AU - Kang, Lu
AU - Tian, Dongmei
AU - Badar, Nazish
AU - Ma, Wentai
AU - Zhao, Shilei
AU - Jiang, Xuan
AU - Wang, Chun
AU - Sun, Yongqiao
AU - Li, Wenjie
AU - Lei, Meng
AU - Li, Shuangli
AU - Qi, Qiuhui
AU - Ikram, Aamer
AU - Salman, Muhammad
AU - Umair, Massab
AU - Shireen, Huma
AU - Batool, Fatima
AU - Zhang, Bing
AU - Chen, Hua
AU - Yang, Yungui
AU - Abbasi, Amir Ali
AU - Li, Mingkun
AU - Xue, Yongbiao
AU - Bao, Yiming
TI - Genomic epidemiology of SARS-CoV-2 in Pakistan
AID - 10.1101/2021.06.24.21255875
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.06.24.21255875
4099 - http://medrxiv.org/content/early/2021/07/02/2021.06.24.21255875.short
4100 - http://medrxiv.org/content/early/2021/07/02/2021.06.24.21255875.full
AB - Pakistan has been severely affected by the COVID-19 pandemic. To investigate the initial introductions and transmission of SARS-CoV-2 in the country, we performed the largest genomic epidemiology study of COVID-19 in Pakistan and generated 150 complete SARS-CoV-2 genome sequences from samples collected before June 1, 2020. We identified a total of 347 variants, 29 of which were over-represented in Pakistan. We also found more than one thousand intra-host single-nucleotide variants. Several of them occurred concurrently, indicating possible interactions among them. Some of the hypermutable positions were not observed in the polymorphism data, suggesting strong purifying selection. The genomic epidemiology revealed five distinct spreading clusters. The largest cluster consisted of 74 viruses that were derived from different geographic locations and formed a deep hierarchical structure, indicating extensive and persistent nationwide transmission of the virus, to which a signature mutation of this cluster probably contributed. Twenty-eight putative international introductions were identified, several of which were consistent with the epidemiological investigations.
No progeny of any of these 150 viruses has been found outside Pakistan, most likely because of the nonpharmacological interventions implemented to control the virus. This study inferred the introductions and transmission of SARS-CoV-2 in Pakistan, which could provide guidance for an effective disease-control strategy.

Competing Interest Statement
The authors have declared no competing interest.

Funding Statement
This work was supported by grants from the National Key R&D Program of China (2020YFC0848900, 2016YFE0206600), the Strategic Priority Research Program of Chinese Academy of Sciences, China (XDA19090116, XDB38060100), the Open Biodiversity and Health Big Data Programme of International Union of Biological Sciences, International Partnership Program of Chinese Academy of Sciences (153F11KYSB20160008), the Professional Association of the Alliance of International Science Organizations (ANSO-PA-2020-07), and the Youth Innovation Promotion Association of Chinese Academy of Sciences (2017141).

Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: This study design was approved by the Institutional Review Board of the National Institute of Health.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes

The raw sequence data and the whole-genome sequences reported in this paper have been deposited in the Genome Sequence Archive under accession number CRA003122 and in Genome Warehouse under accession numbers GWHAOJE01000000~GWHAOOX01000000, at the National Genomics Data Center, Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences.
https://bigd.big.ac.cn/gsa/browse/CRA003122
https://ngdc.cncb.ac.cn/search/?dbId=gwh&q=PRJCA003179&page=1