RT Journal Article
SR Electronic
T1 Genomic epidemiology of SARS-CoV-2 in Pakistan
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2021.06.24.21255875
DO 10.1101/2021.06.24.21255875
A1 Song, Shuhui
A1 Li, Cuiping
A1 Kang, Lu
A1 Tian, Dongmei
A1 Badar, Nazish
A1 Ma, Wentai
A1 Zhao, Shilei
A1 Jiang, Xuan
A1 Wang, Chun
A1 Sun, Yongqiao
A1 Li, Wenjie
A1 Lei, Meng
A1 Li, Shuangli
A1 Qi, Qiuhui
A1 Ikram, Aamer
A1 Salman, Muhammad
A1 Umair, Massab
A1 Shireen, Huma
A1 Batool, Fatima
A1 Zhang, Bing
A1 Chen, Hua
A1 Yang, Yungui
A1 Abbasi, Amir Ali
A1 Li, Mingkun
A1 Xue, Yongbiao
A1 Bao, Yiming
YR 2021
UL http://medrxiv.org/content/early/2021/07/02/2021.06.24.21255875.abstract
AB Pakistan has been severely affected by the COVID-19 pandemic. To investigate the initial introductions and transmission of SARS-CoV-2 in the country, we performed the largest genomic epidemiology study of COVID-19 in Pakistan and generated 150 complete SARS-CoV-2 genome sequences from samples collected before June 1, 2020. We identified a total of 347 variants, 29 of which were over-represented in Pakistan. In addition, we found over one thousand intra-host single-nucleotide variants. Several of these co-occurred, suggesting possible interactions among them. Some of the hypermutable positions were not observed in the polymorphism data, suggesting strong purifying selection at these sites. The genomic epidemiology analysis revealed five distinct transmission clusters. The largest cluster consisted of 74 viruses from different geographic locations. These viruses formed a deep hierarchical structure, indicating extensive and persistent nationwide transmission of the virus, to which a signature mutation of this cluster probably contributed. Twenty-eight putative international introductions were identified, several of which were consistent with epidemiological investigations.
No progeny of any of these 150 viruses has been found outside Pakistan, most likely owing to the non-pharmacological interventions implemented to control the virus. This study has inferred the introductions and transmission of SARS-CoV-2 in Pakistan, which could provide guidance for effective disease control strategies.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This work was supported by grants from the National Key R&D Program of China (2020YFC0848900, 2016YFE0206600), the Strategic Priority Research Program of Chinese Academy of Sciences, China (XDA19090116, XDB38060100), the Open Biodiversity and Health Big Data Programme of the International Union of Biological Sciences, the International Partnership Program of Chinese Academy of Sciences (153F11KYSB20160008), the Professional Association of the Alliance of International Science Organizations (ANSO-PA-2020-07), and the Youth Innovation Promotion Association of Chinese Academy of Sciences (2017141).
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: This study design was approved by the Institutional Review Board of the National Institute of Health.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
The raw sequence data and the whole-genome sequences reported in this paper have been deposited in the Genome Sequence Archive under accession number CRA003122 and in Genome Warehouse under accession numbers GWHAOJE01000000 to GWHAOOX01000000, at the National Genomics Data Center, Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences.
https://bigd.big.ac.cn/gsa/browse/CRA003122
https://ngdc.cncb.ac.cn/search/?dbId=gwh&q=PRJCA003179&page=1