ABSTRACT
Background Faced with the ongoing global pandemic of coronavirus disease, the ‘National Reference Centre for Whole Genome Sequencing of microbial pathogens: database and bioinformatic analysis’ (GENPAT), formally established at the ‘Istituto Zooprofilattico Sperimentale dell’Abruzzo e del Molise’ (IZSAM) in Teramo (Italy), supports the genomic surveillance of SARS-CoV-2. Because SARS-CoV-2 surveillance requires proper and fast assessment of epidemiological clusters from large numbers of samples, the present manuscript proposes a workflow for accurately identifying the PANGOLIN lineages of SARS-CoV-2 samples and building discriminant minimum spanning trees (MST), bypassing the usual time-consuming phylogenomic inferences based on multiple sequence alignment (MSA) and substitution models.
Results GENPAT assembled two collections of SARS-CoV-2 samples. The samples of the first collection were isolated by IZSAM in the Abruzzo region (Italy), then shotgun sequenced and analyzed in GENPAT (n = 1 592), while those of the second collection were isolated from several Italian provinces and retrieved from the reference Global Initiative on Sharing All Influenza Data (GISAID) (n = 17 201). The main outcomes of the present study showed that (i) GENPAT and GISAID identified identical PANGOLIN lineages, (ii) the PANGOLIN lineages B.1.177 (i.e. historical in Italy) and B.1.1.7 (i.e. ‘UK variant’) are major concerns today in several Italian provinces, and that the new MST-based method (iii) clusters most of the PANGOLIN lineages together, (iv) has a higher discriminatory power than PANGOLIN, and (v) is faster than the usual phylogenomic methods based on MSA and substitution models.
Conclusions The shotgun sequencing efforts of Italian provinces, combined with a structured national system of metagenomics data management, supported the surveillance of SARS-CoV-2 in Italy. We recommend inferring phylogenomic relationships of SARS-CoV-2 variants through an accurate, discriminant and fast MST-based method that bypasses the usual time-consuming steps related to MSA and substitution model-based phylogenomic inference.
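Illustration The general idea of building an MST directly from variant calls, without MSA or a substitution model, can be sketched as follows. This is a minimal sketch under stated assumptions and not the vcf2mst.pl implementation: it assumes each sample is reduced to a set of variant labels, that the distance between two samples is the number of variants found in only one of them, and that the tree is built with Prim's algorithm. The function names and the toy profiles are hypothetical and are not data from the study.

```python
def variant_distance(a, b):
    """Number of variants present in exactly one of the two samples."""
    return len(a ^ b)


def minimum_spanning_tree(profiles):
    """Return MST edges as (sample_u, sample_v, distance) using Prim's algorithm.

    `profiles` maps a sample name to a frozenset of variant labels.
    """
    names = sorted(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        best = None
        # Scan every edge leaving the current tree and keep the shortest one.
        for u in sorted(in_tree):
            for v in names:
                if v in in_tree:
                    continue
                d = variant_distance(profiles[u], profiles[v])
                if best is None or d < best[2]:
                    best = (u, v, d)
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Toy variant profiles (illustrative labels only, not study data).
profiles = {
    "sample_A": frozenset({"C241T", "C3037T", "A23403G"}),
    "sample_B": frozenset({"C241T", "C3037T", "A23403G", "C22227T"}),
    "sample_C": frozenset({"C241T", "C3037T", "A23403G", "A23063T", "C23271A"}),
    "sample_D": frozenset(
        {"C241T", "C3037T", "A23403G", "A23063T", "C23271A", "C23604A"}
    ),
}

mst = minimum_spanning_tree(profiles)
total = sum(d for _, _, d in mst)
for u, v, d in mst:
    print(f"{u} -- {v} (distance {d})")
```

The quadratic scan in this sketch is only meant to show the principle on a handful of samples; a collection of thousands of genomes would call for a precomputed distance matrix and a dedicated MST tool.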
Competing Interest Statement
The authors have declared no competing interest.
Clinical Trial
The present study is neither a clinical trial nor a prospective study.
Funding Statement
The study was funded by the European Union Horizon 2020 Research and Innovation program under grant agreement No 773830: One Health European Joint Program and by the Italian Ministry of Health IZSAM 05/20 Ricerca Corrente 2020 PanCO: epidemiologia e patogenesi dei coronavirus umani ed animali. Mention of trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the IZSAM.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The results analyzed in the present study derive from the official control activities performed by the Public Health Local Authority of the Abruzzo region. All human data and samples were collected ethically with the consent and permission of participants, following the Decreto della Giunta Regionale DGR n. 194 del 2.04.2021 from the Dipartimento Sanità della REGIONE ABRUZZO. A local ethics committee, the Ethics Committee of the National Reference Centre for Whole Genome Sequencing of microbial pathogens: database and bioinformatic analysis, ruled that no formal ethics approval was required in this particular case because the related data were openly available to the public before the initiation of the study.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
Email addresses: a.dipasquale{at}izs.it
n.radomski{at}izs.it
i.mangone{at}izs.it
p.calistri{at}izs.it
a.lorusso{at}izs.it
c.camma{at}izs.it
Data Availability
Metadata and consensus sequences of all the SARS-CoV-2 samples are available from GISAID (https://www.gisaid.org/) under the accession numbers described in the supplementary information. The algorithm vcf2mst.pl is available on GitHub (https://github.com/genpat-it/vcf2mst).
Abbreviations
- ARDS: acute respiratory distress syndrome
- CoV: coronavirus
- COVID-19: coronavirus disease 19
- GATK4: genomic analysis toolkit
- GENPAT: Whole Genome Sequencing of microbial pathogens: database and bioinformatics analysis
- GISAID: Global Initiative on Sharing All Influenza Data
- IZSAM: Istituto Zooprofilattico Sperimentale dell’Abruzzo e del Molise Giuseppe Caporale
- ML: maximum likelihood
- MSA: multiple sequence alignment
- MST: minimum spanning trees
- NRC: National Reference Centre
- RdRp: RNA-dependent RNA polymerases
- SARSr-CoVs: SARS-related coronaviruses
- SARS-rCoV: severe acute respiratory syndrome-related virus
- UK: United Kingdom
- WHO: World Health Organization