PT - JOURNAL ARTICLE
AU - Foster, Charles S.P.
AU - Stelzer-Braid, Sacha
AU - Deveson, Ira W.
AU - Bull, Rowena A.
AU - Yeang, Malinna
AU - Phan-Au, Jane
AU - Silva, Mariana Ruiz
AU - van Hal, Sebastiaan J.
AU - Rockett, Rebecca J.
AU - Sintchenko, Vitali
AU - Kim, Ki Wook
AU - Rawlinson, William D.
TI - Assessment of inter-laboratory differences in SARS-CoV-2 consensus genome assemblies between public health laboratories in Australia
AID - 10.1101/2021.08.19.21262296
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.08.19.21262296
4099 - http://medrxiv.org/content/early/2021/08/22/2021.08.19.21262296.short
4100 - http://medrxiv.org/content/early/2021/08/22/2021.08.19.21262296.full
AB - Whole-genome sequencing of viral isolates is critical for informing transmission patterns and ongoing evolution of pathogens, especially during a pandemic. However, when genomes have low variability in the early stages of a pandemic, the impact of technical and/or sequencing errors increases. We quantitatively assessed inter-laboratory differences in consensus genome assemblies of 72 matched SARS-CoV-2-positive specimens sequenced at different laboratories in Sydney, Australia. Raw sequence data were assembled using two different bioinformatics pipelines in parallel, and resulting consensus genomes were compared to detect laboratory-specific differences. Matched genome sequences were predominantly concordant, with a median pairwise identity of 99.997%. Identified differences were predominantly driven by ambiguous site content. Ignoring these produced differences in only 2.3% (5/216) of pairwise comparisons, each differing by a single nucleotide. Matched samples were assigned the same Pango lineage in 98.2% (212/216) of pairwise comparisons, and were mostly assigned to the same phylogenetic clade.
However, epidemiological inference based only on single nucleotide variant distances may lead to significant differences in the number of defined clusters if variant allele frequency thresholds for consensus genome generation differ between laboratories. These results underscore the need for a unified, best-practices approach to bioinformatics between laboratories working on a common outbreak problem.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: We acknowledge funding support from the UNSW COVID-19 Rapid Response Research Initiative (to W.D.R.). This work was also supported by MRFF Investigator Grant APP1173594 and Cancer Institute NSW Early Career Fellowship 2018/ECF013 (to I.W.D.). K.W.K. is supported by the Juvenile Diabetes Research Foundation Postdoctoral Fellowship (3-PDF-2020-940-A-N).
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes.
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Viral sequence data from one lab (SAViD) was identified as exempt from ethics by NSW Health as no human isolate or clinical data was analysed specifically for research purposes in this study. All work carried out by ICPMR was done in accordance with governance regulations from the Human Research Ethics Committees of the Western Sydney Local Health District (2020/ETH00287).
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
The raw sequencing reads from Lab1 are available from the Sequence Read Archive (SRA) under accession number PRJNA750251. GISAID and SRA accessions for Lab2 are given in Supplementary Table S1. The custom python3 program for pairwise comparisons is available from https://github.com/charlesfoster/pairwise_comparisons. The custom R script for comparing sequence triplicates is available from https://github.com/charlesfoster/useful_scripts/blob/master/find_sample_triplets.R.
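
The abstract reports pairwise differences between matched consensus genomes both with and without ambiguous sites. The sketch below illustrates one way such a comparison could be computed; it is not the authors' program from the repository above. It assumes the sequences are already aligned to the same length, and it treats any character outside A, C, G and T (IUPAC ambiguity codes, N, gaps) as "ambiguous". The function names `count_differences`, `pairwise_identity` and `compare_replicates` and the lab labels are hypothetical.

```python
from itertools import combinations

# Assumption: only these four characters count as unambiguous calls.
UNAMBIGUOUS = set("ACGT")


def count_differences(seq_a, seq_b, ignore_ambiguous=True):
    """Count mismatching sites between two aligned sequences.

    With ignore_ambiguous=True, a mismatch is skipped when either
    base is not A, C, G or T (e.g. N, IUPAC codes, gap characters).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b:
            continue
        if ignore_ambiguous and (a not in UNAMBIGUOUS or b not in UNAMBIGUOUS):
            continue
        diffs += 1
    return diffs


def pairwise_identity(seq_a, seq_b, ignore_ambiguous=False):
    """Percentage of aligned sites at which the two sequences agree."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    diffs = count_differences(seq_a, seq_b, ignore_ambiguous)
    return 100.0 * (len(seq_a) - diffs) / len(seq_a)


def compare_replicates(replicates, ignore_ambiguous=True):
    """Compare every pair of sequences generated from one specimen.

    replicates maps a label (e.g. a lab or pipeline name) to its
    aligned consensus sequence; the result maps each label pair to
    the number of differing sites.
    """
    return {
        (x, y): count_differences(replicates[x], replicates[y], ignore_ambiguous)
        for x, y in combinations(sorted(replicates), 2)
    }


# Toy example with made-up five-base "genomes" for one specimen.
specimen = {"lab1": "ACGTN", "lab2": "ACGTA", "lab3": "ACTTA"}
strict = compare_replicates(specimen, ignore_ambiguous=False)
relaxed = compare_replicates(specimen, ignore_ambiguous=True)
```

Three sequences per specimen give three pairs, which would be consistent with the 216 comparisons reported for 72 specimens. In the toy data, the `N` in `lab1` counts as a difference only in the strict comparison, mirroring the abstract's observation that most differences come from ambiguous sites.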