RT Journal Article
SR Electronic
T1 Profiling SARS-CoV-2 mutation fingerprints that range from the viral pangenome to individual infection quasispecies
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2020.11.02.20224816
DO 10.1101/2020.11.02.20224816
A1 Lau, Billy T.
A1 Pavlichin, Dmitri
A1 Hooker, Anna C.
A1 Almeda, Alison
A1 Shin, Giwon
A1 Chen, Jiamin
A1 Sahoo, Malaya K.
A1 Huang, ChunHong
A1 Pinsky, Benjamin A.
A1 Lee, HoJoon
A1 Ji, Hanlee P.
YR 2020
UL http://medrxiv.org/content/early/2020/11/04/2020.11.02.20224816.abstract
AB Background: The genome of SARS-CoV-2 is susceptible to mutations during viral replication due to errors generated by RNA-dependent RNA polymerases. These mutations enable SARS-CoV-2 to evolve into new strains. Viral quasispecies emerge from de novo mutations that occur in individual patients. In combination, these sets of viral mutations provide distinct genetic fingerprints that reveal the patterns of transmission and have utility in contact tracing.
Methods: Leveraging thousands of sequenced SARS-CoV-2 genomes, we performed a viral pangenome analysis to identify conserved genomic sequences. We used a rapid and highly efficient computational approach that relies on k-mers, short tracts of sequence, instead of conventional sequence alignment. Using this method, we annotated viral mutation signatures that were associated with specific strains. Based on these highly conserved viral sequences, we developed a rapid and highly scalable targeted sequencing assay to identify mutations, detect quasispecies and identify mutation signatures from patients. These results were compared to the pangenome genetic fingerprints.
Results: We built a k-mer index for thousands of SARS-CoV-2 genomes and identified conserved genomic regions and the landscape of mutations across these virus genomes.
We delineated mutation profiles spanning common genetic fingerprints (the combination of mutations in a viral assembly) and rare ones that occur in only a small fraction of patients. We developed a targeted sequencing assay by selecting primers from the conserved viral genome regions to flank frequent mutations. Using a cohort of SARS-CoV-2 clinical samples, we identified genetic fingerprints consisting of strain-specific mutations seen across populations and de novo quasispecies mutations localized to individual infections. We compared the mutation profiles of viral samples undergoing analysis with the features of the pangenome.
Conclusions: We analyzed the viral mutation profiles that provide the basis of genetic fingerprints. Our study linked pangenome analysis with targeted deep sequencing of SARS-CoV-2 clinical samples. We identified quasispecies mutations occurring within individual patients, mutations demarcating dominant species, and the prevalence of mutation signatures, of which a significant number were relatively unique. Analysis of these genetic fingerprints may provide a way of conducting molecular contact tracing.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: The work is supported by the National Institutes of Health [2R01HG006137-04 to H.P.J., P01HG00205ESH to B.T.L. and H.P.J., U01HG010963 to HJ.L., D.P. and H.P.J., 1R35HG011292-01 to B.T.L.].
Additional support came from the Clayville Foundation.
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The Institutional Review Board (IRB) at Stanford University School of Medicine approved the study protocol (IRB-56088). All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
Sequence data is available at the National Institutes of Health's Sequence Read Archive under BioProject accession ID PRJNA663917.
Abbreviations: bp, base pair; CDC, Centers for Disease Control and Prevention; cDNA, complementary DNA; EUA, Emergency Use Authorization; FDA, Food and Drug Administration; GISAID, Global Initiative on Sharing All Influenza Data; gnomAD, Genome Aggregation Database; kb, kilobase; NCBI, National Center for Biotechnology Information; NGS, next-generation sequencing; nsp, non-structural protein; nt, nucleotide; ORF, open reading frame; RdRp, RNA-dependent RNA polymerase; RT, reverse transcription; ViPR, Virus Pathogen Resource