Abstract
Since the beginning of the COVID-19 pandemic, SARS-CoV-2 has demonstrated its ability to evolve rapidly and continuously, leading to the emergence of thousands of sequence variants, many with distinctive phenotypic properties. Fortunately, the broad availability of next-generation sequencing (NGS) technologies across the globe has produced a wealth of SARS-CoV-2 genome sequences. These sequences offer a comprehensive picture of how the virus is evolving, which helps keep diagnostics accurate and therapeutics reliable for COVID-19. The millions of SARS-CoV-2 sequences deposited into genomic sequence databases, including GenBank, BV-BRC, and GISAID, are annotated with the dates and geographical regions of sample collection. They can be aligned to the Wuhan-Hu-1 reference genome to extract their constellations of nucleotide and amino acid substitutions. By aggregating these data into concise datasets, the spread of variants through space and time can be assessed. Variant tracking efforts have focused on the Spike protein because of its critical role in viral tropism and antibody neutralization. To identify emerging variants of concern as early as possible, we developed a computational pipeline that processes genomic data from public databases and assigns risk scores based on both epidemiological and functional parameters. Epidemiological dynamics are used to identify variants exhibiting substantial growth over time and across geographical regions. In addition, experimental data that quantify Spike protein regions critical for adaptive immunity are used to predict variants with consequential immunogenic or pathogenic impacts. The growth assessment and functional impact scores are combined to produce a Composite Score for any detected set of Spike substitutions. With this systematic approach to routinely scoring and ranking emerging variants, we have established a method to identify threatening variants early and prioritize them for experimental evaluation.
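The abstract does not specify the scoring formulas, so the following is only an illustrative sketch of the general idea. It aggregates sequences by Spike substitution constellation, region, and collection week. It then scores growth and functional impact and combines them into a composite value. Every name here is hypothetical: `SequenceRecord`, `aggregate`, `growth_score`, `functional_score`, `composite_score`, and the `site_weights` table. The specific metrics are assumptions, not the authors' method: mean prevalence gain across regions, mean per-site weight, and a weighted average for the composite. The sketch also assumes that substitutions are simple point changes written like `N501Y`.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceRecord:
    # Hypothetical record: Spike substitutions relative to Wuhan-Hu-1 (e.g. "N501Y"),
    # plus the region and week index of sample collection.
    substitutions: frozenset
    region: str
    week: int


def aggregate(records):
    """Count sequences per (constellation, region, week) and per (region, week)."""
    by_variant = Counter((r.substitutions, r.region, r.week) for r in records)
    by_bucket = Counter((r.region, r.week) for r in records)
    return by_variant, by_bucket


def growth_score(by_variant, by_bucket, variant, week_a, week_b):
    """Assumed growth metric: mean prevalence gain from week_a to week_b over the
    regions where the variant was observed; declines are counted as 0."""
    regions = {reg for (v, reg, _week) in by_variant if v == variant}
    if not regions:
        return 0.0

    def prevalence(reg, week):
        total = by_bucket[(reg, week)]  # Counter returns 0 for missing keys
        return by_variant[(variant, reg, week)] / total if total else 0.0

    gains = [max(0.0, prevalence(reg, week_b) - prevalence(reg, week_a)) for reg in regions]
    return sum(gains) / len(regions)


def functional_score(variant, site_weights):
    """Assumed impact metric: mean weight (0..1) of the mutated Spike positions.
    Positions without experimental data contribute 0."""
    if not variant:
        return 0.0
    positions = [int(sub[1:-1]) for sub in variant]  # "N501Y" -> 501 (point substitutions only)
    return sum(site_weights.get(p, 0.0) for p in positions) / len(positions)


def composite_score(growth, impact, w=0.5):
    """Assumed combination rule: weighted average of growth and functional impact."""
    return w * growth + (1 - w) * impact
```

For example, suppose a constellation rises from 1 of 4 sequences to 3 of 4 in a single region. It gets a growth score of 0.5. With a hypothetical weight of 0.8 at position 501, its composite score with equal weighting is 0.65. A weighted average is used only because it keeps both inputs on a common 0..1 scale. The actual pipeline may combine the scores differently.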
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work has been funded with Federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services - HHS75N93019C00076.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
N/A
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
Publicly available datasets were analyzed in this work, and the original contributions presented are included in the manuscript. Further inquiries about data availability can be directed to the authors.