Abstract
The COVID-19 pandemic has seen the persistent emergence of immune-evasive SARS-CoV-2 variants under the selection pressure of natural and vaccination-acquired immunity. However, it is currently challenging to quantify how immunologically distinct a new variant is compared to all the prior variants to which a population has been exposed. Here we define ‘Distinctiveness’ of SARS-CoV-2 sequences based on a proteome-wide comparison with all prior sequences from the same geographical region. We observe a correlation between Distinctiveness relative to contemporary sequences and future change in prevalence of a newly circulating lineage (Pearson r = 0.75), suggesting that the Distinctiveness of emergent SARS-CoV-2 lineages is associated with their competitive fitness. By assessing the Delta variant in India versus Brazil, we show that the same lineage can have different Distinctiveness-contributing positions in different geographical regions, depending on the other variants that previously circulated in those regions. More broadly, we analyze 944 combinations of geographic regions and time windows to demonstrate that the average Distinctiveness of a lineage in a country/time window is predictive of a greater than 20 percentage point future increase in infection prevalence after 56 days with an ROC AUC of 0.89. Finally, we find that positions that constitute known SARS-CoV-2 epitopes contribute disproportionately (20-fold higher than the average position) to Distinctiveness. Overall, this study suggests that real-time assessment of new SARS-CoV-2 variants in the context of prior regional herd exposure via Distinctiveness can augment genomic surveillance efforts.
Competing Interest Statement
MN, KM, AV, PL, and VS, are employees of nference and have financial interests in the company. nference is collaborating with bio-pharmaceutical, medical device and diagnostics companies, public health agencies, academic medical centers and health systems on data science initiatives unrelated to this study. These collaborations had no role in study design, data collection and analysis, decision to publish, or preparation of this manuscript. AMB, JW, RC, and JB declare no conflicts of interest.
Funding Statement
This study was self-funded by nference. No external funding was received for this study. The work of AMB, JW, RC, and JB was supported by the National Center for Biotechnology Information of the National Library of Medicine, National Institutes of Health.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
↵+ Joint first authors
Added a section on epitope analysis along with collaborators from NCBI and NIH who contributed to this section of the study
Data Availability
All SARS-CoV-2 sequences and associated metadata were downloaded from GISAID.