Abstract
The continued emergence of SARS-CoV-2 variants is one of several factors that may cause false-negative viral PCR test results. Such tests are also susceptible to false-positive results due to trace contamination from samples with high viral titer. Host immune response markers provide an orthogonal indication of infection that can mitigate these concerns when combined with direct viral detection. Here, we leverage nasopharyngeal swab RNA-seq data from patients with COVID-19, other viral acute respiratory illnesses and non-viral conditions (n=318) to develop support vector machine classifiers that rely on a parsimonious 2-gene host signature to predict COVID-19. Optimal classifiers achieve an area under the receiver operating characteristic curve (AUC) greater than 0.9 when evaluated on an independent RNA-seq cohort (n=553). We show that a classifier relying on a single interferon-stimulated gene, such as IFI6 or IFI44, measured in RT-qPCR assays (n=144) achieves AUC values as high as 0.88. Adding a second gene, such as GBP5, significantly improves specificity against other respiratory viruses. The performance of a clinically practical 2-gene RT-qPCR classifier is robust across common SARS-CoV-2 variants, including Omicron, and is unaffected by cross-contamination, demonstrating its utility for improving the accuracy of COVID-19 diagnostics.
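As a reading aid for the AUC values quoted above, the following is a minimal sketch of how an AUC can be computed when one gene's expression level is used directly as the classification score. It is not the authors' pipeline, which trains support vector machine classifiers. The function name `roc_auc` and all expression values are hypothetical and chosen only for illustration. The sketch relies on the standard interpretation of AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the AUC for binary labels (1 = positive, 0 = negative).

    Computed as the fraction of positive/negative pairs in which the
    positive sample has the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical, made-up expression values for a single gene.
# These are NOT data from the study.
expression = [5.1, 4.8, 3.9, 4.4, 2.0, 2.7, 3.9, 1.5]
covid = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = COVID-19, 0 = other condition

auc = roc_auc(expression, covid)
print(f"single-gene AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of samples, which is acceptable at cohort sizes of a few hundred. For a multi-gene classifier, the same function would be applied to the classifier's decision scores rather than to raw expression values.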
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study was supported by the UCSF Clinical and Translational Science Institute Catalyst Program, Chan Zuckerberg Biohub and the National Heart, Lung, and Blood Institute (1K23HL138461-01A1).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
University of California San Francisco (UCSF) IRB, protocol #17-24056
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
Gene counts for all UCSF samples have been deposited under NCBI GEO accession GSE188678. The New York dataset can be obtained according to the Data Availability statement in the original publication (ref. 12). Code for RNA-seq and qPCR SVM classifier development and validation is available at: https://github.com/czbiohub/Covid-Host-Classifier-Code.