The Coviral Portal: Multi-Cohort Viral Loads and Antigen-Test Virtual Trials for COVID-19

Alexandra Morgan; Elisa Contreras; Michie Yasuda; Sanjucta Dutta; Donald Hamel; Tarini Shankar; Diane Balallo; Stefan Riedel; James E. Kirby; Phyllis J. Kanki; Ramy Arnaout

doi:10.1101/2023.05.05.23289582

Abstract

Background Regulatory approval of new over-the-counter tests for infectious agents such as SARS-CoV-2 has historically required that clinical trials include diverse groups of specific patient populations, making the approval process slow and expensive. Showing that populations do not differ in their viral loads—the key factor determining test performance—could expedite the evaluation of new tests.

Methods 46,726 RT-qPCR-positive SARS-CoV-2 viral loads were annotated with patient demographics and health status. Real-world performance of two commercially available antigen tests was evaluated over a wide range of viral loads. An open-access web portal was created allowing comparisons of viral-load distributions across patient groups and application of antigen-test performance characteristics to patient distributions to predict antigen-test performance on these groups.

Findings In several cases distributions were surprisingly similar where a difference was expected (e.g. smokers vs. non-smokers); in other cases there was a difference that was the opposite direction from expectations (e.g. higher in patients who identified as White vs. Black). Sensitivity and specificity of antigen tests for detecting contagiousness were similar across most groups. The portal is at https://arnaoutlab.org/coviral/.

Conclusions In silico analyses of large-scale, real-world clinical data repositories can serve as a timely evidence-based proxy for dedicated trials of antigen tests for specific populations. Free availability of richly annotated data facilitates large-scale hypothesis generation and testing.

Funding Funded by the Reagan-Udall Foundation for the FDA (RA and JEK) and via a Novel Therapeutics Delivery Grant from the Massachusetts Life Sciences Center (JEK).

Introduction

Diagnosis of new infectious pathogens such as SARS-CoV-2 requires development of new diagnostic tests, which must be evaluated and approved by regulatory agencies before they can be used for patient care. Such tests include over-the-counter (OTC) antigen tests, which have been widely used for at-home testing in the context of COVID-19. In order to be approved, a new test must demonstrate a minimum level of clinical performance. Performance is typically measured as the test’s sensitivity, defined as the proportion of true-positive samples that have a positive result, and its specificity, defined as the proportion of true negatives that have a negative result. Clinical performance must be demonstrated in a defined patient population or group and clinical context, for example inpatients as opposed to outpatients. However, at the start of an outbreak, epidemic, or pandemic, there may not be enough information to know whether a test can be expected to perform differently in some patient groups vs. others. Therefore at the start of development, a new diagnostic test may be approved based on its performance in the general population, not specific groups.

In contrast, as time goes on, evidence for clinical differences among specific groups may emerge. As this happens, it becomes reasonable to ask whether a test might perform differently in specific groups, with important implications for how and potentially even whether that test should be used in a given clinical scenario. Ideally, this question would be answered by conducting dedicated trials of the new diagnostic test in each group of interest. Unfortunately, trials are expensive and slow. Also, the number of specific groups that may be of interest is large, since specific subgroups can be defined not only based on demographics (such as age, race, and gender), comorbidities (such as diabetes, heart disease, or immunosuppression), and care settings (inpatient vs. outpatient vs. emergency room) but also by the many possible combinations of each of these characteristics, which is an essential component of precision medicine.¹ As a result, in practice it is prohibitively difficult to perform many separate trials on specific groups for even a single diagnostic test, much less for the many tests that are likely to be developed in response to a large-scale outbreak, such as has happened in response to the COVID-19 pandemic. This is a problem for regulators, clinicians, and patients alike.

One solution is to apply a new test’s various performance characteristics to real-world data collected in the course of patient care. Such characteristics include results of existing trials as well as analytical (i.e. pre-clinical) operating parameters such as the limit of detection (LOD). The LOD is defined as the lowest concentration of virus that the test can detect in 95% of replicates. It is routinely determined by manufacturers and validated by clinical laboratories before a test is put to use clinically.² The relationship between concentration and detection is usually understood to follow an S-shaped curve;³ fitting it requires at least one additional datapoint besides the LOD. The concentration of the virus may be measured as the viral load, most often defined as the number of copies of viral mRNA per milliliter of testing material (copies/mL).

“Real-world data” means the viral-load result of a reference diagnostic test that has already been approved for the general population. Because this data is from the general population, it will presumably include results on many specific patient groups. One can apply the performance of the new test as described above to the set or “distribution” of viral loads from a group to predict what proportion of patients in the group would have tested positive with the new test. This proportion is the sensitivity of the new test for that group. In this way, one can estimate clinical sensitivity without needing a dedicated trial on that group.

In this study, we apply this approach to COVID-19. We use the 46,726 positive SARS-CoV-2 RT-qPCR results our institution has performed as of this writing and use our electronic health record to annotate each result according to the patient’s demographics, comorbidities, and so on. Importantly, we convert each PCR result from a Ct value to a viral load using robust (100% code-coverage) and accurate publicly available software, as previously described.^{2, 4–6} Although PCR results are typically reported simply as positive or negative, qPCR is intrinsically quantitative (the “q” in “qPCR”); we make use of this quantitative information in its natural unit of measure (viral load). This is in contrast to Ct values, which are less useful because they vary inversely with viral load and correspond to different viral loads on different PCR testing platforms.

We focus especially on sensitivity and specificity for infectivity or contagiousness. Contagiousness is of special interest given the public health focus on using quick, inexpensive tests to curtail community transmission in a pandemic.^{7, 8} Treating contagiousness as a function of viral load, contagiousness can be estimated using a virus culture assay we previously described in which a positive patient sample is applied to susceptible cells and monitored for virus replication. After an initial adsorption period, the cells are washed free of the initial virus to eliminate carryover. The supernatant is then sampled on a timescale of days and tested by PCR for the presence of new virus.^{9, 10} The lowest concentration of virus in a patient sample from which new virus can be recovered is the contagiousness threshold. Because cells in culture have no physical or distance barriers, no mucociliary elevator, and no protection via medications or an immune system, we consider this threshold a conservative estimate. We previously demonstrated this threshold is approximately 50,000 copies/mL and has been fairly stable even as the SARS-CoV-2 virus has evolved.^{9, 10}

Materials and Methods

Institutional review

Institutional Review Board approval was obtained for all described work under Beth Israel Lahey Health (BILH) IRBs 2022P000328 and 2022P000288. The Harvard T. H. Chan School of Public Health IRB20-1979 provided non-human subjects research determination for virus culture work.

Defining specific patient groups

We extracted the following information from our hospital’s clinical-research data repository: demographics (age, gender, and self-reported race/ethnicity), socioeconomic status (using the median neighborhood household income for the patient’s ZIP code, obtained via the 2020 U.S. census, as a proxy), care setting (inpatient, outpatient, emergency ward, or other institution), presentation/disposition (based on vital signs, which we combined into a measure of initial presentation), outcome (survived, died with COVID-19 as the cause of death, died with COVID-19 as an incidental finding), vaccination status (vaccinated, unvaccinated, or unknown), treatment (CPT-encoded procedures, remdesivir (GS-5734; Gilead Sciences, Foster City, CA) administration, steroid administration), comorbidities (according to the Charlson Comorbidity Index:¹¹ body-mass index, diabetes, chronic heart disease, chronic lung disease, chronic renal disease, liver disease, dementia, chronic neurological conditions, connective-tissue disease, human immunodeficiency virus (HIV), and malignancy), and immunosuppression status¹² (CD4+ T-count <100 cells/µL, hematologic malignancy, chemo/immuno-modulating agent alone or in setting of solid malignancy, organ transplant, or rheumatologic/inflammatory condition). The rationale for extracting these data items specifically was twofold: first, this list includes the complete COVID-19 core diagnostic data at federal and state levels; second, it includes data necessary for calculating the well validated 4C Mortality Score for SARS-CoV-2.¹³ ICD-10 codes corresponding to the listed comorbidities were determined by a physician (Dr. Arnaout) following prior methodologies¹⁴ but updated for 2022-2023.

At presentation, patients were considered sick if any of the following were true within 1 day of the PCR test sample: systolic blood pressure <90 mmHg, diastolic blood pressure <60 mmHg, heart rate >100 beats per minute, respiratory rate >18 breaths per minute, or temperature >99.1°F. They were otherwise considered well, with the exception that if no values were recorded (NULL in the data repository) for all criteria, presentation was considered unknown and therefore not assigned.

Patients were designated as immunocompromised at the time of PCR testing if one of the following were true: on their most recent T-cell subset analysis report, their absolute CD4+ cell count was <100 cells/µl; they had a diagnosis of either lymphoma or leukemia associated with a healthcare encounter (visit, admission, or phone call) either before the PCR test or within 60 days after the PCR test; they were on any of the following medications on an ongoing basis, prescribed prior to the PCR test and with enough refills to include the time up to 30 days prior to the PCR test: abatacept, adalimumab, anakinra, azathioprine, basiliximab, budesonide, certolizumab, cyclosporine, daclizumab, dexamethasone, everolimus, etanercept, golimumab, infliximab, ixekizumab, leflunomide, lenalidomide, methotrexate, mycophenolate, natalizumab, pomalidomide, prednisone, rituximab, secukinumab, serolimus, tacrolimus, tocilizumab, tofacitinib, ustekinumab, or vedolizumab. Otherwise, they were designated not immunocompromised.

Supplementary Table 1 provides further details for the above methods.

View this table:

Table S1: Detailed criteria for defining patient characteristics, Related to Methods.

Criteria for each of the characteristics available on the portal are described.

Viral load

The SARS-CoV-2 RT-qPCR testing in this study was performed on three Abbott Molecular platforms: m2000, Alinity m, and Alinity 4-Plex (Abbott Molecular, Des Plaines, IL, U.S.A.). These detect identical SARS-CoV-2 N and RdRp gene targets. They are extremely sensitive, with LOD of ∼100 copies/mL. They output a quantitative fractional cycle number (FCN), a type of Ct value described in detail elsewhere.¹⁵ Together these platforms accounted for 46,726 positive tests.

Ct values were converted to viral loads in units of copies of viral mRNA per mL using the public Python package ct2vl as previously reported.⁴ Briefly, this software was validated via calibration curves established for all platforms using an extended SeraCare panel (LGC Seracare, Milford, MA) panel based on a SARS-CoV-2 genome incorporated into replication-incompetent, enveloped Sindbis virus and calibrated based on digital PCR at US National Institutes of Standards and Technology (NIST) and LGC/Seracare.¹⁶ Validation material ranged in viral load from 300 to 10⁶ viral genome copies/mL. Results were harmonized with the cycle threshold for a spiked internal control also amplified in each SARS-CoV-2 assay to confirm lack of PCR inhibition and accurate viral load output. The standards, modeling SARS-CoV-2 virus, were run through all stages of sample preparation and extraction to allow appropriate comparison with identically processed patient samples. R² was ∼0.99 for all calibration determinations, indicating assays are robustly quantitative.

Presumed SARS CoV-2 variant

Presumed variant was inferred from the date of sample collection based on the data presented by Covariants (https://covariants.org) showing the frequency of sequencing particular variants in Massachusetts, the United States, and other locations.¹⁷ Specimens from before June 7, 2021 were annotated as being an early variant. Specimens from between July 7, 2021 and December 6, 2021 were annotated delta variant. Specimens from after January 3, 2022 were annotated as omicron variant. Results from the month between windows, when more than one major variant was common, were not annotated with a presumed variant and are omitted from by-variant comparisons.

Evaluation of antigen tests vs. PCR

Patients seeking COVID testing at a drive-through testing site near Boston affiliated with our medical center^{5, 6} between May 23 and November 4 of 2022 were offered the opportunity to participate in a separate arm, providing a comparative, parallel prospective study. This represented community testing for both symptomatic and asymptomatic individuals with diverse demographics (age, race, sex, socio-economic status).

Each patient who consented had both a standard-of-care PCR test and two OTC antigen tests performed (Abbott BinaxNow COVID-19 Ag Card and CareStart COVID-19 Antigen Home Test). The PCR test was performed on material collected with a nasopharyngeal swab. SARS-CoV-2 RT-qPCR testing was performed using the Abbott m2000 RealTime or Alinity m SARS-CoV-2 assays according to the manufacturer’s instructions, yielding, for each positive sample, a Ct value which was converted to viral load as previously described. Specimens for the antigen tests were collected with separate nasal swabs for each test, according to the manufacturer’s instructions. These were collected and the tests performed by study personnel after informed consent was obtained on-site within the time-frame constraints detailed in each test’s instructions for use, as per IRB. In order to extrapolate antigen-test performance from this subset to all patients, positivity vs. viral load was modeled by logistic regression (the LogisticRegression function in Python’s scikit-learn library).¹⁸ LogisticRegression converges on optimal parameters in a model predicting the probability of a positive test based on viral load. Parameters were predicted separately for each test. The equation for probability was a standard sigmoid constrained to the range 0-1 (i.e., the lowest probability is zero and the highest probability is 1): p(test success) = where v, the independent variable, is log10 of the viral load. This constraint leaves two free parameters: v0 is the midpoint, i.e. the model’s estimate of where the success rate passes 50%, while k controls the steepness, i.e. the change in viral load to change in probability of being positive.

Contagiousness

Samples were then stored at 4°C until contagiousness testing. This was done within a four-day time period. We previously validated that freeze-thaw does not impact viral viability and will bank remaining samples for future investigations. Quantitative viral culture was performed on a random sample of PCR-positive samples. Vero E6 cells (ATCC CRL-1586) were seeded on a 6-well flat bottom plate at 0.3×10⁶ cells per well in Eagle’s minimum essential media (EMEM) containing 1% antibiotic-antimycotic, 1% HEPES and 5% fetal calf serum (FCS, Gibco) grown to confluence at approximately 1 × 10⁶ cells per well, inoculated with 250µL of patient sample, and incubated at 37°C for 24 hours for viral adsorption, as previously described.^{10, 19–21} Carryover of non-viable viral RNA present in samples was limited by washing cell cultures after the 24-hour viral adsorption and adding fresh EMEM composite media with reduced FCS to 2% for viral growth, meaning detectable virus represents viable replicating virus. On days 3 and 6, cell culture supernatant was removed and added to 800µL of VXL buffer (QIAGEN, German, MD) (1:1 ratio) for subsequent nucleic acid extraction and detection of virus by PCR. Viral load in culture supernatants on days 3 and 6 served as a quantitative surrogate for viable (i.e. replication-competent) virus in the patient sample and provided a measure of the magnitude of sample infectivity. SARS-CoV-2 RT-qPCR testing of Vero cell culture supernatants was performed using the Abbott m2000 Real-Time or Alinity m SARS-CoV-2 assays according to the manufacturer’s instructions. The contagiousness threshold was determined by the threshold patient-sample viral load value resulting in detectable culture viral load.

Whole-genome viral NGS

Next-generation-sequencing (NGS)-based sequencing of select PCR-positive samples from the viral antigen evaluation study was performed as follows. Full-length SARS-CoV-2 viral genome sequencing was performed on the Oxford Nanopore MinION system (≥R9.4 flowcell; Oxford Nanopore Technologies-ONT, Oxford, UK) using the guppy basecaller and the downstream ARTIC network bioinformatics pipeline for genome assembly.^{22, 23} The workflow was run on a 2021 Intel Core i9-11900 Rocket Lake 3.5GHz 8-core LGA 1200 boxed processor with NVIDIA A5000 GPU. Standard coverage and quality metrics and plots were produced, single-nucleotide variants were recorded, and variants assigned using NextClade.²⁴

Web portal and privacy protection

The portal was written using Svelte and d3 for the interactive frontend and Python run against a Postgres database for the backend. To reduce re-identification risk, ages were jittered by adjusting the patient’s date of birth by a random number of days (drawn from a Gaussian distribution with a standard deviation of two years) before calculating patient’s age at the time of each test. Groups smaller than 4-8 patients are suppressed and therefore not viewable. Revealing exact sizes of such small groups defined by multiple patient characteristics would pose a re-identification risk. To prevent inferring the sizes of these groups by subtraction of viewable group sizes, viewable group sizes are jittered by dropping approximately 0.5-1% of the data on any split by patient feature. To maximize consistency of the results of jittering as data are updated, jittering was performed using random number seeds based on pseudo-identifiers (which are never uploaded and thus inaccessible to/safe from the web client). For ease of visualization, plots of viral load distributions are shown as kernel-density estimates (i.e. smoothed) using a Gaussian kernel of width 0.25 log10 viral load units (∼1.7-fold).

Statistical tests

The geometric mean viral load for each patient group was calculated as a summary statistic. The geometric (as opposed to arithmetic) mean was chosen because viral loads vary over many orders of magnitude.² The Kolmogorov-Smirnov test (KS; scipy.stats.kstest) was used to compare distributions. This test was used because data were not distributed normally and KS does not require normality (unlike, for example, the t-test, which requires normal distributions). KS tests the null hypothesis that the distributions of viral loads for two patient groups are statistically indistinguishable.²⁵ The p-value gives the probability that distributions from the two groups are drawn from the same underlying distribution. A large p-value means the two groups are statistically indistinguishable; a small p-value means they are different. Interpretation of p-values as significant vs. not significant requires a significance threshold, which requires correction for multiple comparisons if multiple comparisons are performed.^{26, 27} Because the number of comparisons performed via the web portal is up to the user, un-corrected p-values are reported, with interpretation as significant or not significant left to the user.

Software and hardware

Data extraction, annotation, statistics, and analyses were performed using standard Unix tools and Python 3.9+ using the pandas, numpy, scipy, and scikit libraries and the interactive Jupyter notebook environment. Figures were created using Python graphics libraries matplotlib and seaborn, and OmniGraffle 7 (The Omni Group, Seattle, WA).

Results

A web portal for large-scale real-world SARS-CoV-2 viral load results for different patient groups

46,726 COVID-19 PCR results representing approximately 39,180 unique individuals were converted to viral loads and annotated for patient demographics, comorbidities, presentation, treatment, and socioeconomic status and made available for interactive investigation via a public web portal at https://arnaoutlab.org/coviral/ (Table 1). The portal^{28, 29} allows users to visualize the viral load distribution for any patient group, to compare distributions between groups, and to estimate, for each group, the sensitivity and specificity of a given OTC test for detecting contagious individuals. Users can define and compare complex subgroups by selecting multiple characteristics via checkboxes in the user interface (Fig. 1). In this work, all the figures that contain distributions are direct screenshots from the portal.

Fig. 1: Patient characteristics available for defining patient groups and subgroups.

This screenshot from the web portal demonstrates the ability for users to define patient groups by demographic, comorbidity, treatment, vaccination status, pandemic era, and so on, using checkboxes.

View this table:

Table 1: Summary of patient characteristics.

Shown are counts for select high-level categories as of April 14, 2023. Counts may differ somewhat from the counts presented through the web portal as a result of jittering and as more data continues to be added through the portal. Note that counts broken down by characteristics do not add up to the total, because of the nulling out of some data to reduce re-identification risk (see Methods).

Overall viral load distributions

Viral loads varied over nearly 10 orders of magnitude, from 7 copies/mL (the lowest our system will report) to 1.5 billion copies/mL (99^th percentile). This extraordinary range is consistent with observations from early in the pandemic (spring-summer of 2020).² The referenced early observations suggested that viral loads were, to a good approximation, distributed fairly uniformly over the range. In contrast, the current dataset, which is ten times as large (46,726 results vs. 4,774 in the previous work²), demonstrates clear bimodality: patients’ viral loads tend to be either very low, with a peak around the LOD of 100 copies/mL, or else very high, with a peak around 100 million copies/mL (Fig. 2). This bimodality is apparent in retrospect (e.g., Fig. 2a of Arnaout et al. 2021²) but required a large dataset to visualize clearly. Further research is needed to understand the reason(s) for these two peaks.

Fig. 2: Overall bimodal distribution of viral loads.

When no checkboxes are selected to constrain or partition the dataset, users see the distribution of all viral loads. The marked bimodal distribution is clearly apparent.

Viral load comparisons among patient groups: remdesivir treatment and patient presentation

The web portal allows statistical comparisons of many thousands of specific patient groups. Here we describe several examples that are illustrative of the questions that can be asked and answered using this resource. Remdesivir (Gilead Sciences, Foster City, CA) is an intravenously administered RNA polymerase inhibitor initially approved by the FDA for treatment of SARS-CoV-2 in hospitalized adults and adolescents.³⁰ Of the 46,726 test results in our dataset, 688 were from patients who then received remdesivir treatment. In practice, at our institution, remdesivir was used for sicker patients. We hypothesized that viral loads would be higher on average in patients who received remdesivir and in sicker patients. The data supported this hypothesis: viral loads were higher on average in both remdesivir-receiving and sicker-appearing patients, with means of 8.6×10⁵ copies/mL in patients who received remdesivir vs. 1.2×10⁵ in those who did not (Fig. 3a) and 9.4×10⁴ in sick-appearing patients vs. 4.1×10⁴ in well-appearing patients (Fig. 3b). In both cases, the difference was due to a greater fraction patients in the high-viral-load peak. This was especially clear in the remdesivir comparison (Fig. 3a). In each case, the KS p-value was 4.3×10^-16, which we interpret as rejecting the null hypothesis of no difference, with high confidence. These are examples in which the data confirmed hypotheses regarding differences in viral load.

Fig. 3: Viral load comparisons by remdesivir treatment, patient presentation, and outcome.

Screenshot from the web portal. Viral loads are higher in (a) patients who received remdesivir vs. patients who did not receive remdesivir and in (b) patients who presented as ill-appearing vs. patients who presented as well-appearing (see Table S1 for precise definitions). Note the bimodal distributions, with a low-viral-load peak and a high-viral-load peak. (c) Viral load distributions and pairwise p-values by presence or absence of pulmonary disease for patients during the early variant era. (d) Patients who died from COVID-19 had higher viral loads than either patients who died with COVID-19 as an incidental finding or survivors. Viral loads for were statistically indistinguishable between the latter two groups. The web portal displays distributions in a ridgeline plot from lowest to highest mean, top to bottom. (e) Because there are three or more groups, p-values are displayed as a heatmap, accompanied by explanatory text. Because KS p-values are symmetric, only the top half of the heatmap is shown. Fig. 3f-h: viral load distributions by self-reported race and by race-plus-presumed variant. (f) Viral load distributions by race. (g) Statistical comparisons among these distributions. (h) Viral load distributions between Black vs. White patients in the delta-variant era.

Unexpected findings: pulmonary disease

Serious cases of COVID-19 are marked by life- threatening respiratory distress. This evolution became less common with the emergence of the omicron strain and the increasing prevalence of prior immunological exposure including vaccination. We hypothesized that patients with pulmonary disease would have higher viral loads than patients without pulmonary disease, especially for early viral variants, which had a stronger tropism for lung as opposed to the upper respiratory tract. However, this hypothesis was not supported (Fig. 3c). Viral loads for the 975 patients with pulmonary disease tested during the early-variant era were statistically indistinguishable from those for the 27,308 patients with no pulmonary disease who were tested during the same time period. This is an example of unexpected findings that this dataset and its web-portal interface can reveal.

Comparisons among multiple groups: survivorship and causes of death

The web portal also allows users to compare more than two groups of patients at a time. In quantifying mortality during the pandemic, one distinction of value has been between individuals who died with COVID-19 as the proximal cause of death and individuals who died with COVID-19 as an incidental finding. We compared these two groups with survivors (Fig. 3d). We found that the 398 patients who died from COVID-19 in our dataset had higher viral loads than either of the other two groups, and that viral loads were statistically indistinguishable between the approximately 46,000 survivors and the 143 patients who died with COVID-19 as an incidental finding (p=0.07). For ease of comparison, the web portal displays distributions in a ridgeline plot from lowest to highest mean, top to bottom. When there are three or more groups, p-values are displayed as a heatmap, accompanied by explanatory text. (Because KS p-values are symmetric, only the top half of the heatmap is shown.)

Complex patient subgroups: race and presumed variant

The ability to interrogate complex subgroups by checking multiple boxes in the web-portal interface allows more subtle investigations. For example, Black patients have experienced disproportionate morbidity and mortality during the pandemic.³¹ However, the 13,806 patients who self-reported as White in our dataset on average had slightly higher viral loads than the 8,299 who self-reported as Black (KS p=3.3×10^-13). That the viral loads in the White group were on average higher suggests that differences in outcome between these groups are not explained by differences in viral load (Fig. 3f-g), despite the clear relationship between viral load and survivorship described above (Fig. 3d). Interestingly, the observed difference is more pronounced during the delta-variant wave (Fig. 3h). During the delta wave (July 7, 2021 to December 6, 2021), viral loads were on average three times as high for White patients (8.0×10⁵ copies/mL, n=1,024) as Black patients (2.7×10⁵ copies/mL, n=665; KS test p=5.2×10^-8) with a distinctly sharper high-viral-load peak in White patients. This difference was greater in patients over 30 years old and was almost entirely absent in patients under 30 (30-60 y.o: 398 Black patients and 719 White patients, p= 2.5×10^-5; <30 y.o.: 266 Black vs. 299 White patients, p=0.14). In contrast, viral load distributions for Black and White patients were more similar for both early in the pandemic and during the omicron variant time period (p=4.0×10^-5 for 4,393 Black and 7,042 White patients and p=0.02 for 2,295 Black and 4,341 White patients, respectively). This example illustrates the utility and (statistical) power of the portal for investigating subgroups of interest.

Antigen test performance

In the head-to-head comparison of PCR and antigen test results, 281 patients consented to participate. Of the PCR samples collected, 277 were tested; the remaining four were mishandled or leaked. Of the 277, 65 had a positive COVID-19 result by PCR (23%). The PCR-positive samples were all tested on either the Alinity m SARS-CoV-2 real time RT-PCR assay or the Alinity m Resp-4-Plex PCR assay. Viral loads in the PCR-positive patients ranged from approximately 10 to approximately 10⁹ copies/mL, with a peak in the distribution between 10⁶ and 10⁸. Of the 65 positive samples, 3 were sequenced and 20 selected at random were used to assess contagiousness in viral culture.

Antigen test performance

Of the 65 patients with positive PCR tests, 43 tested positive on the Binax antigen test and 40 tested positive on the CareStart antigen test. No invalid antigen tests (lacking the control line) were observed. Only one of the patients who tested negative by PCR tested positive on the antigen tests (both Binax and CareStart), confirming the high specificity of these tests. The proportion of positive antigen tests varied with viral load. At viral loads less than 10³ copies/mL, both antigen tests were always negative; at viral loads greater than 10⁷ copies/mL, both antigen tests were always positive. However, there was an overlap of antigen-test-positive and antigen-test-negative results at intermediate viral loads (Fig. 4a). k and v₀ values (see Methods) were comparable between the two antigen tests (k=1.184, v₀=4.538 for Binax and k=1.142, v₀=4.995 for CareStart). The resulting S-shaped curves were used to predict antigen test performance in the web portal.

Fig. 4: Antigen test results from head-to-head trial and performance on patient subgroups.

(a) Each antigen test result for each PCR-positive patient, vs. log₁₀ of viral load according to the simultaneous PCR test. (b) Antigen test performance on patients in neighborhoods stratified by median household income. (c) Performance of BinaxNow COVID-19 Ag Card on patients with median household income >$130,000. (d) Performance of BinaxNow COVID-19 Ag Card on patients with median household income <$52,000. The user can use the radio buttons to select any of the patient groups in the viral load distributions in the top section of the web portal. The imputed positive antigen-test results will appear as a lighter shade, and the negative results a darker shade.

The OTC antigen tests that have been widely available on the market since 2021 are considerably less sensitive than RT-qPCR for detecting SARS-CoV-2 infection. However, because their LODs are generally above the contagiousness threshold, they are quite sensitive for detecting contagiousness.¹⁰ Based on our clinical experience, we hypothesized that antigen tests would perform similarly on different patient groups and subgroups; this hypothesis was largely supported (Fig. 4b-d). The web portal also allows users to estimate sensitivity and specificity for the BinaxNow COVID-19 Ag Card and CareStart COVID-19 Antigen Home Test, based on the modelled performance curves, on any sufficiently large user-selected patient group (Fig 6). The two tests performed well: sensitivities for detecting contagiousness were roughly 0.85-0.90 across patient groups.

Contagiousness for omicron-era virus

For early-pandemic and delta-wave strains of SARS-CoV-2, the threshold viral load for contagiousness has previously been found to be approximately 10⁵ copies/mL.¹⁰ For omicron variants, we found that the threshold is statistically indistinguishable from this, at 4.5×10⁴ copies/mL (confidence interval, 1.1×10⁴-1.9×10⁵ copies/mL; p=0.23, Fig. 5). The omicron threshold was based on 20 PCR-positive results from our head-to-head clinical trial of 277 total patients. We confirmed that the dominant strain circulating in the Massachusetts Bay area was omicron (BA5.2/Clade 22B) via next-generation sequencing, with only rare single-nucleotide differences relative to those already described (Fig. 6a-d). Because the strain mix in Massachusetts (Fig. 6e) has been highly representative of the strain mix in the country as a whole throughout the pandemic (Fig. 6f), these results support the generalizability of these findings from a particular geographic area to the entire population.

Fig. 5: Determination of the contagiousness threshold for omicron-era SARS-CoV-2 strains.

Below a certain viral load in the patient sample (x-axis), no virus was recoverable from culture (y-axis, maximum of day 3 and day 6 supernatants). For viewing convenience, culture-negative samples are plotted at 0.1 copies/mL (dotted line, 1 copy/mL). The gray region shows the confidence interval for the threshold. The red line shows the midpoint of this region (on the log10 scale), 45,000 copies/mL, as the most likely threshold.

Fig. 6. Sequencing late-2022 strain and generalizability of Massachusetts-level results to the United States as a whole.

Results of sequencing of a BA5.2/Clade 22B patient sample from Aug.-Sep. 2022 (97.6% coverage). (a) Sample relative to COVID-19 phylogeny (with clade labels). (b) First 64 of the 72 nucleotide substitutions relative to the original Wuhan strain. (c) 52 amino acid substitutions relative to the Wuhan strain. (d) The five unique (“private”) mutations relative to the phylogenetic tree. (e) Distribution of strains in Massachusetts near the time of the sample according to covariants.org. (f) Comparison by frequency of the strains circulating in Massachusetts to those circulating in the United States at the same times demonstrating generalizability of Massachusetts-state variant patterns to the country as a whole. Red line, 1:1. Gray, early strains; purple, delta strains; green, omicron strains. R² is for least-squares linear regression of USA vs. Massachusetts data (regression slope=0.97, intercept=0.00).

Discussion

The COVID-19 pandemic has proven a catalyst for accelerating medical advances, including the development of more efficient methods for developing and testing critical diagnostic assays.^32–37 It has also drawn attention to the value of reliable public data, including large public datasets.^38–42 Here we describe such a dataset, to our knowledge the first large dataset of SARS-CoV-2 viral loads in patients across the history of the pandemic through to the present day. The rich clinical annotations of this dataset reveal similarities and differences in viral loads among patients by demographics, presentation, and comorbidity as well as by vaccination status, treatment, and socioeconomic status. Protected patient data was safeguarded by multiple mechanisms. The data revealed cases in which differences were expected as well as cases in which they were unexpected. The size of the dataset and extent of the annotation allow more comparisons than can reasonably be summarized in a single publication; availability of this dataset via the web allows anyone—clinicians, investigators, developers, regulators, and patients, alone or with the assistance of artificial intelligence and/or machine learning (AI/ML)-based tools—to explore and conduct research, to test existing hypotheses, and to generate new research questions.

The clinical utility of measuring and investigating viral load in SARS-CoV-2 has been demonstrated.^43–46 This is consistent with both the advantage of viral loads relative to Ct values^{47, 48} and their utility in earlier viral infections such as HIV and hepatitis C (HCV). SARS-CoV-2 viral loads have already proven useful in the development and characterization of COVID-19 diagnostics in multiple contexts, including testing on nasopharyngeal secretions, nasal secretions, and saliva.⁵^,6 Here we demonstrate their potential to regulators as a tool to streamline evaluation of new OTC tests. Specifically, we demonstrate that a test’s analytical performance measure, namely the LOD, can be used to estimate that test’s clinical sensitivity for detecting contagiousness in any patient group, without having to conduct a dedicated clinical trial for that group. The alternative of innumerable dedicated trials is beyond reasonable expectations of the financial capability of developers or the bandwidth of their clinical testing partners. For this reason regulatory agencies such as the FDA have expressed interest in methods that use large-scale real-world data to streamline test evaluation.

Two assumptions implicit in this approach are worth mention. First, we model the success rate of antigen tests solely as a function of viral load. The assumption is that no other non-negligible factor varies systematically between patient groups. Second, we assume that the curve— success rate as a function of viral load—can be adequately predicted from the data from a fairly small study. There are two possible sources of error in this curve: sampling error, which can be reduced by increasing the number of subjects sampled; and lack-of-fit error, the error inherent in trying to fit a function of the wrong form. The function used in the web portal’s calculations, was chosen on the basis of its history of use in dose-response-type situations, but carries the (reasonable) assumptions that the probability increases smoothly and continuously with increasing viral load, and approaches 0 with sufficiently low viral load and 1 with sufficiently high viral load. These constraints leave only two free parameters, which is desirable for statistical power and robustness to the noise inherent in any such study (e.g., sampling error and measurement error).

A user of the web portal who is accustomed to thinking of test quality solely in terms of LOD (limit of detection) might initially be surprised that the portal prefers two parameters, not one, to define an antigen test’s performance. In effect, the LOD parameter sets the location of the S-shaped curve that relates viral load and performance, and the second parameter—here, the 50% detection threshold—sets the S-shaped curve’s steepness. Without the second parameter, one could fit a curve in which the sensitivity is 0% at all viral loads below the LoD, and 95% at any higher viral load, which is clearly quite different from the relationship observed in the head-to-head study. How different the true shape of the antigen test performance curve is from the logit function used here, and thus whether fitting a different function would work better, can be elucidated by larger head-to-head studies; however, in the midst of a public health emergency, the cost in time of sampling more subjects must be weighed against the value of complete certainty. At any rate, the fitting error was low.

Three important limitations to the dataset in its current form also deserve mention. The first is due to incompleteness of some of the data fields, for example presentation and vaccination status (Table 1). Presentation information was only sometimes available in structured form in our data repository; we did not attempt to extract data from notes to complement incomplete records. Vaccination status was likewise only sometimes available in a structured manner; integration with state-level records could potentially fill in missing records. Second, patient-level annotations are not yet available for download as part of the dataset. Making viral loads freely and easily available for patient groups required significant attention to avoid potential loopholes that might risk patient privacy via identifiability. Methods included suppressing data transmission for groups that were small enough to present potential “journalist risk,”⁴⁹ jittering counts to prevent deduction of the sizes of suppressed groups, and rounding viral loads to two log-scale decimal places.⁵⁰ Further work is necessary to make patient-level annotations available. Third and finally, the size of the dataset, while large, is still insufficient for the smallest groups—for example, cystic fibrosis patients, patients who recently delivered, or Native Americans—to be sufficiently sizeable to draw robust conclusions. One solution is to add data from other care-giving institutions that performed substantial COVID-19 testing; another is to supplement existing large datasets, for example the 50-million-person CVD-COVID-UK initiative, with viral loads. The free availability of methods to convert from Ct values to viral loads facilitates such advances.⁴

As part of the drive toward precision medicine, clinical care benefits from personalization of diagnostic testing: the right test for the right patient, where the importance of patient heterogeneity is increasingly accepted. We have demonstrated that large-scale real-world data can assist this effort by enabling the estimation of personalized clinical sensitivities and specificities without the need for dedicated clinical trials on every patient group. This work is generalizable beyond COVID-19. Laboratory testing is an exceptionally rich source of real-world medical information. It is the highest volume medical activity, with some 20 billion tests performed each year in the United States alone. It is also the most cost-effective, costing just pennies on the healthcare dollar. It is integral to decision-making across medicine, for patients at every level of acuity, from screening to emergencies. Its results are almost always numerical or categorical, making it especially amenable to modern approaches like machine learning. And computational re-analysis is substantially less expensive than de novo trials. The present work supports the view that meaningful value can come economically from additional repurposing of the vast stores of real-world laboratory results that exist in healthcare institutions.

Data Availability

All data produced are available online at https://arnaoutlab.org/coviral

https://arnaoutlab.org/coviral

Author Contributions

Conceptualization, J.K. and R.A.; Software, A.M.; Investigation, D.H., E.C., S.D., and M.Y.; Resources: S.R.; Data Curation, A.M.; Writing — Original Draft, R.A. and A.M.; Writing — Review & Editing, P.K. and J.K.; Visualization, T.S., D.B., A.M., and R.A.; Supervision, R.A; Funding Acquisition, P.K., S.R., and R.A.

Declaration of interests

The authors declare no competing interests.

Acknowledgements

The authors acknowledge Elliot Hill for assistance on plugging ct2vl into the data processing workflow; to Timothy Graham, Gail Piatkowski, Baevin Feeser, and Griffin Weber for their advice and guidance on extracting data from EMR databases; to funding from the Reagan-Udall Foundation for the FDA (R.A.); and to funding via a Novel Therapeutics Delivery Grant from the Massachusetts Life Sciences Center (J.E.K.). We would also like to acknowledge Abbott (Scarborough, ME) for provision of the Binax Now antigen tests and Ginkgo Bioworks (Boston, MA) for provision of CareStart antigen tests used in this study.

Footnotes

↵* Lead Contact

References

1.↵
National Research Council, Division on Earth and Life Studies, Board on Life Sciences, Committee on a Framework for Developing a New Taxonomy of Disease (2011). Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease (National Academies Press).
2.↵
Arnaout, R., Lee, R.A., Lee, G.R., Callahan, C., Cheng, A., Yen, C.F., Smith, K.P., Arora, R., and Kirby, J.E. (2021). The Limit of Detection Matters: The Case for Benchmarking Severe Acute Respiratory Syndrome Coronavirus 2 Testing. Clin Infect Dis. doi:10.1093/cid/ciaa1382.
OpenUrl CrossRef
3.↵
Wild, D. ed. (2013). The immunoassay handbook: theory and applications of ligand binding, ELISA, and related techniques 4th ed. (Elsevier).
4.↵
Hill, E.D., Yilmaz, F., Callahan, C., Cheng, A., Braun, J., and Arnaout, R. (2022). ct2vl : Converting Ct Values to Viral Loads for SARS-CoV-2 RT-qPCR Test Results (Microbiology) doi:10.1101/2022.06.20.496929.
OpenUrl Abstract/FREE Full Text
5.↵
Callahan Cody, Lee Rose A., Lee Ghee Rye, Zulauf Kate, Kirby James E., Arnaout Ramy, and Miller Melissa B. Nasal Swab Performance by Collection Timing, Procedure, and Method of Transport for Patients with SARS-CoV-2. Journal of Clinical Microbiology 59, e00569–21. doi:10.1128/JCM.00569-21.
OpenUrl CrossRef
6.↵
Callahan, C., Ditelberg, S., Dutta, S., Littlehale, N., Cheng, A., Kupczewski, K., McVay, D., Riedel, S., Kirby, J.E., and Arnaout, R. (2021). Saliva is Comparable to Nasopharyngeal Swabs for Molecular Detection of SARS-CoV-2. Microbiol Spectr 9, e0016221. doi:10.1128/Spectrum.00162-21.
OpenUrl CrossRef
7.↵
Pilarowski, G., Lebel, P., Sunshine, S., Liu, J., Crawford, E., Marquez, C., Rubio, L., Chamie, G., Martinez, J., Peng, J., et al. (2021). Performance Characteristics of a Rapid Severe Acute Respiratory Syndrome Coronavirus 2 Antigen Detection Assay at a Public Plaza Testing Site in San Francisco. J Infect Dis 223, 1139–1144. doi:10.1093/infdis/jiaa802.
OpenUrl CrossRef PubMed
8.↵
Mina, M.J., Parker, R., and Larremore, D.B. (2020). Rethinking Covid-19 Test Sensitivity - A Strategy for Containment. N Engl J Med 383, e120. doi:10.1056/NEJMp2025631.
OpenUrl CrossRef PubMed
9.↵
Stanley, S., Hamel, D.J., Wolf, I.D., Riedel, S., Dutta, S., Contreras, E., Callahan, C.J., Cheng, A., Arnaout, R., Kirby, J.E., et al. (2022). Limit of Detection for Rapid Antigen Testing of the SARS-CoV-2 Omicron and Delta Variants of Concern Using Live-Virus Culture. J Clin Microbiol 60, e00140–22. doi:10.1128/jcm.00140-22.
OpenUrl CrossRef
10.↵
Kirby, J.E., Riedel, S., Dutta, S., Arnaout, R., Cheng, A., Ditelberg, S., Hamel, D.J., Chang, C.A., and Kanki, P.J. (2022). SARS-CoV-2 Antigen Tests Predict Infectivity based on viral culture: comparison of antigen, PCR viral load, and viral culture testing on a large Sample Cohort. Clinical Microbiology and Infection, S1198743X22003743. doi:10.1016/j.cmi.2022.07.010.
OpenUrl CrossRef
11.↵
Charlson, M.E., Pompei, P., Ales, K.L., and MacKenzie, C.R. (1987). A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis 40, 373–383. doi:10.1016/0021-9681(87)90171-8.
OpenUrl CrossRef PubMed Web of Science
12.↵
Greenberg, J.A., Hohmann, S.F., Hall, J.B., Kress, J.P., and David, M.Z. (2016). Validation of a Method to Identify Immunocompromised Patients with Severe Sepsis in Administrative Databases. Annals ATS 13, 253–258. doi:10.1513/AnnalsATS.201507-415BC.
OpenUrl CrossRef
13.↵
Knight, S.R., Ho, A., Pius, R., Buchan, I., Carson, G., Drake, T.M., Dunning, J., Fairfield, C.J., Gamble, C., Green, C.A., et al. (2020). Risk stratification of patients admitted to hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: development and validation of the 4C Mortality Score. BMJ 370, m3339. doi:10.1136/bmj.m3339.
OpenUrl Abstract/FREE Full Text
14.↵
Elixhauser, A., Steiner, C., Harris, D.R., and Coffey, R.M. (1998). Comorbidity measures for use with administrative data. Med Care 36, 8–27. doi:10.1097/00005650-199801000-00004.
OpenUrl CrossRef PubMed Web of Science
15.↵
Shain, E.B., and Clemens, J.M. (2008). A new method for robust quantitative and qualitative analysis of real-time PCR. Nucleic Acids Res 36, e91. doi:10.1093/nar/gkn408.
OpenUrl CrossRef PubMed
16.↵
Kirby, J.E., Cheng, A., Cleveland, M.H., Degli-Angeli, E., DeMarco, C.T., Faron, M., Gallagher, T., Garlick, R.K., Goecker, E., Coombs, R.W., et al. (2022). A Multi-Institutional Study Benchmarking Cycle Threshold Values for Major Clinical SARS-CoV-2 RT-PCR Assays. 2022.06.22.22276072. doi:10.1101/2022.06.22.22276072.
OpenUrl Abstract/FREE Full Text
17.↵
CoVariants https://covariants.org/.
18.↵
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12, 2825–2830.
OpenUrl CrossRef
19.↵
Wölfel, R., Corman, V.M., Guggemos, W., Seilmaier, M., Zange, S., Müller, M.A., Niemeyer, D., Jones, T.C., Vollmar, P., Rothe, C., et al. (2020). Virological assessment of hospitalized patients with COVID-2019. Nature 581, 465–469. doi:10.1038/s41586-020-2196-x.
OpenUrl CrossRef PubMed
20.
La Scola, B., Le Bideau, M., Andreani, J., Hoang, V.T., Grimaldier, C., Colson, P., Gautret, P., and Raoult, D. (2020). Viral RNA load as determined by cell culture as a management tool for discharge of SARS-CoV-2 patients from infectious disease wards. Eur J Clin Microbiol Infect Dis 39, 1059–1061. doi:10.1007/s10096-020-03913-9.
OpenUrl CrossRef PubMed
21.↵
Huang, C.-G., Lee, K.-M., Hsiao, M.-J., Yang, S.-L., Huang, P.-N., Gong, Y.-N., Hsieh, T.-H., Huang, P.-W., Lin, Y.-J., Liu, Y.-C., et al. (2020). Culture-Based Virus Isolation To Evaluate Potential Infectivity of Clinical Specimens Tested for COVID-19. J Clin Microbiol 58, e01068–20. doi:10.1128/JCM.01068-20.
OpenUrl Abstract/FREE Full Text
22.↵
Artic Network https://artic.network/ncov-2019/ncov2019-bioinformatics-sop.html.
23.↵
Quick, J., Grubaugh, N.D., Pullan, S.T., Claro, I.M., Smith, A.D., Gangavarapu, K., Oliveira, G., Robles-Sikisaka, R., Rogers, T.F., Beutler, N.A., et al. (2017). Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples. Nat Protoc 12, 1261–1276. doi:10.1038/nprot.2017.066.
OpenUrl CrossRef PubMed
24.↵
Hadfield, J., Megill, C., Bell, S.M., Huddleston, J., Potter, B., Callender, C., Sagulenko, P., Bedford, T., and Neher, R.A. (2018). Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 34, 4121–4123. doi:10.1093/bioinformatics/bty407.
OpenUrl CrossRef PubMed
25.↵
Pratt, J.W., and Gibbons, J.D. Concepts of nonparametric theory (Springer International Publishing).
26.↵
Benjamini, Y., and Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society: Series B (Methodological) 57, 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x.
OpenUrl CrossRef PubMed Web of Science
27.↵
Tukey, J. (1953). Multiple Comparisons. J. Am. Stat. Assoc. 48, 624–625.
OpenUrl
28.↵
Morgan, A. COVIRAL webapp back-end. https://github.com/chhotii-alex/antigen-flask.
29.↵
Morgan, A. COVIRAL webapp front-end. https://github.com/chhotii-alex/antigen-sensitivity.
30.↵
Lamb, Y.N. (2020). Remdesivir: First Approval. Drugs 80, 1355–1363. doi:10.1007/s40265-020-01378-w.
OpenUrl CrossRef
31.↵
Ahmed, A., Song, Y., and Wadhera, R.K. (2022). Racial/Ethnic Disparities in Delaying or Not Receiving Medical Care During the COVID-19 Pandemic. J GEN INTERN MED 37, 1341–1343. doi:10.1007/s11606-022-07406-7.
OpenUrl CrossRef
32.↵
Watson, C. (2022). Rise of the preprint: how rapid data sharing during COVID-19 has changed science forever. Nature Medicine 28, 2–5. doi:10.1038/s41591-021-01654-6.
OpenUrl CrossRef
33.
Rando, H.M., Boca, S.M., McGowan, L.D., Himmelstein, D.S., Robson, M.P., Rubinetti, V., Velazquez, R., Greene, C.S., and Gitter, A. (2021). An Open-Publishing Response to the COVID-19 Infodemic. CEUR Workshop Proc 2976, 29–38.
OpenUrl
34.
Ross, J.S. (2021). Covid-19, open science, and the CVD-COVID-UK initiative. BMJ 373, n898. doi:10.1136/bmj.n898.
OpenUrl FREE Full Text
35.
Callahan, C.J., Lee, R., Zulauf, K.E., Tamburello, L., Smith, K.P., Previtera, J., Cheng, A., Green, A., Azim, A.A., Yano, A., et al. (2020). Open Development and Clinical Validation Of Multiple 3D-Printed Nasopharyngeal Collection Swabs: Rapid Resolution of a Critical COVID-19 Testing Bottleneck. J Clin Microbiol, JCM.00876-20, jcm;JCM.00876-20v1. doi:10.1128/JCM.00876-20.
OpenUrl Abstract/FREE Full Text
36.
Arnaout, R.A. (2021). Cooperation under Pressure: Lessons from the COVID-19 Swab Crisis. Journal of Clinical Microbiology 59, e01239–21. doi:10.1128/JCM.01239-21.
OpenUrl CrossRef
37.↵
Tse, E.G., Klug, D.M., and Todd, M.H. (2020). Open science approaches to COVID-19. F1000Res 9, 1043. doi:10.12688/f1000research.26084.1.
OpenUrl CrossRef
38.↵
Perillat, L., and Baigrie, B.S. (2021). COVID-19 and the generation of novel scientific knowledge: Evidence-based decisions and data sharing. J Eval Clin Pract 27, 708–715. doi:10.1111/jep.13548.
OpenUrl CrossRef
39.
Li, R., von Isenburg, M., Levenstein, M., Neumann, S., Wood, J., and Sim, I. (2021). COVID-19 trials: declarations of data sharing intentions at trial registration and at publication. Trials 22, 153. doi:10.1186/s13063-021-05104-z.
OpenUrl CrossRef
40.
Gardener, A.D., Hick, E.J., Jacklin, C., Tan, G., Cashin, A.G., Lee, H., Nunan, D., Toomey, E.C., and Richards, G.C. (2022). Open science and conflict of interest policies of medical and health sciences journals before and during the COVID-19 pandemic: A repeat cross-sectional study: Open science policies of medical journals. JRSM Open 13, 20542704221132139. doi:10.1177/20542704221132139.
OpenUrl CrossRef
41.
Besançon, L., Peiffer-Smadja, N., Segalas, C., Jiang, H., Masuzzo, P., Smout, C., Billy, E., Deforet, M., and Leyrat, C. (2021). Open science saves lives: lessons from the COVID-19 pandemic. BMC Med Res Methodol 21, 117. doi:10.1186/s12874-021-01304-y.
OpenUrl CrossRef
42.↵
Wood, A., Denholm, R., Hollings, S., Cooper, J., Ip, S., Walker, V., Denaxas, S., Akbari, A., Banerjee, A., Whiteley, W., et al. (2021). Linked electronic health records for research on a nationwide cohort of more than 54 million people in England: data resource. BMJ 373, n826. doi:10.1136/bmj.n826.
OpenUrl Abstract/FREE Full Text
43.↵
Rao, S.N., Manissero, D., Steele, V.R., and Pareja, J. (2020). A Systematic Review of the Clinical Utility of Cycle Threshold Values in the Context of COVID-19. Infect Dis Ther 9, 573–586. doi:10.1007/s40121-020-00324-3.
OpenUrl CrossRef PubMed
44.
Satlin, M.J., Chen, L., Gomez-Simmonds, A., Marino, J., Weston, G., Bhowmick, T., Seo, S.K., Sperber, S.J., Kim, A.C., Eilertson, B., et al. (2022). Impact of a Rapid Molecular Test for Klebsiella pneumoniae Carbapenemase and Ceftazidime-Avibactam Use on Outcomes After Bacteremia Caused by Carbapenem-Resistant Enterobacterales. Clinical Infectious Diseases 75, 2066–2075. doi:10.1093/cid/ciac354.
OpenUrl CrossRef
45.
Savela, E.S., Viloria Winnett, A., Romano, A.E., Porter, M.K., Shelby, N., Akana, R., Ji, J., Cooper, M.M., Schlenker, N.W., Reyes, J.A., et al. (2022). Quantitative SARS-CoV-2 Viral-Load Curves in Paired Saliva Samples and Nasal Swabs Inform Appropriate Respiratory Sampling Site and Analytical Test Sensitivity Required for Earliest Viral Detection. J Clin Microbiol 60, e0178521. doi:10.1128/JCM.01785-21.
OpenUrl CrossRef
46.↵
Viloria Winnett, A., Porter, M.K., Romano, A.E., Savela, E.S., Akana, R., Shelby, N., Reyes, J.A., Schlenker, N.W., Cooper, M.M., Carter, A.M., et al. (2022). Morning SARS-CoV-2 Testing Yields Better Detection of Infection Due to Higher Viral Loads in Saliva and Nasal Swabs upon Waking. Microbiol Spectr 10, e0387322. doi:10.1128/spectrum.03873-22.
OpenUrl CrossRef
47.↵
Brümmer, L.E., Katzenschlager, S., Gaeddert, M., Erdmann, C., Schmitz, S., Bota, M., Grilli, M., Larmann, J., Weigand, M.A., Pollock, N.R., et al. (2021). Accuracy of novel antigen rapid diagnostics for SARS-CoV-2: A living systematic review and meta-analysis. PLoS Med 18, e1003735. doi:10.1371/journal.pmed.1003735.
OpenUrl CrossRef PubMed
48.↵
Lee, R.A., Herigon, J.C., Benedetti, A., Pollock, N.R., and Denkinger, C.M. (2021). Performance of Saliva, Oropharyngeal Swabs, and Nasal Swabs for SARS-CoV-2 Molecular Detection: A Systematic Review and Meta-analysis. Journal of Clinical Microbiology. doi:10.1128/JCM.02881-20.
OpenUrl Abstract/FREE Full Text
49.↵
El Emam, K. (2013). Guide to the de-Identification of Personal Health Information (Auerbach Publishers, Incorporated).
50.↵
El Emam, K., Rodgers, S., and Malin, B. (2015). Anonymising and sharing individual patient data. BMJ 350, h1139. doi:10.1136/bmj.h1139.
OpenUrl FREE Full Text

View the discussion thread.

Posted May 07, 2023.

Download PDF

Data/Code

Citation Tools

Subject Area

Pathology

Subject Areas

All Articles

Addiction Medicine (405)
Allergy and Immunology (718)
Anesthesia (210)
Cardiovascular Medicine (3002)
Dentistry and Oral Medicine (339)
Dermatology (256)
Emergency Medicine (449)
Endocrinology (including Diabetes Mellitus and Metabolic Disease) (1061)
Epidemiology (12898)
Forensic Medicine (12)
Gastroenterology (840)
Genetic and Genomic Medicine (4697)
Geriatric Medicine (432)
Health Economics (742)
Health Informatics (2986)
Health Policy (1081)
Health Systems and Quality Improvement (1102)
Hematology (399)
HIV/AIDS (942)
Infectious Diseases (except HIV/AIDS) (14211)
Intensive Care and Critical Care Medicine (866)
Medical Education (439)
Medical Ethics (116)
Nephrology (481)
Neurology (4486)
Nursing (239)
Nutrition (657)
Obstetrics and Gynecology (827)
Occupational and Environmental Health (746)
Oncology (2329)
Ophthalmology (659)
Orthopedics (262)
Otolaryngology (330)
Pain Medicine (290)
Palliative Medicine (85)
Pathology (506)
Pediatrics (1218)
Pharmacology and Therapeutics (514)
Primary Care Research (509)
Psychiatry and Clinical Psychology (3848)
Public and Global Health (7072)
Radiology and Imaging (1570)
Rehabilitation Medicine and Physical Therapy (940)
Respiratory Medicine (933)
Rheumatology (454)
Sexual and Reproductive Health (453)
Sports Medicine (390)
Surgery (497)
Toxicology (62)
Transplantation (214)
Urology (187)

[1] 1.↵
National Research Council, Division on Earth and Life Studies, Board on Life Sciences, Committee on a Framework for Developing a New Taxonomy of Disease (2011). Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease (National Academies Press).

[2] 2.↵
Arnaout, R., Lee, R.A., Lee, G.R., Callahan, C., Cheng, A., Yen, C.F., Smith, K.P., Arora, R., and Kirby, J.E. (2021). The Limit of Detection Matters: The Case for Benchmarking Severe Acute Respiratory Syndrome Coronavirus 2 Testing. Clin Infect Dis. doi:10.1093/cid/ciaa1382.
OpenUrl CrossRef

[3] 3.↵
Wild, D. ed. (2013). The immunoassay handbook: theory and applications of ligand binding, ELISA, and related techniques 4th ed. (Elsevier).

[4] 4.↵
Hill, E.D., Yilmaz, F., Callahan, C., Cheng, A., Braun, J., and Arnaout, R. (2022). ct2vl : Converting Ct Values to Viral Loads for SARS-CoV-2 RT-qPCR Test Results (Microbiology) doi:10.1101/2022.06.20.496929.
OpenUrl Abstract/FREE Full Text

[5] 5.↵
Callahan Cody, Lee Rose A., Lee Ghee Rye, Zulauf Kate, Kirby James E., Arnaout Ramy, and Miller Melissa B. Nasal Swab Performance by Collection Timing, Procedure, and Method of Transport for Patients with SARS-CoV-2. Journal of Clinical Microbiology 59, e00569–21. doi:10.1128/JCM.00569-21.
OpenUrl CrossRef

[6] 6.↵
Callahan, C., Ditelberg, S., Dutta, S., Littlehale, N., Cheng, A., Kupczewski, K., McVay, D., Riedel, S., Kirby, J.E., and Arnaout, R. (2021). Saliva is Comparable to Nasopharyngeal Swabs for Molecular Detection of SARS-CoV-2. Microbiol Spectr 9, e0016221. doi:10.1128/Spectrum.00162-21.
OpenUrl CrossRef

[7] 7.↵
Pilarowski, G., Lebel, P., Sunshine, S., Liu, J., Crawford, E., Marquez, C., Rubio, L., Chamie, G., Martinez, J., Peng, J., et al. (2021). Performance Characteristics of a Rapid Severe Acute Respiratory Syndrome Coronavirus 2 Antigen Detection Assay at a Public Plaza Testing Site in San Francisco. J Infect Dis 223, 1139–1144. doi:10.1093/infdis/jiaa802.
OpenUrl CrossRef PubMed

[8] 8.↵
Mina, M.J., Parker, R., and Larremore, D.B. (2020). Rethinking Covid-19 Test Sensitivity - A Strategy for Containment. N Engl J Med 383, e120. doi:10.1056/NEJMp2025631.
OpenUrl CrossRef PubMed

[9] 9.↵
Stanley, S., Hamel, D.J., Wolf, I.D., Riedel, S., Dutta, S., Contreras, E., Callahan, C.J., Cheng, A., Arnaout, R., Kirby, J.E., et al. (2022). Limit of Detection for Rapid Antigen Testing of the SARS-CoV-2 Omicron and Delta Variants of Concern Using Live-Virus Culture. J Clin Microbiol 60, e00140–22. doi:10.1128/jcm.00140-22.
OpenUrl CrossRef

[10] 10.↵
Kirby, J.E., Riedel, S., Dutta, S., Arnaout, R., Cheng, A., Ditelberg, S., Hamel, D.J., Chang, C.A., and Kanki, P.J. (2022). SARS-CoV-2 Antigen Tests Predict Infectivity based on viral culture: comparison of antigen, PCR viral load, and viral culture testing on a large Sample Cohort. Clinical Microbiology and Infection, S1198743X22003743. doi:10.1016/j.cmi.2022.07.010.
OpenUrl CrossRef

[11] 11.↵
Charlson, M.E., Pompei, P., Ales, K.L., and MacKenzie, C.R. (1987). A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis 40, 373–383. doi:10.1016/0021-9681(87)90171-8.
OpenUrl CrossRef PubMed Web of Science

[12] 12.↵
Greenberg, J.A., Hohmann, S.F., Hall, J.B., Kress, J.P., and David, M.Z. (2016). Validation of a Method to Identify Immunocompromised Patients with Severe Sepsis in Administrative Databases. Annals ATS 13, 253–258. doi:10.1513/AnnalsATS.201507-415BC.
OpenUrl CrossRef

[13] 13.↵
Knight, S.R., Ho, A., Pius, R., Buchan, I., Carson, G., Drake, T.M., Dunning, J., Fairfield, C.J., Gamble, C., Green, C.A., et al. (2020). Risk stratification of patients admitted to hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: development and validation of the 4C Mortality Score. BMJ 370, m3339. doi:10.1136/bmj.m3339.
OpenUrl Abstract/FREE Full Text

[14] 14.↵
Elixhauser, A., Steiner, C., Harris, D.R., and Coffey, R.M. (1998). Comorbidity measures for use with administrative data. Med Care 36, 8–27. doi:10.1097/00005650-199801000-00004.
OpenUrl CrossRef PubMed Web of Science

[15] 15.↵
Shain, E.B., and Clemens, J.M. (2008). A new method for robust quantitative and qualitative analysis of real-time PCR. Nucleic Acids Res 36, e91. doi:10.1093/nar/gkn408.
OpenUrl CrossRef PubMed

[16] 16.↵
Kirby, J.E., Cheng, A., Cleveland, M.H., Degli-Angeli, E., DeMarco, C.T., Faron, M., Gallagher, T., Garlick, R.K., Goecker, E., Coombs, R.W., et al. (2022). A Multi-Institutional Study Benchmarking Cycle Threshold Values for Major Clinical SARS-CoV-2 RT-PCR Assays. 2022.06.22.22276072. doi:10.1101/2022.06.22.22276072.
OpenUrl Abstract/FREE Full Text

[17] 17.↵
CoVariants https://covariants.org/.

[18] 18.↵
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12, 2825–2830.
OpenUrl CrossRef

[19] 19.↵
Wölfel, R., Corman, V.M., Guggemos, W., Seilmaier, M., Zange, S., Müller, M.A., Niemeyer, D., Jones, T.C., Vollmar, P., Rothe, C., et al. (2020). Virological assessment of hospitalized patients with COVID-2019. Nature 581, 465–469. doi:10.1038/s41586-020-2196-x.
OpenUrl CrossRef PubMed

[20] 20.
La Scola, B., Le Bideau, M., Andreani, J., Hoang, V.T., Grimaldier, C., Colson, P., Gautret, P., and Raoult, D. (2020). Viral RNA load as determined by cell culture as a management tool for discharge of SARS-CoV-2 patients from infectious disease wards. Eur J Clin Microbiol Infect Dis 39, 1059–1061. doi:10.1007/s10096-020-03913-9.
OpenUrl CrossRef PubMed

[21] 21.↵
Huang, C.-G., Lee, K.-M., Hsiao, M.-J., Yang, S.-L., Huang, P.-N., Gong, Y.-N., Hsieh, T.-H., Huang, P.-W., Lin, Y.-J., Liu, Y.-C., et al. (2020). Culture-Based Virus Isolation To Evaluate Potential Infectivity of Clinical Specimens Tested for COVID-19. J Clin Microbiol 58, e01068–20. doi:10.1128/JCM.01068-20.
OpenUrl Abstract/FREE Full Text

[22] 22.↵
Artic Network https://artic.network/ncov-2019/ncov2019-bioinformatics-sop.html.

[23] 23.↵
Quick, J., Grubaugh, N.D., Pullan, S.T., Claro, I.M., Smith, A.D., Gangavarapu, K., Oliveira, G., Robles-Sikisaka, R., Rogers, T.F., Beutler, N.A., et al. (2017). Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples. Nat Protoc 12, 1261–1276. doi:10.1038/nprot.2017.066.
OpenUrl CrossRef PubMed

[24] 24.↵
Hadfield, J., Megill, C., Bell, S.M., Huddleston, J., Potter, B., Callender, C., Sagulenko, P., Bedford, T., and Neher, R.A. (2018). Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 34, 4121–4123. doi:10.1093/bioinformatics/bty407.
OpenUrl CrossRef PubMed

[25] 25.↵
Pratt, J.W., and Gibbons, J.D. Concepts of nonparametric theory (Springer International Publishing).

[26] 26.↵
Benjamini, Y., and Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society: Series B (Methodological) 57, 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x.
OpenUrl CrossRef PubMed Web of Science

[27] 27.↵
Tukey, J. (1953). Multiple Comparisons. J. Am. Stat. Assoc. 48, 624–625.
OpenUrl

[28] 28.↵
Morgan, A. COVIRAL webapp back-end. https://github.com/chhotii-alex/antigen-flask.

[29] 29.↵
Morgan, A. COVIRAL webapp front-end. https://github.com/chhotii-alex/antigen-sensitivity.

[30] 30.↵
Lamb, Y.N. (2020). Remdesivir: First Approval. Drugs 80, 1355–1363. doi:10.1007/s40265-020-01378-w.
OpenUrl CrossRef

[31] 31.↵
Ahmed, A., Song, Y., and Wadhera, R.K. (2022). Racial/Ethnic Disparities in Delaying or Not Receiving Medical Care During the COVID-19 Pandemic. J GEN INTERN MED 37, 1341–1343. doi:10.1007/s11606-022-07406-7.
OpenUrl CrossRef

[32] 32.↵
Watson, C. (2022). Rise of the preprint: how rapid data sharing during COVID-19 has changed science forever. Nature Medicine 28, 2–5. doi:10.1038/s41591-021-01654-6.
OpenUrl CrossRef

[33] 33.
Rando, H.M., Boca, S.M., McGowan, L.D., Himmelstein, D.S., Robson, M.P., Rubinetti, V., Velazquez, R., Greene, C.S., and Gitter, A. (2021). An Open-Publishing Response to the COVID-19 Infodemic. CEUR Workshop Proc 2976, 29–38.
OpenUrl

[34] 34.
Ross, J.S. (2021). Covid-19, open science, and the CVD-COVID-UK initiative. BMJ 373, n898. doi:10.1136/bmj.n898.
OpenUrl FREE Full Text

[35] 35.
Callahan, C.J., Lee, R., Zulauf, K.E., Tamburello, L., Smith, K.P., Previtera, J., Cheng, A., Green, A., Azim, A.A., Yano, A., et al. (2020). Open Development and Clinical Validation Of Multiple 3D-Printed Nasopharyngeal Collection Swabs: Rapid Resolution of a Critical COVID-19 Testing Bottleneck. J Clin Microbiol, JCM.00876-20, jcm;JCM.00876-20v1. doi:10.1128/JCM.00876-20.
OpenUrl Abstract/FREE Full Text

[36] 36.
Arnaout, R.A. (2021). Cooperation under Pressure: Lessons from the COVID-19 Swab Crisis. Journal of Clinical Microbiology 59, e01239–21. doi:10.1128/JCM.01239-21.
OpenUrl CrossRef

[37] 37.↵
Tse, E.G., Klug, D.M., and Todd, M.H. (2020). Open science approaches to COVID-19. F1000Res 9, 1043. doi:10.12688/f1000research.26084.1.
OpenUrl CrossRef

[38] 38.↵
Perillat, L., and Baigrie, B.S. (2021). COVID-19 and the generation of novel scientific knowledge: Evidence-based decisions and data sharing. J Eval Clin Pract 27, 708–715. doi:10.1111/jep.13548.
OpenUrl CrossRef

[39] 39.
Li, R., von Isenburg, M., Levenstein, M., Neumann, S., Wood, J., and Sim, I. (2021). COVID-19 trials: declarations of data sharing intentions at trial registration and at publication. Trials 22, 153. doi:10.1186/s13063-021-05104-z.
OpenUrl CrossRef

[40] 40.
Gardener, A.D., Hick, E.J., Jacklin, C., Tan, G., Cashin, A.G., Lee, H., Nunan, D., Toomey, E.C., and Richards, G.C. (2022). Open science and conflict of interest policies of medical and health sciences journals before and during the COVID-19 pandemic: A repeat cross-sectional study: Open science policies of medical journals. JRSM Open 13, 20542704221132139. doi:10.1177/20542704221132139.
OpenUrl CrossRef

[41] 41.
Besançon, L., Peiffer-Smadja, N., Segalas, C., Jiang, H., Masuzzo, P., Smout, C., Billy, E., Deforet, M., and Leyrat, C. (2021). Open science saves lives: lessons from the COVID-19 pandemic. BMC Med Res Methodol 21, 117. doi:10.1186/s12874-021-01304-y.
OpenUrl CrossRef

[42] 42.↵
Wood, A., Denholm, R., Hollings, S., Cooper, J., Ip, S., Walker, V., Denaxas, S., Akbari, A., Banerjee, A., Whiteley, W., et al. (2021). Linked electronic health records for research on a nationwide cohort of more than 54 million people in England: data resource. BMJ 373, n826. doi:10.1136/bmj.n826.
OpenUrl Abstract/FREE Full Text

[43] 43.↵
Rao, S.N., Manissero, D., Steele, V.R., and Pareja, J. (2020). A Systematic Review of the Clinical Utility of Cycle Threshold Values in the Context of COVID-19. Infect Dis Ther 9, 573–586. doi:10.1007/s40121-020-00324-3.
OpenUrl CrossRef PubMed

[44] 44.
Satlin, M.J., Chen, L., Gomez-Simmonds, A., Marino, J., Weston, G., Bhowmick, T., Seo, S.K., Sperber, S.J., Kim, A.C., Eilertson, B., et al. (2022). Impact of a Rapid Molecular Test for Klebsiella pneumoniae Carbapenemase and Ceftazidime-Avibactam Use on Outcomes After Bacteremia Caused by Carbapenem-Resistant Enterobacterales. Clinical Infectious Diseases 75, 2066–2075. doi:10.1093/cid/ciac354.
OpenUrl CrossRef

[45] 45.
Savela, E.S., Viloria Winnett, A., Romano, A.E., Porter, M.K., Shelby, N., Akana, R., Ji, J., Cooper, M.M., Schlenker, N.W., Reyes, J.A., et al. (2022). Quantitative SARS-CoV-2 Viral-Load Curves in Paired Saliva Samples and Nasal Swabs Inform Appropriate Respiratory Sampling Site and Analytical Test Sensitivity Required for Earliest Viral Detection. J Clin Microbiol 60, e0178521. doi:10.1128/JCM.01785-21.
OpenUrl CrossRef

[46] 46.↵
Viloria Winnett, A., Porter, M.K., Romano, A.E., Savela, E.S., Akana, R., Shelby, N., Reyes, J.A., Schlenker, N.W., Cooper, M.M., Carter, A.M., et al. (2022). Morning SARS-CoV-2 Testing Yields Better Detection of Infection Due to Higher Viral Loads in Saliva and Nasal Swabs upon Waking. Microbiol Spectr 10, e0387322. doi:10.1128/spectrum.03873-22.
OpenUrl CrossRef

[47] 47.↵
Brümmer, L.E., Katzenschlager, S., Gaeddert, M., Erdmann, C., Schmitz, S., Bota, M., Grilli, M., Larmann, J., Weigand, M.A., Pollock, N.R., et al. (2021). Accuracy of novel antigen rapid diagnostics for SARS-CoV-2: A living systematic review and meta-analysis. PLoS Med 18, e1003735. doi:10.1371/journal.pmed.1003735.
OpenUrl CrossRef PubMed

[48] 48.↵
Lee, R.A., Herigon, J.C., Benedetti, A., Pollock, N.R., and Denkinger, C.M. (2021). Performance of Saliva, Oropharyngeal Swabs, and Nasal Swabs for SARS-CoV-2 Molecular Detection: A Systematic Review and Meta-analysis. Journal of Clinical Microbiology. doi:10.1128/JCM.02881-20.
OpenUrl Abstract/FREE Full Text

[49] 49.↵
El Emam, K. (2013). Guide to the de-Identification of Personal Health Information (Auerbach Publishers, Incorporated).

[50] 50.↵
El Emam, K., Rodgers, S., and Malin, B. (2015). Anonymising and sharing individual patient data. BMJ 350, h1139. doi:10.1136/bmj.h1139.
OpenUrl FREE Full Text

The Coviral Portal: Multi-Cohort Viral Loads and Antigen-Test Virtual Trials for COVID-19

Abstract

Introduction

Materials and Methods

Institutional review

Defining specific patient groups

Viral load

Presumed SARS CoV-2 variant

Evaluation of antigen tests vs. PCR

Contagiousness

Whole-genome viral NGS

Web portal and privacy protection

Statistical tests

Software and hardware

Results

A web portal for large-scale real-world SARS-CoV-2 viral load results for different patient groups

Overall viral load distributions

Viral load comparisons among patient groups: remdesivir treatment and patient presentation

Unexpected findings: pulmonary disease

Comparisons among multiple groups: survivorship and causes of death

Complex patient subgroups: race and presumed variant

Antigen test performance

Antigen test performance

Contagiousness for omicron-era virus

Discussion

Data Availability

Author Contributions

Declaration of interests

Acknowledgements

Footnotes

References

Citation Manager Formats

Subject Area