ABSTRACT
Background SARS-CoV-2 antigen rapid diagnostic tests (Ag-RDTs) are increasingly being integrated in testing strategies around the world. Studies of the Ag-RDTs have shown variable performance. In this systematic review and meta-analysis, we assessed the clinical accuracy (sensitivity and specificity) of commercially available Ag-RDTs.
Methods We registered the review on PROSPERO (Registration number: CRD42020225140). We systematically searched multiple databases (PubMed, Web of Science Core Collection, medRxiv and bioRxiv, FINDdx) for publications up until December 11th, 2020. Descriptive analyses of all studies were performed and, when at least four studies were available, a random-effects meta-analysis was used to estimate pooled sensitivity and specificity in comparison to reverse transcriptase polymerase chain reaction testing. We assessed heterogeneity by subgroup analyses ((1) performed in conformity with the manufacturer’s instructions for use (IFU) or not, (2) symptomatic vs. asymptomatic, (3) duration of symptoms less than seven days vs. more than seven days, (4) Ct-value <25 vs. <30 vs. ≥30, (5) by sample type) and with meta-regression. We assessed study quality and risk of bias using the QUADAS-2 assessment tool.
Results From a total of 11,715 articles, we extracted 98 analytical and clinical data sets. 74 clinical accuracy data sets were evaluated that included 31,202 samples. Across all meta-analyzed samples, the pooled Ag-RDT sensitivity was 73.8% (95% CI 68.6 to 78.5). If analysis was restricted to studies that followed the Ag-RDT manufacturers’ instructions using fresh upper respiratory swab samples, the sensitivity increased to 79.1% (95% CI 75.0 to 82.8). The SD Biosensor Standard Q and Abbott Panbio showed the highest sensitivity with 81.7% and 72.7%, respectively. The best Ag-RDT performance was found with nasopharyngeal sampling (77.3%, 95% CI 72.0 to 81.9) in comparison to other sample types (e.g., anterior nasal or mid turbinate 63.5%, 95% CI 49.5 to 75.5). Testing in the first week from symptom onset resulted in higher sensitivity (87.5%, 95% CI 86.0 to 89.1) compared to testing after one week (64.1%, 95% CI 54.4 to 73.8). The tests performed markedly better on samples with lower Ct-values, i.e., <30 (87.9%, 95% CI 86.7 to 88.8), in comparison to those with Ct ≥ 30 (47.8%, 95% CI 41.1 to 54.5). Bias concerns were raised across all data sets, and financial support from the manufacturer was reported in 28.2% of data sets.
Conclusion As Ag-RDTs detect most cases within the first week of symptom onset and those with high viral load, they can have high utility for screening purposes in the early phase of disease, and thus can be a valuable tool to fight the spread of SARS-CoV-2. Standardization of conduct and reporting of clinical accuracy studies would improve comparability and use of data.
Summary In this living systematic review we analyzed 98 data sets for performance of SARS-CoV-2 Ag-RDTs compared to RT-PCR. Best-performing tests achieved a sensitivity of 81.7%. Highest sensitivity was found in patients within seven days of symptom onset when NP swabs were utilized.
INTRODUCTION
As the COVID-19 pandemic continues around the globe, antigen rapid diagnostic tests (Ag-RDTs) for SARS-CoV-2 are seen as a complementary tool to fight the spread of the virus (1). The number of Ag-RDTs on the market is increasing constantly (2). Initial data from independent evaluations suggest that the performance of SARS-CoV-2 Ag-RDTs may be lower than what is reported by the manufacturers. In addition, Ag-RDT accuracy seems to vary substantially between tests (3-5).
With the increased availability of Ag-RDTs, an increasing number of independent validations have been published. Such evaluations differ widely in their quality, methods and results, making it difficult to assess the true performance of the respective tests (6). To inform decision makers on the best choice of individual tests, an aggregated, widely available and frequently updated assessment of the quality, performance and independence of the data is urgently necessary. While other systematic reviews have been published, they only include data up until May 2020 (7-9), exclude preprints (10), or were industry sponsored (11). In addition, only one assessed the quality of studies in detail, with data up until May, 2020 (6).
With our systematic review and meta-analysis, we aim to close this gap in the literature and link to a website (www.diagnosticsglobalhealth.org) that is continuously updated.
METHODS
We developed a study protocol following standard guidelines for systematic reviews (12, 13), which is available upon request. The PRISMA checklist and the study protocol are provided in the Supplements (S1, S14). We also registered the review on PROSPERO (Registration number: CRD42020225140).
SEARCH STRATEGY
We performed a search of the databases PubMed, Web of Science, medRxiv and bioRxiv using search terms that were developed with an experienced medical librarian (MG) using combinations of subject headings (when applicable) and text-words for the concepts of the search question. The main search terms were “Severe Acute Respiratory Syndrome Coronavirus 2”, “COVID-19”, “Betacoronavirus”, “Coronavirus” and “Point of Care Testing”. The full list of search terms is available in the Supplement (S2). We also searched the FINDdx website (https://www.finddx.org/sarscov2-eval-antigen/) for relevant studies manually. We performed the search up until December 11th, 2020. No language restrictions were applied. Weekly searches are continued thereafter to update the website (www.diagnosticsglobalhealth.org).
INCLUSION CRITERIA
We included studies evaluating the accuracy of commercially available Ag-RDTs to establish a diagnosis of a SARS-CoV-2 infection against reverse transcriptase polymerase chain reaction (RT-PCR) or cell culture as reference standard. We included all study populations irrespective of age, presence of symptoms, or the study location. We considered cohort studies, nested cohort studies, case-control or cross-sectional studies and randomized studies. We included both peer-reviewed publications and preprints.
We excluded studies in which patients were tested for the purpose of monitoring or ending quarantine. Also, publications with a population size smaller than 10 were excluded (although the size threshold of 10 is arbitrary, such small studies are more likely to give unreliable estimates of sensitivity or specificity).
INDEX TESTS
Point of Care (POC) Ag-RDTs for SARS-CoV-2 aim to detect infection by recognizing viral proteins. Most POC Ag-RDTs use specific labeled antibodies attached to a nitrocellulose matrix strip to capture the virus antigen. Successful binding of the antibodies to the antigen is either detected visually (through the appearance of a line on the matrix strip (lateral flow assay)) or requires a specific reader for fluorescence detection. POC microfluidic enzyme-linked immunosorbent assays have also been developed. Ag-RDTs typically provide results within 10 to 30 min (5).
REFERENCE STANDARD
Viral culture detects viable virus that is relevant for transmission but is available in research settings only. Since RT-PCR tests are more widely available and SARS-CoV-2 RNA (as reflected by RT-PCR cycle threshold (Ct) value) highly correlates with SARS-CoV-2 antigen quantities, we considered it an acceptable reference standard for the purposes of this systematic review (14).
STUDY SELECTION AND DATA EXTRACTION
Two reviewers (LEB and CE, LEB and SS or LEB and MB) reviewed the titles and abstracts of all publications identified by the search algorithm independently, followed by a full-text review for those eligible, to select the articles for inclusion in the systematic review. Any disputes were solved by discussion or by a third reviewer (CMD).
A full list of the parameters extracted is included in the Supplement (S13) and the data extraction file is available upon request. Studies that assessed multiple Ag-RDTs or presented results based on differing parameters (e.g., various sample types) were considered as individual data sets.
At first, four authors (SK, CE, SS, MB) extracted five randomly selected papers in parallel to align on the extraction of data. Afterwards, data extraction as well as the assessment of methodological quality and independency from test manufacturers (see below) was performed by one author per paper (SK, CE, SS, MB) and controlled by a second (LEB, SK, SS, MB). Any differences were resolved by discussion or by consulting a third author (CMD).
STUDY TYPES
We differentiated between clinical accuracy studies (performed on clinical samples) and analytical accuracy studies (performed on contrived samples with known viral load). Analytical accuracy studies can differ widely in methodology, impeding an aggregation of their results. Thus, while we extracted the data for both kinds of studies, we only considered data from clinical accuracy studies as eligible for the meta-analysis. Separately, we summarized the results of analytical studies and compared them with the results of the meta-analysis for individual tests.
ASSESSMENT OF METHODOLOGICAL QUALITY
The quality of the clinical accuracy studies was assessed by applying the QUADAS-2 tool (15). The tool evaluates four domains: patient selection, index test, reference standard, and flow and timing. For each domain, the risk of bias is analyzed using different signaling questions. Beyond the risk of bias, the tool also evaluates the applicability of each included study to the research question for every domain. The QUADAS-2 tool was adjusted to the needs of this review and can be found in the Supplement (S3).
ASSESSMENT OF INDEPENDENCY FROM MANUFACTURERS
We examined whether a study received financial support from a test manufacturer (including the free provision of Ag-RDTs), whether any study author was affiliated with a test manufacturer, or whether a respective conflict of interest was declared. Studies were judged not to be independent from the test manufacturers if at least one of these aspects was present; otherwise they were considered to be independent.
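As an illustration only, the independence rule described above can be written as a small predicate. The record type and its field names are hypothetical and are not taken from the review's data extraction file.

```python
from dataclasses import dataclass


@dataclass
class StudyDisclosure:
    """Hypothetical record of the three aspects examined for each study."""
    manufacturer_funding: bool  # financial support, incl. free provision of Ag-RDTs
    author_affiliated: bool     # any study author affiliated with a test manufacturer
    conflict_declared: bool     # a respective conflict of interest was declared


def is_independent(d: StudyDisclosure) -> bool:
    """A study counts as independent only if none of the three aspects is present."""
    return not (d.manufacturer_funding or d.author_affiliated or d.conflict_declared)
```

Under this sketch, a study that only received free test kits is already classified as not independent, which matches the "at least one aspect" wording.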
STATISTICAL ANALYSIS AND DATA SYNTHESIS
We prepared forest plots for the sensitivity and specificity of each test and visually evaluated the heterogeneity between studies. If four or more data sets were available with more than 20 positive RT-PCR samples per data set for a predefined analysis, a meta-analysis was performed. We report point estimates of sensitivity and specificity for SARS-CoV-2 detection compared to the reference standard along with 95% confidence intervals (CI) using a bivariate random-effects hierarchical model (implemented with the ‘metandi’ command in Stata). When there were fewer than four studies for an index test, only a descriptive analysis was performed and accuracy ranges were reported. In subgroup analyses where papers presented data only on sensitivity, a univariate random-effects logistic regression model was fitted (using the ‘metan’ command in Stata). We predefined the following subgroups for meta-analysis: by sampling and testing procedure in accordance with manufacturer’s instructions as detailed in the instructions for use (henceforth called IFU-conform) vs. non-IFU conform, age (<18; ≥18), sample type, by presence or absence of symptoms, symptom duration (<7 days versus ≥7 days), type of RT-PCR used, and by Ct-value range. For categorization by sample type, we assessed (1) nasopharyngeal (NP) alone or combined with other (e.g., oropharyngeal (OP)), (2) OP alone, (3) anterior nasal or mid-turbinate (AN/MT), (4) a combination of bronchial alveolar lavage and throat wash (BAL/TW) or (5) saliva.
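For readers less familiar with these quantities, the following Python sketch shows how sensitivity and specificity of a single data set follow from its 2x2 counts, and how the stated eligibility rule for meta-analysis could be checked. It is not the bivariate model fitted with ‘metandi’; the Wilson score interval is a stand-in chosen for illustration, and all function names are hypothetical.

```python
import math


def wilson_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (about 95% coverage for z = 1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def accuracy(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity and specificity of an index test against the reference standard."""
    return {
        "sensitivity": tp / (tp + fn),      # detected among RT-PCR positives
        "sens_ci": wilson_ci(tp, tp + fn),
        "specificity": tn / (tn + fp),      # negative among RT-PCR negatives
        "spec_ci": wilson_ci(tn, tn + fp),
    }


def eligible_for_meta_analysis(rt_pcr_positives: list[int],
                               min_datasets: int = 4,
                               min_positives: int = 20) -> bool:
    """Assumed reading of the rule: at least four data sets, each with more
    than 20 RT-PCR positive samples, must be available for the analysis."""
    qualifying = [n for n in rt_pcr_positives if n > min_positives]
    return len(qualifying) >= min_datasets
```

For example, `accuracy(80, 20, 990, 10)` on made-up counts gives a sensitivity of 0.80 and a specificity of 0.99, each with its own interval.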
We aimed to do meta-regression with the ‘mvmeta’ command in Stata to examine the impact of covariates including symptom duration and Ct-value range. We also performed the Deeks’ test for funnel-plot asymmetry as recommended to investigate publication bias for diagnostic test accuracy meta-analyses ((16), using the ‘midas’ command in Stata); a p-value < 0.10 for the slope coefficient indicates significant asymmetry. Analyses were performed using Stata 15 (Stata Corporation, College Station, TX, USA), and forest plots were generated using Review Manager 5.3 (Nordic Cochrane Centre, Copenhagen, Denmark).
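The idea behind the Deeks’ test can be sketched as a weighted regression of the log diagnostic odds ratio on the inverse square root of the effective sample size (ESS), weighted by ESS. The Python below is a minimal sketch of that regression under stated assumptions, not a reimplementation of ‘midas’: the 0.5 continuity correction and the function name are assumptions, and the p-value (taken from a t distribution with k-2 degrees of freedom) is deliberately not computed here.

```python
import math


def deeks_regression(tables: list[tuple[float, float, float, float]]) -> tuple[float, float]:
    """tables holds (tp, fp, fn, tn) per data set.
    Returns (slope, standard error of the slope)."""
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        if 0 in (tp, fp, fn, tn):
            # Assumed continuity correction so the odds ratio stays finite.
            tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
        n_pos, n_neg = tp + fn, fp + tn
        ess = 4 * n_pos * n_neg / (n_pos + n_neg)   # effective sample size
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))   # log diagnostic odds ratio
        ws.append(ess)
    k = len(xs)
    if k < 3:
        raise ValueError("need at least three data sets")
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:
        raise ValueError("all data sets have the same effective sample size")
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se_slope = math.sqrt(rss / (k - 2) / sxx)
    return slope, se_slope
```

A slope far from zero relative to its standard error means that smaller data sets report systematically different diagnostic odds ratios than larger ones, which is the asymmetry the test looks for.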
SENSITIVITY ANALYSIS
Two types of sensitivity analyses were planned: First, estimation of sensitivity and specificity excluding case-control studies. Secondly, estimation of sensitivity and specificity excluding non-peer-reviewed studies. We compared the results of each sensitivity analysis against overall results to assess the potential bias introduced by considering case-control studies and non-peer reviewed studies.
DIAGNOSTICSGLOBALHEALTH.ORG
A summary of the data included in this paper is available on the website “www.diagnosticsglobalhealth.org”. At least once per week we update this website by continuing the literature search and process described above. We plan to update the meta-analysis every month and post it on the website.
RESULTS
SUMMARY OF STUDIES
The systematic search resulted in 11,715 articles. After removing duplicates, 5,435 articles were screened, and 93 papers were considered eligible for data extraction. Of these, 41 were excluded because they did not present primary data (14, 17-56), leaving 52 studies to be included in the systematic review (Figure 1) (3, 57-107).
At the end of the data extraction process, 23 studies were still in preprint form (3, 57, 60, 62, 63, 66, 67, 75-77, 81, 86, 89, 91, 93, 94, 96, 97, 100, 102-104, 106). All studies were written in English, except for one in Spanish. Out of the 52 studies, 48 reported on clinical accuracy (3, 57-65, 67-83, 86-91, 93-107) and four on analytical accuracy (66, 84, 85, 92).
The 48 clinical accuracy studies were divided up in 74 data sets, while the four analytical accuracy studies accounted for 24 data sets. A total of 20 different Ag-RDTs were evaluated (15 lateral flow with visual readout, five requiring an automated reader), with 18 being assessed in a clinical accuracy study. Only 11 studies reported data for more than one test, and only four of these conducted a head-to-head assessment, i.e., testing at least two Ag-RDTs on the same sample or participant. The reference method was RT-PCR in all except one study, which used viral culture.
The most common reason for testing was the occurrence of symptoms (30.6% of data sets), while in another 15.3% of data sets persons were screened independent of symptoms. Close contact to a SARS-CoV-2 confirmed case was the reason for testing in a further 5.1% of the data sets. In 8.2% of the data sets, persons were tested due to more than one of the reasons mentioned before and for 40.8% the reason for testing was unclear.
In total, 32,468 Ag-RDTs were done, 31,202 in clinical accuracy studies and 1,266 in analytical accuracy studies. In the clinical accuracy studies, the mean number of samples per clinical study was 422 (Range 17 to 4183). Only 274 tests were performed on pediatric samples and 10,154 on samples from adults. For the remaining 22,040 samples, age was not specified. 18,464 samples originated from symptomatic patients and 5,071 samples from asymptomatic patients. For 8,933 samples the patient’s symptom status could not be identified. The most common sample type evaluated was NP and mixed NP/OP (22,293 samples). There were substantially fewer data points for the other sample types: OP 796 samples and AN/MT 6,496 samples.
Of the data sets assessing clinical accuracy, 39.2% performed testing according to the manufacturers’ recommendations (i.e., IFU-conform), while 58.1% were not IFU-conforming. The most common deviations from the IFU were (1) a sample type that was not recommended for Ag-RDTs (28 (37.8%) data sets; 2 (2.7%) not known), (2) use of samples that were prediluted in transport media not recommended by the manufacturer (26 (35.1%) data sets; 9 (12.2%) not known) and (3) use of banked samples (21 (28.4%) data sets; 12 (16.2%) not known).
A summary of the clinical accuracy data by study, including the test(s) evaluated, sample size, sample type, sample condition and IFU conformity, can be found in Table 1. Most data sets were available for the Panbio test by Abbott Rapid Diagnostics (Germany; henceforth called Panbio): 21 data sets and 15,809 tests; while Standard Q test by SD Biosensor (South Korea; distributed in Europe by Roche, Germany; henceforth called Standard Q) was assessed in 16 data sets with 6,036 tests performed. Detailed results for each clinical accuracy study are available in the Supplement (S4).
METHODOLOGICAL QUALITY OF STUDIES
The findings on study quality using the QUADAS-2 tool are presented in Figure 2. Most studies assessed a relevant patient population (73.0%). However, for only 31.1% of the studies the patient selection was considered representative of the setting and population chosen (i.e., they avoided inappropriate exclusions and a case-control design, and enrollment occurred consecutively or randomly).
The conduct and interpretation of the index tests was considered to have low risk for introduction of bias in 45.9% of studies (through e.g., appropriate blinding of persons interpreting the visual read-out). However, 51.4% of studies did not provide sufficient information to clearly judge the risk of bias. Only a subset of studies performed the Ag-RDTs according to IFU (39.2% of studies), while 58.1% were non-IFU conforming, which potentially affected the accuracy negatively (for 2.7% of studies it was unclear).
For half of the data sets (51.4%) the reference standard was performed ahead of the Ag-RDT or the operator conducting the Ag-RDT was blinded to its results, which resulted in a low risk of bias. However, almost half (47.3%) did not report sufficient information to judge the risk and one study specifically stated to have performed the reference standard not blinded to the Ag-RDT results. Nonetheless, the applicability of the reference test was judged to be of low concern for all studies, as cell culture or RT-PCR are expected to adequately define the target condition.
Most studies (67.6%) obtained the sample for the index test and reference test at the same time and applied the same reference standard across the samples. However, for 8.1% of data sets, we were concerned that not all selected patients were included in the analysis.
Financial support from the Ag-RDT manufacturer was found in 28.2% of the data sets. Five percent of the authors reported a conflict of interest and another five percent indicated employment by the manufacturer of the Ag-RDT studied.
DETECTION OF SARS-COV-2 INFECTION
Out of 74 clinical data sets (from 48 studies), ten were excluded from the meta-analysis, as they included fewer than 20 RT-PCR positive samples. Across the remaining 64 data sets, including any test and type of sample, the meta-analyzed sensitivity and specificity were 73.8% (95%CI 68.6 to 78.5) and 99.7% (95%CI 99.3 to 99.9). If testing was performed IFU-conform, sensitivity increased to 79.1% (95%CI 75.0 to 82.8) compared to non-IFU conform testing with a respective sensitivity of 68.5% (95%CI 58.4 to 77.2). Pooled specificity was nearly identical in both groups (99.7% vs. 99.6%).
ANALYSIS OF SPECIFIC TESTS
Based on 47 out of the 64 clinical data sets with 24,543 tests performed, we were able to meta-analyze the sensitivity and specificity of five different Ag-RDTs: Standard Q, Panbio, the Standard F by SD Biosensor (South Korea; henceforth called Standard F), the COVID-19 Ag Respi-Strip by Coris BioConcept (Belgium; henceforth called Coris) and the Biocredit Covid-19 Antigen rapid test kit by RapiGEN (South Korea; henceforth called Rapigen). Across these, pooled estimates of sensitivity and specificity on all samples were 73.1% (95%CI 67.1 to 78.3) and 99.7% (95% CI 99.2 to 99.9), which were very similar to the overall pooled estimate across all meta-analyzed data sets (73.8% and 99.7%, above).
Standard Q had the highest pooled estimate of sensitivity with 81.7% (95% CI 74.8 to 87.0). The pooled sensitivity for Standard F and Panbio were 70.9% (95% CI 52.0 to 84.6) and 72.7% (95%CI 63.7 to 80.2), respectively. Coris and Rapigen only reached a pooled sensitivity of 41.9% (95%CI 29.9 to 54.8) and 65.8% (95%CI 44.4 to 82.3), respectively. It is of note that one of the studies on Coris found sensitivity to be 87% in samples with Ct-values <25, but 0% for Ct-values ≥25 (104). The pooled specificity was above 99% for Coris, Panbio and Standard Q and above 98% for Rapigen and Standard F. All results are presented in Figure 3. Hierarchical summary receiver-operating characteristic for Standard Q and Panbio are available in the Supplement (S6).
The remaining thirteen Ag-RDTs did not have sufficient data to allow for a test-specific meta-analysis. For the ESPLINE SARS-CoV-2 by Fujirebio (Japan; henceforth called Espline) sensitivity ranged widely from 23.5% to 80.7%, while both the 2019-nCov Antigen Rapid Test Kit by Shenzhen Bioeasy Biotechnology (China; henceforth called Bioeasy) and BD Veritor by Becton, Dickinson and Company (US, New Jersey; henceforth called BD Veritor) showed smaller variability with sensitivities within 66.7% to 93.9% and 76.3% to 96.4%, respectively. For the Sofia SARS Antigen FIA by Quidel (US, California; henceforth called Sofia), a sensitivity between 76.8% and 93.8% was reported (Table 1). Forest plots for the data sets for each Ag-RDT are provided in the Supplement (S5).
Specificity was above 98% for BD Veritor and Espline for studies on NP or NP/OP samples and for Sofia it was 96.9%. For Bioeasy, specificity was as low as 85.6% in one study, even though the test was performed as recommended by the manufacturer. The results for all Ag-RDTs that have been evaluated in more than one data set but did not qualify for a test-specific meta-analysis are summarized in Table 2. The remaining Ag-RDTs that were evaluated in one data set only are included in Table 1 and Supplement (S5).
Four studies accounting for 15 data sets conducted head-to-head clinical accuracy evaluations of tests using the same sample or samples from the same participant. These data sets are underlined in Table 1. Two such studies included more than 100 samples, whereas the other two included too small sample sizes to draw clear conclusions (96, 106). All tests were performed non-IFU conform as banked specimens were tested, the type of sample (OP, BAL/TW) was not recommended or viral/universal transport medium (VTM/UTM) was used resulting in pre-dilution. Not surprisingly, one head-to-head study found overall low sensitivity, with Standard Q (sensitivity 49.4%) being slightly more sensitive than Panbio (sensitivity 44.6%), but less sensitive than the CLINITEST® Rapid COVID-19 Antigen Test by Siemens Healthineers (Germany; sensitivity 54.9%) (89). Another study found Bioeasy (sensitivity 85.0%) to have higher sensitivity than Rapigen (sensitivity 62.0%) (105). In both studies, specificity was above 97.0% for all Ag-RDTs, except for SARS-CoV-2 Ag-RDT by Liming Bio (China; specificity 90.0%) (Supplement S5).
SUBGROUP ANALYSIS
The results are presented in Figure 4. Detailed results for the subgroup analysis are available in the Supplement (S7 to 11).
Subgroup analysis by IFU conformity
The summary results are presented in Figure 4B. When assessing only studies with an IFU-conforming sampling, a subgroup analysis by test type was possible for studies using Panbio (58, 59, 62, 63, 67, 68, 73, 77, 86, 97, 102) and Standard Q (62, 72, 74, 76, 81, 82, 88, 97) with 20 data sets performing 11,658 tests in total (Standard Q accounted for eight (40%) data sets and 3,293 (28.2%) tests). For Standard Q, we found a pooled sensitivity and specificity of 84.4% (95% CI 79.1 to 88.6) and 99.3% (95% CI 97.9 to 99.8) and for Panbio, we found a pooled sensitivity and specificity of 76.9% (95% CI 69.4 to 83.0) and 99.9% (95% CI 99.5 to 100.0), respectively. These results are largely similar to the subgroup analysis of the two tests when using NP samples.
Subgroup analysis by sample type
Most data sets evaluated NP or combined NP/OP swabs (49 data sets and 20,115 samples) as the sample type for the Ag-RDT. NP or combined NP/OP swabs achieved a pooled sensitivity of 77.3% (95% CI 72.0 to 81.9). Data sets that used AN/MT swabs for Ag-RDTs (five data sets and 6,496 samples) showed a summary estimate for sensitivity of 63.5% (95% CI 49.5 to 75.5). Out of these five AN/MT data sets, three data sets used NP samples for the RT-PCR comparison (sensitivity 44.7% to 82.1%, specificity 99.1% to 100%) (57, 58, 82), while only two data sets used AN/MT for both Ag-RDT and RT-PCR testing (sensitivity 57.7% to 79.5%, specificity 98.7% to 100%; Figure 4A) (60, 93).
When evaluating results from two studies that reported direct head-to-head comparison of NP and MT samples from the same participants using the same Ag-RDT (Standard Q), the two sample types showed equivalent performance (81, 82).
Analysis of performance with an OP swab (722 samples), showed pooled sensitivity of only 48.2% (95%CI 42.7 to 53.8). However, all data were from one single head-to-head study that applied the same sample to four different tests after dilution with UTM (89). Specificity was above 99% for all three of the subgroups analyzed.
We were not able to perform a subgroup meta-analysis for BAL/TW due to insufficient data as there was only one study with 73 samples evaluating the Rapigen, Panbio and Standard Q (96). However, BAL/TW would in any case be off-label use and is not considered a POC sample.
Another off-label sample used in one study (58) was saliva. In this data set with 610 samples, overall sensitivity was 23.1% (95% CI 16.2 to 31.9), and even in samples from symptomatic patients with a Ct-value ≤25 sensitivity was only 41% (95% CI 28 to 56). Specificity was reported to be 100% (95% CI 99 to 100) (58).
Three tests had sufficient data sets available to meta-analyze performance with NP swabs by test type. Standard Q with 83.3% (95%CI 77.3 to 87.9) sensitivity and 99.1% (95%CI 98.2 to 99.6) specificity and Panbio with 78.7% (95%CI 71.4 to 84.5) sensitivity and 99.9% (95%CI 99.5 to 100) specificity were the best performing tests. Coris had a sensitivity of only 41.9% (95%CI 29.9 to 54.8).
Subgroup analysis in symptomatic and asymptomatic patients
Within the data sets possible to meta-analyze, 12,625 samples (77.2%) were from symptomatic and 3,737 (22.8%) from asymptomatic patients. The pooled sensitivity for symptomatic patients was markedly different compared to asymptomatic patients with 78.1% (95%CI 69.6 to 84.8) versus 62.5% (95%CI 39.7 to 80.8), but confidence intervals were overlapping. Specificity was 99.7% (95%CI 99.4 to 99.9) for symptomatic and 99.9% (95%CI 97.6 to 100) for asymptomatic patients, respectively (Figure 4C).
Subgroup analysis comparing symptom duration
Limited data were available for this sub-analysis: data was analyzed for 2,875 patients with symptoms less than 7 days and 249 patients with symptoms ≥ 7 days. It was only possible to perform a univariate analysis of sensitivity. The pooled sensitivity for patients with onset of symptoms <7 days was 87.5% (95%CI 86.0 to 89.1) which is markedly higher than the 64.1% (95%CI 54.4 to 73.8) sensitivity found for individuals tested ≥ 7 days from onset of symptoms (Figure 4C).
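To make the notion of a univariate pooled sensitivity concrete, the sketch below pools per-study sensitivities on the logit scale with a DerSimonian-Laird random-effects estimator. This is a simpler, swapped-in technique and not the random-effects logistic regression described in the Methods; the function name, the 0.5 continuity correction applied to every study, and any counts passed to it are assumptions for illustration.

```python
import math


def pool_sensitivity_dl(counts: list[tuple[int, int]], z: float = 1.96) -> tuple[float, float, float]:
    """counts holds (Ag-RDT positives, RT-PCR positives) per data set.
    Returns (pooled sensitivity, lower bound, upper bound)."""
    ys, vs = [], []
    for tp, n in counts:
        a, b = tp + 0.5, (n - tp) + 0.5    # empirical logit keeps values finite
        ys.append(math.log(a / b))          # logit of the study sensitivity
        vs.append(1 / a + 1 / b)            # approximate within-study variance
    k = len(ys)
    w = [1 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0        # between-study variance
    w_re = [1 / (v + tau2) for v in vs]
    mu = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    def expit(t: float) -> float:
        return 1 / (1 + math.exp(-t))

    return expit(mu), expit(mu - z * se), expit(mu + z * se)
```

Because the pooling happens on the logit scale, the back-transformed interval always stays between 0 and 1, and the pooled value lies between the smallest and largest study-level estimates.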
Subgroup analysis by Ct-values
There were also limited data available for comparison of Ct-values in similar ranges. In an effort to use as much of the heterogeneous data as possible, the cut-offs for the Ct-value groups were relaxed by 2-3 points within each range. The <25 group included values reported as ≤24 to <25, the <30 group included values from ≤29 to ≤33. This resulted in some overlap for the <30 and ≥30 groups. The pooled sensitivity for Ct-values <25 was markedly better with 94.2% (95%CI 93.2 to 95.2) compared to 32.5% (95%CI 28.0 to 37.1) for Ct-values ≥25 (Figure 4D). A similar pattern was observed when the Ct-values were analyzed using cut-offs <30 or ≥30, resulting in a sensitivity of 87.9% (95%CI 86.7 to 89.2) and 47.8% (95%CI 41.1 to 54.5), respectively (Figure 4D). Sensitivity in samples with a low viral load (<5 log 10 copies/ml) ranged from 46.9% (lowest estimate in a single study) to 48.1% (highest estimate in a single study). In contrast, higher viral load samples (>6 log 10 copies/ml) showed higher sensitivity, ranging from 71.4% to 100%.
Subgroup analysis by age, type of RT-PCR and viral load
We were not able to perform a meta-analysis for the subgroups by age, type of RT-PCR or viral load (viral copies/mL) due to insufficient data.
Sensitivity by age ranged from 72.7% to 100% in patients under 18 years. A similar picture was found in adults ≥18 years, with sensitivity ranging from 76.3% to 93.6%. Specificity was above 99% in both groups. In 52 (70.3%) of the data sets only one type of RT-PCR was used, whereas 15 (20.3%) tested samples in the same study using different RT-PCRs. For seven (9.4%) of the data sets we could not tell the type of RT-PCR. The Cobas® SARS-CoV-2 Test from Roche (Germany) was used most frequently in 24 (32.4%) of the data sets, followed by the Allplex® 2019 n-CoV Assay from Seegene in 16 (21.6%) and the SARS CoV-2 assay from Corman/TibMolBio in 14 (18.9%) of the data sets.
Meta regression
We were not able to perform a meta-regression due to the considerable heterogeneity in reporting subgroups, which resulted in too few studies with sufficient data for comparison.
Publication Bias
The results of the Deeks’ test indicate significant asymmetry in the funnel plot for all 64 data sets with complete results (p=0.01) and for Standard Q publications (p=0.03), but not for Panbio publications (p=0.95). All funnel plots are listed in the Supplement (S12).
COMPARISON WITH ANALYTICAL STUDIES
The four included analytical studies provided 24 data sets in total, evaluating eight different Ag-RDTs. 45.8% of the samples originated from NP swabs, whereas throat saliva, a combination of nasopharyngeal aspirate and throat swab, as well as a combination of NP and throat swab accounted for 16.6% each. One data set included sputum.
Overall, the reported analytical sensitivity (limit of detection) in the studies correlated with the results of the meta-analysis presented above. For example, one study on NP swabs found Rapigen (limit of detection (LOD) in log10 copies per swab (108): 10.2) and Coris (LOD 7.46) to perform worse than Panbio (LOD 6.55) and Standard Q (LOD 6.78)(66). Similar results were found in another study, where the Standard Q showed the lowest LOD (detecting virus up to what is an equivalent Ct-value of 28.67), when compared to that of Rapigen and Coris (detecting virus up to what is an equivalent Ct-value of only 18.44 for both)(84).
SENSITIVITY ANALYSIS
When case control studies (13/64) were excluded, the pooled sensitivity stayed the same with 73.8% (95%CI 68.7 to 78.4) compared to 73.8% (95%CI 68.6 to 78.5) in the overall analysis with no change in pooled specificity. When excluding pre-prints (35/64), sensitivity decreased slightly to 69.2% (95% CI 60.7-76.6) compared to the overall analysis.
DISCUSSION
In this comprehensive systematic review and meta-analysis, we have summarized the data of 52 studies evaluating the accuracy of 20 different Ag-RDTs. Across all meta-analyzed samples, our results show a sensitivity and specificity of 73.8% (95%CI 68.6 to 78.5) and 99.7% (95% CI 99.3 to 99.9). Over half of the studies did not perform the Ag-RDT in accordance with the test manufacturers’ recommendation, which affected sensitivity negatively. When considering only IFU-conform studies the sensitivity increased to 79.1% (95%CI 75.0 to 82.8). While we found the sensitivity to vary across specific tests, the specificity was more consistently high.
The two Ag-RDTs that have been approved through the WHO emergency use listing procedure, Abbott Panbio and SD Biosensor Standard Q (distributed by Roche in Europe), have not only drawn the largest research interest, but also continue to be the best performing tests when comparing their meta-analyzed accuracy to that of other Ag-RDTs (Standard F, Coris and Rapigen). Two other Ag-RDTs with more data available (however insufficient data to meta-analyze) also show higher performance (BD Veritor and Sofia). However, both require an instrument for operation.
Not surprisingly, lower Ct-values, the RT-PCR semi-quantitative correlate for a high virus concentration, resulted in a significantly higher Ag-RDT sensitivity than high Ct-values (pooled sensitivity 94.2% vs. 32.5%). This confirms prior data suggesting that antigen concentrations and Ct-values are tightly correlated in NP samples (14). Ag-RDTs also showed higher sensitivity in patients within 7 days after symptom onset than in patients later in the course of the disease (pooled sensitivity 87.5% vs. 64.1%), which is to be expected given that samples from patients within the first week after symptom onset have been shown to contain the highest virus concentrations (109). In line with this, studies presenting an unexpectedly low overall sensitivity either had a small study population with, on average, high Ct-values (83, 98) or did not perform the Ag-RDT as per IFU, e.g., by using saliva samples (58, 89). In contrast, studies with an unusually high Ag-RDT sensitivity were based on study populations with a low median Ct-value, between 18 and 22 (65, 94).
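The dependence of sensitivity on Ct-value can be made concrete with a small sketch that stratifies RT-PCR-positive samples by Ct-value and reports the proportion detected by the Ag-RDT in each stratum. The thresholds (<25, <30, ≥30) follow the subgroups named in the Methods; the Wilson score interval is an assumption chosen here for its behavior with small strata, not necessarily the interval used in this review, and the function names and sample data are hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (default 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def sensitivity_by_ct(samples):
    """Sensitivity per Ct stratum.

    samples: iterable of (ct_value, ag_rdt_positive) for RT-PCR-positive
    samples only. Returns {stratum: (sensitivity, ci_low, ci_high)}.
    Note that the "Ct < 30" stratum includes the "Ct < 25" samples.
    """
    strata = {
        "Ct < 25": lambda ct: ct < 25,
        "Ct < 30": lambda ct: ct < 30,
        "Ct >= 30": lambda ct: ct >= 30,
    }
    results = {}
    for name, in_stratum in strata.items():
        hits = [bool(pos) for ct, pos in samples if in_stratum(ct)]
        if not hits:
            continue  # skip empty strata rather than divide by zero
        tp = sum(hits)
        results[name] = (tp / len(hits), *wilson_interval(tp, len(hits)))
    return results

# Hypothetical RT-PCR-positive samples: (Ct-value, Ag-RDT result).
example = [
    (17.2, True), (19.8, True), (21.5, True), (23.9, True),
    (26.1, True), (27.4, False), (29.0, True),
    (30.6, False), (32.3, False), (34.8, True),
]

for stratum, (sens, lo, hi) in sensitivity_by_ct(example).items():
    print(f"{stratum}: {sens:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

Because the Ct distribution of a study population determines how its samples are split across these strata, two studies of the same test can report very different overall sensitivities.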
Our analysis also found that the sensitivity of Ag-RDTs is substantially higher in symptomatic than in asymptomatic patients (pooled sensitivity 78.1% vs. 62.5%). Given that prior studies found largely no difference in the trajectory of viral load between patients with and without symptoms over the course of disease (109), this is likely explained by the varying stage of disease at which asymptomatic patients presenting for one-time screening are tested. While we were not able to perform a meta-regression assessing performance by duration of infection, studies that enrolled asymptomatic contacts of infected patients (3, 77, 99) were more likely to show higher Ag-RDT sensitivity than studies that performed random screening of asymptomatic persons (64, 93). This is explained by the fact that asymptomatic persons who are tested after contact with an infected person are more likely to be captured in the early phase of disease and to have higher viral loads at the time of testing (110). With random screening, however, detection is possible at any point of disease (i.e., including late in disease, when PCR is still positive but viable virus is rapidly decreasing (111)).
With regard to the sampling and testing procedure, we found Ag-RDTs to perform similarly across upper-respiratory swab samples (e.g., NP and AN/MT), particularly when considering the most reliable comparisons from head-to-head studies.
Similar to a previous assessment (6), the methodological quality of the included studies revealed a very heterogeneous picture. In the future, aligning the design of clinical accuracy studies to commonly agreed-upon minimal specifications (e.g., by WHO or the European Centre for Disease Prevention and Control) and reporting the results in a standardized way (112) would improve data quality and comparability.
The main strengths of our study lie in its comprehensive approach and continuous updates. By linking this review to our website www.diagnosticsglobalhealth.org, we strive to equip decision makers with the latest research findings on Ag-RDTs for SARS-CoV-2 and, to the best of our knowledge, are the first to do so. Furthermore, our methods were rigorous: both the study selection and data extraction were performed by one author and independently checked by a second, we conducted blinded test extractions ahead of the actual data extraction, and we prepared a detailed interpretation guide for the QUADAS-2 tool.
However, our study is limited in that the inclusion of both preprints and peer-reviewed literature could affect the quality of our report. We aimed to counterbalance this effect by thoroughly assessing all included clinical studies with the QUADAS-2 tool and by performing a sensitivity analysis that excluded preprints. In addition, we restricted our report to data found in common research databases and on the FINDdx website. Even though we are aware that further data exist, for example from governmental research institutes (113), such data could not be included because sufficient detail describing the methods and results is not publicly available. Finally, the strong heterogeneity in data reporting, as discussed above, limited the meta-analysis.
CONCLUSION
In summary, Ag-RDTs are available that have high sensitivity, particularly when performed in the first week of illness when viral load is high, and high specificity. However, our analysis also highlights the variability in results between tests (which is not reflected in the manufacturer-reported data), indicating the need for independent validations. Furthermore, the analysis highlights the importance of performing tests in accordance with the manufacturers’ recommended procedures and in alignment with standard diagnostic study and reporting guidelines. The accuracy achievable by the best-performing Ag-RDTs, combined with their rapid turnaround time and ease of use, suggests that these tests could have a significant impact on the pandemic if applied in thoughtful testing and screening strategies.
Data Availability
All data is available upon request.
ABBREVIATIONS
- Ag-RDT: antigen rapid diagnostic test
- AN/MT: anterior nasal or midturbinate
- AR: Aruba
- BAL/TW: bronchoalveolar lavage or throat wash
- CI: confidence interval
- Ct-value: cycle threshold value
- ER: Emergency Room
- FINDdx: Foundation for Innovative New Diagnostics
- FP: false positive
- FN: false negative
- IFU: instructions for use
- LRT: lower respiratory tract
- N: sample size
- NP: nasopharyngeal
- OP: oropharyngeal
- POC: point of care
- PC: professional-collected
- RT-PCR: reverse transcriptase polymerase chain reaction
- SC: self-collected
- TP: true positive
- TR: travelers
- TN: true negative
- UT: Utrecht
- VTM/UTM: viral or universal transport medium