Abstract
Objective Independent evaluation of the sensitivity of CE-marked SARS-CoV-2 antigen rapid diagnostic tests (Ag RDT) offered in Germany.
Method The sensitivity of 122 Ag RDT was adressed using a common evaluation panel. Minimum sensitivity of 75% for panel members with CT<25 was used for differentiation of devices eligible for reimbursement in in the German healthcare system.
Results The sensitivity of different SARS-CoV-2 Ag RDT varied over a wide range. The sensitivity limit of 75% for panel members with CT <25 was met by 96 of the 122 tests evaluated; 26 tests exhibited lower sensitivity, few of which were completely failing. Some devices exhibited high sensitivity, e.g. 100% for CT<30.
Conclusion This comparative evaluation succeeded to distinguish less sensitive from better performing Ag RDT. Most of the Ag RDT evaluated appear to be suitable for fast identification of acute infections associated with high viral loads. Market access of SARS-CoV-2 Ag RDT should be based on minimal requirements for sensitivity and specificity.
Introduction
A large number of antigen-detecting rapid diagnostic tests (Ag RDTs) for SARS-CoV-2 are available on the European market, both for professional use and as self-tests. Rapid tests are based on lateral flow immunochromatography using antibodies against SARS-CoV-2 proteins (antigens), present in respiratory tract specimens. By far most Ag RDTs target the viral nucleoprotein (N), only very few assays work with spike protein (S) detection. Viral variants of concern (VOC) have been described mainly for the S gene, leaving the vast majority of SARS-CoV-2 Ag RDT unaffected; however, the few SARS-CoV-2 Ag RDT based on spike protein detection should be checked at regular intervals for potential deficiencies. While PCR is still the gold standard for virus detection, there is an increasing evidence that infectivity by respiratory secretions correlates with high viral loads present in the early phase of infection, e.g. before and after (0-10 days) onset of symptoms. Thus Ag RDTs allow rapid identification of acutely infected and potentially infectious individuals facilitating fast decisions on containment of virus spread, patient care, isolation and contact tracing. Furthermore, Ag RDTs may save limited reagents of more sensitive molecular diagnostics to serve other diagnostic needs, e.g. disease management or confirmation of Ag RDT reactive results.
In the European Union (EU), regulatory requirements for SARS-CoV-2 in vitro diagnostic medical devices (IVD) are defined by the IVD Directive 98/79/EC (IVDD) and have to be addressed by the manufacturer prior to access to the EU Common Market. However, certification (CE-marking) of SARS-CoV-2 diagnostics is currently done solely by the manufacturer (self-certification), without third party intervention. The exception are SARS-CoV-2 self-tests, where a notified body has to assess the lay person studies. However, due to the urgency in the Corona situation a national derogation can be agreed by the national Competent Authority, e. g. based on the performance of an identical test for professional use. From May 2022 the IVDD will be replaced by the IVD Regulation (EU) 2017/746 (IVDR) where a risk-based classification of IVDs is the basis for the scrutiny of their assessment (1). SARS-CoV-2 IVD will belong to the high-risk devices (class D) under the IVDR, requiring a Notified Body both for certification of the manufacturer`s quality management system and for assessment of the technical documentation of the device. Furthermore, EU reference laboratories (EURL) will be responsible for independent laboratory testing of class D devices to verify performance features and to assure batch to batch consistency.
However, at the time being independent evaluations of SARS-CoV-2 Ag RDT that allow conclusions on their performance are largely missing.
In the current situation with absence of strict regulatory requirements for most SARS-CoV-2 IVD, the German Ministry of Health decided to link the reimbursement of SARS-CoV-2 Ag RDT to provision of evidence of essential quality features of these assays. This evidence consisted of two parts: 1) provision by the manufacturer of evidence for compliance with minimum criteria, and 2) successful independent laboratory evaluation. Minimum criteria for sensitivity and specificity were jointly defined by Paul-Ehrlich-Institut (PEI) and Robert Koch Institut (RKI), two governmental authorities in Germany (2). Manufacturers or distributors of SARS-CoV-2 Ag RDT document for the respective SARS-CoV-2 Ag RDT compliance with these minimum criteria before the device can be listed as eligible for reimbursement on a dedicated webpage of Bundesinstitut für Arzneimittel und Medizinprodukte (BfArM), another governmental authority (3).
Devices were selected from the BfArM list for the comparative evaluation performed by PEI/RKI. The aim of this comparative evaluation was to both determine the “state of art” sensitivity of proficient devices and identify devices not reaching the minimum sensitivity level. Subsequently, devices with sensitivity below “state of the art” were removed from the BfArM list while all devices with successful evaluation outcome were published on PEI webpage (4). In the meantime more than 120 SARS-CoV-2 Ag-RDT have been evaluated in direct comparison using common SARS-CoV-2 specimens. The outcome and conclusions of this study are summarized in this manuscript.
Materials and Methods
Evaluation panel
Detailed characterization of the evaluation panel has been described by Puyskens et al. in the tandem publication to this study. In short, pools from nasopharyngeal and oropharyngeal swabs were prepared as random mixtures obtained from up to 10 swabs. While dry swabs were directly eluted in phosphate buffered saline (PBS), the residual amount virus transport media (VTM) contained in moist swabs was diluted in PBS. Care was taken not to use VTM containing the protein denaturing component guanidinium.
Individual pools are composed of samples with similar SARS-CoV-2 concentrations, expressed as cycle threshold (CT) values of semiquantitative PCR. In total 50 different pools were defined as members of the evaluation panel and stored as 500 µl aliquots at -80°C. The CT of each panel member was determined by PCR, and the putative number of RNA copies calculated with the aid of the reference preparations distributed by the German external quality assessment (EQA) provider INSTAND e. V. (5). Furthermore, presence of infectious virus detectable by propagation in Vero cell culture was determined for the individual pools.
The whole evaluation panel may be subdivided into three subgroups: panel members, which are characterized by very high (CT 17-25; 18 pools), high (CT 25-30; 23 pools) or moderate (CT 30-36; 9 pools) viral load. During the comparative evaluation 4 panel members of the original panel had to be replaced, resulting in a slight shift of subgroups composition in the resulting panel version 2: 17 pools covering the CT-range 17-25, 23 pools the CT range 25-30 and 10 pools the CT range 30-36.
Antigen stability
Real-time antigen stability in panel members was investigated using quantitative SARS-CoV-2 ELISA Lumipulse G SARS-CoV-2 Ag (Fujirebio Inc., Shinjuku-ku, Tokyo, Japan). Panel members were tested after initial thawing and throughout one week incubation at 4°C. Furthermore, potential impact of additional freeze/thaw cycle was addressed.
Comparative evaluation
In the beginning of the comparative evaluation, laboratories participating in the comparative evaluation included the Robert Koch-Institute, the Paul-Ehrlich-Institute, the Nationales Konsiliarlaboratorium für Coronaviren (Charité), the Bundeswehr Institute of Microbiology, the Bernhard-Nocht-Institut für Tropenmedizin, and laboratories of the association ALM (Akkreditierte Labors in der Medizin). At a later stage, because of the massively increasing work load, the evaluation was continued mainly by PEI and RKI. Panels were shipped on dry ice and, once thawed, 50 µl aliquots were prepared, kept at 4°C and used within 5 days, without further freeze / thaw step. For each Ag RDT and panel member, the 50 µl aliquot was completely absorbed using the specimen collection device, e.g. swab, provided with the respective test. The swabs were then eluted in the test-specific buffer, strictly following the respective instructions for use (IFU). After applying the sample/buffer solution onto the test cassette and incubation, visual read out of control and target lines was done independently by two lab technicians, with potential discrepant results preliminarily interpreted as “equivocal”. In favor of the tests evaluated, both reactive and equivocal results for the target line were eventually scored as positive. At PEI the test cassettes were immediately scanned using BLOTrix Reader R2L (BioSciTec GmbH) and analysed with BLOTrix 4 Cubos (B4C) software (BioSciTec GmbH), at other evaluation sites the test results were documented by photographs. Some tests were provided with reading instruments and read as per instruction manual provided.
Tests were selected from original manufacturers, as far as this information was available. Often duplicate version of the very same tests are marketed under a new test name, new manufacturer or different distributor. Repeat testing of duplicates was avoided as far as possible in order to cope with the already huge variety of different tests placed on the EU Common Market.
Results
Characterization of the evaluation panel
Panel members spanned the CT range between 17 and 36. A specimen with an assigned SARS-CoV-2 RNA concentration of 106 RNA copies/ml provided by INSTAND corresponded to the CT value of 25. Assuming that a CT difference of 1 corresponds to a concentration factor of 2 and taking into account that the individual panel members cover a CT range from 17 to 36, the SARS-CoV-2 RNA amounts in the panel cover a concentration range from >108 to <103 copies / ml, respectively. The CT values of 20 or 30 would than correspond to approximate SARS-CoV-2 RNA concentrations of 3 × 107 or 3 × 104 copies / ml, respectively.
SARS-CoV-2 propagation in cell culture resulted in positive results for several of the low CT / high titre specimens, indicating presence of infectious virus despite the various preparation steps (more details in the tandem publication of Puyskens et al).
Stability of the analyte SARS-CoV-2 antigen in all panel members was studied under different conditions using a quantitative ELISA. While additional freeze/thaw steps had negative effect on analyte stability, there was no significant impact on the antigen content after up to 7 days storage at 4°C of the liquid 50 µl aliquots (data not shown). From one 500 µl thawed aliquot, nine to ten 50 µl aliquots were immediately filled and used within 5 days for the corresponding number of 9-10 test kits, ensuring no stability issues.
Comparative evaluation
122 SARS-CoV-2 rapid tests were evaluated in direct comparison using the evaluation panel, with only few differences between the closely related panel versions 1 and 2. For acceptable Ag RDT performance a minimum sensitivity of 75% for the panel member subgroup with very high SARS-CoV-2 concentration (CT<25, viral load around 106 SARS-CoV-2 RNA/ml and higher) was defined. This criterion corresponds to the detection of at least 14 of 18 subgroup positives in panel version 1 (18 members with CT<25), or 13 of 17 (17 members with CT<25 in panel version 2), respectively.
Of the 122 SARS-CoV-2 Ag RDT evaluated, 96 tests (78.7%) had a sensitivity of >75% for panel members with high viral loads (CT<25; Table 1), and 26 tests (21.3%) were of lower sensitivity not meeting the sensitivity criterion (Table 2). Of the 96 tests meeting the sensitivity limit, 58 (60.4%) detected all panel members of the subgroup with CT< 25 (100% subgroup sensitivity), and another 17 tests (17.7%) exhibited a respective subgroup sensitivity of >90%. In addition, 19 tests (19.8%) showed a sensitivity of >75% even in the CT range 25-30.
The 96 tests meeting the sensitivity criteria were reactive with between 14 and 41 members of the 50 members panel (supplemental Table 1). On average throughout all successful tests, 27 panel members were reactive. Overall reactivity of SARS-CoV-2 Ag RDT strongly followed the analyte concentration throughout the panel, justifying the conclusions of this study (supplemental Table 1).
The 26 SARS-CoV-2 Ag RDT missing the sensitivity criteria either failed completely (2 tests with 0 reactives) or were reactive with 2 to 12 (average 6.3) panel members. Again, reactivity was dependent on the analyte concentration throughout the panel members (supplemental Table 2). Two tests failed because of constant faint background reactivity throughout all panel members; this background reactivity was also seen when using pure extraction buffer and was thus not caused by the panel composition (data not shown).
Discussion
There is convincing evidence that infectivity of SARS-CoV-2 correlates directly with high viral loads in respiratory specimens of acutely infected persons. It has therefore been suggested to use antigen tests rather to detect potential infectivity to help to control the spread of infection than for the purpose of clinical diagnosis. Ag RDTs are designed to detect SARS-CoV-2 antigens in respective samples and have become a key part of testing strategies in many countries since fall 2020. Hundreds of different Ag RDTs, most often of East-Asian origin, are available on the Common Market in Europe. Nearly all tests state in their IFUs sensitivity values of >90% for PCR-confirmed specimens. Such statements are unreliable and in strong contrast to the results of our study and to independent evaluations. Their only explanation is a strong preselection of high-positive specimens.
Lack of independent evaluation combined with unreliable statements of quality features led the German Ministry of Health to request a comparative evaluation of the sensitivity of test kits offered in Germany. At the time being there are no EU-wide requirements for quality features of COVID-19 IVDs such as a defined minimum sensitivity or minimum specificity, and manufacturers may themselves certify their devices to comply with basic requirements of the IVDD. Therefore, it is mainly left to individual Member States or international organizations to define minimum requirements for acceptance of respective tests.
In Germany, the Ministry of Health decided to link the reimbursement of SARS-CoV-2 Ag RDT to quality requirements to be fulfilled by acceptable devices. Minimum requirements were jointly formulated by Paul-Ehrlich-Institute and Robert Koch-Institute and state for SARS-CoV-2 Ag RDTs a minimum sensitivity of 80% for PCR positive specimens obtained within the first seven days after symptom onset; the minimum specificity was defined as >97%, and for both requirements a study population of at least 100 persons is required. Analogous requirements of SARS-CoV-2 Ag RDTs have been proposed by the World Health Organization for the emergency use listing (EUL) (6), the US Food and Drug Administration (FDA) (7), the European Center for Disease Control (ECDC) (8), the Swiss Authority Bundesamt für Gesundheit (BAG) (9) or the non-governmental Foundation for Innovative New Diagnostics (FIND) (10). With mandatory application of the EU IVD Regulation (IVDR) for SARS-CoV-2 Ag RDT their regulatory oversight will improve in near future. In this context it might be of interest that so-called Common Specifications have already been drafted for SARS-CoV-2 IVD. Although these draft quality requirements are still under consultation and not yet in effect, they will become essential prerequisites for future CE-marking of respective devices.
For the time being, successful participation in a comparative evaluation of the crucial test feature sensitivity was installed as another precondition for reimbursement of SARS-CoV-2 Ag RDT by the German healthcare system. The first round of this evaluation is summarized in this manuscript.
The sensitivity requirement defined for a successful outcome in the comparative evaluation study (>75% sensitivity for CT<25) is in line with international requirements mentioned above, as far as clinical specimens from acutely infected individuals are comparable to the members of our evaluation panel. We followed routine use of the tests as far as possible, including pre-analytical steps like antigen absorption of the test-specific swabs, and subsequent elution into the test-specific buffer. The vast majority of Ag RDTs included in our study showed sufficient sensitivity according to our criteria. Nevertheless, the results showed a wide range of varying sensitivities. There were few tests with fairly high and many tests with sufficient sensitivity, but also quite a few tests, which did not meet the minimum criterion. The study shows that the majority of SARS-CoV-2 Ag RDTs correctly identify high viral loads of <CT 25 (>106 virus RNA copies/mL) in samples from the respiratory tract with a sensitivity of >75%, supporting their use in the early symptomatic phase. However, although sensitivity often declined with CT>25, there were also few SARS-CoV-2 Ag RDTs with quite high sensitivity, even 100% for CT<30, or up to 86% for the complete CT range (CT 17-36). There are scientific publications of further independent head-to-head evaluations for SARS-CoV-2 Ag RDT which, however, are limited to the comparison of only few tests (11-17). Respective conclusions based on clinical specimens are consistent with our results, and the sensitivity ranking of different tests was widely in line with our results based on the evaluation panel.
Since most of the SARS-CoV-2 Ag RDTs offered in Europe are provided without a readout device, visual interpretation of test results is indispensable. We would like to emphasize that few discrepant tests results obtained by two experienced lab technicians were reported. These equivocal results were ultimately interpreted as reactive, in favor of the tests under investigation. However, visual readout and subjective interpretation of faint test lines, potentially caused by borderline concentration of the analyte, presents a challenge for less experienced users, e.g. lay persons using Ag RDTs as self tests.
A limitation of the study is its spot check nature since it cannot address variations between different batches of the same product, or variations between different test locations (see also the tandem publication of Puyskens et al).
In conclusion, by using the same panel for a large number of different SARS-CoV-2 Ag RDT we were able to evaluate the comparative performance of the different tests under the same conditions. The evaluation panel proved to be accurate for sensitivity differentiation of SARS-CoV-2 Ag RDTs, distinguishing better performing from less suitable tests. The continuation of the comparative evaluation is needed cope with the rapidly growing market of SARS-CoV-2 Ag RDT. Since the panel has now been exhausted, we will continue the evaluation with a new set of samples with similar features, accurately calibrated against its predecessor.
Data Availability
All relevant data are contained in the manuscript
Footnotes
This study has been submitted as tandem manuscript together with the study of Puyskens A et al. “Establishment of an evaluation panel for the decentralized technical evaluation of the sensitivity of 31 rapid detection tests for SARS-CoV-2 diagnostics”