Abstract
Background Real-time reverse transcription polymerase chain reaction (RT-PCR) targeting select genes of the SARS-CoV-2 RNA has been the main diagnostic tool in the global response to the COVID-19 pandemic. However, the diagnostic accuracy of the test has not been studied systematically outside of the laboratory setting. The aim of this study is to provide estimates of the diagnostic sensitivity and specificity of the RT-PCR test developed by China CDC.
Methods The study design is a secondary analysis of published findings on 1014 patients in Wuhan, China, of whom 601 tested positive and 413 were negative for COVID-19. Sensitivity and specificity were reconstructed using a Bayesian approach from probabilistic knowledge of the diagnostic errors. Predictive values of the test were calculated, resulting in estimates for the number of confirmatory tests that are needed for establishing the presence or absence of COVID-19, depending on the prior probability of a patient having the disease.
Results The sensitivity of the RT-PCR diagnostic test was estimated to be 0.777 (95% CI: 0.715, 0.849), while the specificity was 0.988 (95% CI: 0.933, 1.000). The confidence intervals include sampling error in addition to the error due to probabilistic knowledge of the data.
Discussion The Chinese version of the RT-PCR test had a conspicuous rate of false negative results, likely missing between 15% and 29% of patients with COVID-19. For a patient with a prior probability of COVID-19 greater than 18%, at least two negative test results would be needed to lower the chances of COVID-19 below 5%. Caution is advised in generalizing these findings to other versions of the RT-PCR test that are being used in diverse geographic regions.
Introduction
The cause of a disease outbreak that began in Wuhan, China in the last quarter of year 2019 was later identified as a novel coronavirus, labeled SARS-CoV-2 since it can cause severe acute respiratory syndrome. The disease associated with SARS-CoV-2 has been termed COVID-19 (1). The publication of the SARS-CoV-2 genome (2) led to the rapid development in January 2020 of real-time reverse transcription polymerase chain reaction (RT-PCR) tests for the diagnosis of COVID-19 while avoiding cross-reactions to other known coronaviruses. One version was developed in China that targeted the ORF1ab and N genes of viral RNA (3) while another version was developed in Germany that targeted the RdRp, E, and N genes (4). Real-time RT-PCR tests were developed and implemented thereafter by many laboratories around the world (5–7), even as COVID-19 became a global pandemic that continues to spread rapidly at this time. A listing of tests and protocols is being maintained online by the World Health Organization (8).
The rapid development and deployment of RT-PCR tests has been essential for the ability to measure and control the spread of SARS-CoV-2. However, the urgency of the pandemic has not allowed time for reliable and adequately powered clinical studies to be conducted to measure the diagnostic limitations of the RT-PCR test. At present, the bulk of knowledge about the sensitivity and specificity of RT-PCR is based on laboratory measurements that have goals related to the minimum threshold of detection of viral loads and the required number of thermal cycles of the chain reaction (9). Some attention has been given to the viral distribution by physical location, such as the differences in positive rates of RT-PCR in nasopharyngeal versus oropharyngeal swabs, or in the sputum and bronchoalveolar lavage fluid (10,11). Other factors that can impact the diagnostic success of RT-PCR include the timing of the test relative to disease onset, adequacy of the volume of fluids collected in the swab, and deviations from the laboratory-recommended protocol under real-world conditions. In terms of clinical decision-making, any of the causes of failure of the test can lead to incorrect diagnoses due to false positives and false negatives. This issue has received media attention (12) and a recent editorial written by a professor of medicine in an influential US newspaper (13) urged physicians to beware of false negatives of diagnostic testing for COVID-19 while acknowledging that reliable data on rates of false negatives were not yet available.
This study provides a timely assessment of the diagnostic sensitivity and specificity of RT-PCR that is based on a sample of 1014 patients in Wuhan, China (14). The original study aimed to measure the accuracy of chest CT imaging for diagnosis of COVID-19, and its authors treated RT-PCR as the gold standard. However, the authors provided additional information about the status of patients that allows for the reconstruction of the sensitivity and specificity of RT-PCR in the context of clinical decision-making. The knowledge of the efficacy and limitations of RT-PCR, even if known for only the Chinese version of the test at this time, can be expected to provide a valuable reference for medical practitioners and researchers at the frontlines of the fight against the pandemic. The findings have implications for policy makers as well, because policies for pandemic control in conditions of limited availability of tests that have a high rate of false negatives can be starkly different from policies in the presence of an abundant supply of a diagnostic test with excellent predictive values.
Data and Methods
Data
Data from published findings (14) have been used in this study. The study included 1014 patients suspected of having COVID-19 in Wuhan, China, who underwent RT-PCR and chest CT imaging diagnostic tests during a 30-day period in January and February 2020. The mean age was reported to be 51 ± 15 years, and 46% were male. Throat swab samples were collected and the RT-PCR assays were reported to have used TaqMan One-Step RT-PCR kits from Shanghai Huirui Biotechnology Co., Ltd., or Shanghai BioGerm Medical Biotechnology Co., Ltd., both of which were reported to have been approved for use by the China Food and Drug Administration. RT-PCR tests were positive for 601 patients (59.3%) and negative for the other 413 patients (40.7%). Although these tests were treated as the gold standard for comparison with chest CT imaging, the study authors provided valuable additional information. Patients who had negative RT-PCR tests but positive tests from chest CT were reassessed on the basis of clinical symptoms, CT features, and serial CT scans. The study staff concluded that among patients with negative RT-PCR tests, 147 could be classified as highly likely cases of COVID-19 and another 103 could be classified as probable cases of COVID-19. Moreover, among the 601 patients with positive RT-PCR tests, 21 patients were classified negative for COVID-19 from chest CT imaging. In this study, these 21 cases are assumed to have a low chance of being false positives, which is in alignment with the implicit assumption of the original study, indicated by their choice of RT-PCR as the reference for comparison with chest CT imaging. Apart from their role in the identification of probable false positives and negatives of RT-PCR, the chest CT imaging data are ignored in this study. In summary, the data are composed of firm knowledge of RT-PCR test results and probabilistic knowledge of the numbers of false positives and false negatives.
The ranges and notations used for the true and false positives and negatives are presented in Table 1.
Statistical Analysis
A modified Bayesian approach was adopted to estimate the uncertainty that arose from imprecise knowledge of the data. The data, denoted X, consisted of the 2×2 contingency table that represented the observed joint distribution of the RT-PCR decision (positive or negative) and the binary disease status (COVID-19 present or absent). Since the numbers of positive and negative tests were known, the data were uniquely defined by the number of false positives, n1, and the number of false negatives, n2, i.e. X = X(n1, n2). The uncertainty in the data stemmed from the uncertainty in the values of the random variables n1 and n2. The distributions of n1 and n2 were estimated from the level of confidence expressed about false identifications. This procedure has similarities to fuzzy logic, in which linguistic uncertainties about terms such as highly likely and probable are represented by membership functions (15). Although it is motivated by fuzzy logic, the treatment used here is strictly based in probability theory. The starting point was an informative distribution defined on the probability space of a diagnostic error. Upon calculating the resulting distributions of n1 and n2, the estimated values and distributions of sensitivity and specificity were derived from the joint distribution of n1 and n2. More details are given below.
The approach is thus Bayesian in the computational sense; it starts with an informative distribution, akin to an informative prior distribution, and ends with a distribution of the desired parameters (16). However, the terminal points of the analysis do not describe the distribution of the same parameters, so use of the terms prior and posterior distributions has been avoided. Moreover, the likelihood function of the data that is calculated here has a different interpretation than the one obtained in normal conditions when the data are firmly known. For any given probability of false identification, the likelihood of the number of false positives or negatives is obtained from the binomial distribution. The number of trials for the binomial distribution is known from the data, while the binomial probability parameter, denoted here by θ, arises from a distribution that represents the degree of confidence expressed about the false identifications. Thus, the distribution of n1 is expressed by:
$$\Pr(n_1) = \int_0^1 \Pr(n_1 \mid \theta_L)\,\Pr(\theta_L)\,d\theta_L$$
where θL is the chance of a case being false positive, the subscript L refers to a low chance, Pr(θL) is the distribution of that chance over the probability space [0, 1], and Pr(n1 | θL) is the binomial distribution with probability θL and number of trials given by the maximum range for n1, which is known to be 21. Similarly, the distribution of n2 is expressed by:
$$\Pr(n_2) = \sum_{n_H + n_M = n_2} \left[\int_0^1 \Pr(n_H \mid \theta_H)\,\Pr(\theta_H)\,d\theta_H\right] \left[\int_0^1 \Pr(n_M \mid \theta_M)\,\Pr(\theta_M)\,d\theta_M\right]$$
where θH and θM are the high and medium chances of a case being highly likely to be a false negative and probable false negative, respectively. The sum of nH and nM equals n2 and the distributions of nH and nM are binomial, given values of θH and θM along with the number of binomial trials, which are known to be 147 and 103, respectively. Beta distributions, which often serve as conjugate distributions for the binomial distribution, were used for θL, θM, and θH to describe low, medium, and high levels of confidence in the diagnostic errors that were identified (17).
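The marginalization described above can be sketched numerically. The following is a minimal Python illustration, not the author's R code. It relies on the standard result that integrating a binomial likelihood over a beta distribution gives a beta-binomial distribution, uses the shape parameters reported in the Results, and assumes that nH and nM are independent so that the distribution of n2 is their convolution. The function names are hypothetical.

```python
from math import lgamma, exp

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(n_trials, a, b):
    """Pr(n) for n = 0..n_trials: binomial likelihood integrated over theta ~ Beta(a, b)."""
    pmf = []
    for n in range(n_trials + 1):
        log_choose = lgamma(n_trials + 1) - lgamma(n + 1) - lgamma(n_trials - n + 1)
        pmf.append(exp(log_choose + log_beta(n + a, n_trials - n + b) - log_beta(a, b)))
    return pmf

def convolve(p, q):
    """Distribution of the sum of two independent counts (independence is an assumption)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out

# Trial counts from the data; shape parameters as reported in the Results.
pr_n1 = beta_binomial_pmf(21, 5, 20)    # false positives, low chance
pr_nH = beta_binomial_pmf(147, 20, 5)   # highly likely false negatives
pr_nM = beta_binomial_pmf(103, 20, 20)  # probable false negatives
pr_n2 = convolve(pr_nH, pr_nM)          # n2 = nH + nM

# Most probable values of n1 and n2 under these assumptions.
mode_n1 = max(range(len(pr_n1)), key=pr_n1.__getitem__)
mode_n2 = max(range(len(pr_n2)), key=pr_n2.__getitem__)
print(mode_n1, mode_n2)
```

If n1 and n2 are also taken to be independent, the joint distribution is the product of the two marginals, so its maximum sits at the pair of marginal modes.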
The data X(n1, n2) were uniquely specified by n1 and n2, as were the sensitivity, S1, and specificity, S2. In particular, S1(n1, n2) = (N1 – n1) / (N1 – n1 + n2) and S2(n1, n2) = (N2 – n2) / (N2 – n2 + n1), where N1 and N2 are the known numbers of positive and negative RT-PCR tests, respectively. Therefore, the joint distribution Pr(n1, n2) provided a mapping to the distributions of the sensitivity and specificity. The steps described up to this point resulted in estimates of the expected values of sensitivity and specificity, together with a measure of the uncertainty in their values that arose from imperfect knowledge of the data. Another source of uncertainty is due to sampling error, which was estimated using established methods for the standard error for proportions.
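As a worked check of the two formulas, the short Python sketch below evaluates them at the most likely values n1 = 3 and n2 = 172 reported in the Results. It takes N1 = 601 and N2 = 413 to be the counts of positive and negative RT-PCR tests from the Data section. The function names are illustrative only.

```python
N1, N2 = 601, 413  # positive and negative RT-PCR tests in the sample

def sensitivity(n1, n2):
    """True positives divided by all patients with the disease."""
    return (N1 - n1) / (N1 - n1 + n2)

def specificity(n1, n2):
    """True negatives divided by all patients without the disease."""
    return (N2 - n2) / (N2 - n2 + n1)

s1 = sensitivity(3, 172)  # 598 / 770
s2 = specificity(3, 172)  # 241 / 244
print(round(s1, 3), round(s2, 3))
```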
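Lastly, this study evaluated the predictive values of the test that provide the chance of disease in a patient conditional upon results of the diagnostic test. Sensitivity is a conditional probability that can be reversed using the Bayes formula to provide the positive predictive value of the test:
$$\Pr(\text{COVID-19} \mid \text{Positive RT-PCR}) = \frac{\Pr(\text{Positive RT-PCR} \mid \text{COVID-19})\,\Pr(\text{COVID-19})}{\Pr(\text{Positive RT-PCR})}$$
Here Pr(Positive RT-PCR | COVID-19) is the sensitivity, S1, and the denominator expands to S1 Pr(COVID-19) + (1 – S2) [1 – Pr(COVID-19)].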
A similar formula can be expressed for the probability of COVID-19 in a patient even if the test is negative, i.e. Pr (COVID-19 | Negative RT-PCR). In the above equation, Pr(COVID-19) on the right hand side can be interpreted as the prevalence of the disease when the testing is being done in the general population. More generally, Pr(COVID-19) is the prior probability of presence of the disease. This is also the more appropriate interpretation in the present circumstances in which the testing is reserved largely for symptomatic patients. The medical professional may suspect that a patient has COVID-19, which could be quantified into the prior probability. The RT-PCR test is then carried out and the test result decides the posterior probability. Statistical analysis was done using the R programming language (18) in the RStudio software environment (19).
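For reference, the analogous expression for a negative test can be written out as follows. This is a reconstruction from the Bayes formula using the sensitivity S1 and specificity S2 defined earlier, not an equation taken from the original text.

```latex
\Pr(\text{COVID-19} \mid \text{Negative RT-PCR})
  = \frac{(1 - S_1)\,\Pr(\text{COVID-19})}
         {(1 - S_1)\,\Pr(\text{COVID-19}) + S_2\,\bigl[1 - \Pr(\text{COVID-19})\bigr]}
```

The numerator is the probability of a false negative in a patient with the disease, and the second term of the denominator is the probability of a true negative in a patient without it.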
Results
Shape parameters of the beta distributions for low, medium, and high confidence levels for being a false test result were a=5, b=20 for Pr(θL), a=20, b=20 for Pr(θM), and a=20, b=5 for Pr(θH). The median value of θL was 0.192, with 10th and 90th percentiles given by 0.105 and 0.306, respectively. The median value of θM was 0.500, with 10th and 90th percentiles equal to 0.399 and 0.601, respectively. For θH, the median value was 0.808, with 10th and 90th percentiles given by 0.694 and 0.895, respectively. The distributions for θL, θM, and θH are shown in Figure 1.
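The reported medians and percentiles can be reproduced approximately with a short calculation. The Python sketch below is an illustration rather than the author's R code. It uses the identity that, for integer shape parameters, the Beta(a, b) cumulative distribution equals an upper binomial tail, and it inverts that function by bisection. The function names are hypothetical.

```python
from math import comb

def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for integer a, b, via the equivalent upper binomial tail."""
    m = a + b - 1
    return sum(comb(m, j) * x**j * (1 - x)**(m - j) for j in range(a, m + 1))

def beta_quantile(q, a, b, tol=1e-10):
    """Invert the CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Shape parameters for low, medium, and high confidence of a false result.
shapes = {"low": (5, 20), "medium": (20, 20), "high": (20, 5)}
for label, (a, b) in shapes.items():
    p10, p50, p90 = (beta_quantile(q, a, b) for q in (0.10, 0.50, 0.90))
    print(f"{label}: median={p50:.3f}, 10th={p10:.3f}, 90th={p90:.3f}")
```

Replacing the entries of `shapes` with other integer shape parameters gives the corresponding summaries for narrower or wider distributions.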
The maximum likelihood of the joint distribution of n1 and n2 (Figure 2) was located at n1 = 3 and n2 = 172. This solution corresponds to an estimate of 770 patients with COVID-19 and 244 without the disease. Due to the low number of false positives, the expectation value of specificity was high: Ŝ2 = 0.988. In contrast, the high false negative count was reflected in the lower expectation value of sensitivity: Ŝ1 = 0.777. The 95% confidence intervals for Pr(n1, n2) led to the corresponding limits for Pr(S1 | n1, n2) and Pr(S2 | n1, n2). Sensitivity had a 95% confidence interval from 0.746 to 0.821, while the 95% confidence interval for specificity ranged between 0.958 and 1.000. Figure 3 depicts the distributions of sensitivity and specificity.
The 95% confidence intervals mentioned above provided a measure of the uncertainty that arose from imperfect knowledge of the data. Additionally, the sampling error was estimated from the standard error for proportions evaluated at each of the endpoints of the 95% confidence interval for the data-related uncertainty. The overall 95% confidence intervals that incorporate the two sources of error are shown in Table 2.
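One way to read this procedure is sketched below in Python. The text does not state the sample sizes or the multiplier that were used, so the denominators (770 patients with the disease for sensitivity, 244 without it for specificity, both taken from the point estimate) and the normal multiplier z = 1.96 are assumptions made for this illustration. The function name is hypothetical.

```python
from math import sqrt

def widen_for_sampling_error(lower, upper, n, z=1.96):
    """Extend each endpoint of a data-uncertainty interval by z standard errors
    of a proportion evaluated at that endpoint, clipped to [0, 1]."""
    lo = lower - z * sqrt(lower * (1 - lower) / n)
    hi = upper + z * sqrt(upper * (1 - upper) / n)
    return max(0.0, lo), min(1.0, hi)

# Data-uncertainty intervals from the Results; n values are assumed (see above).
sens_ci = widen_for_sampling_error(0.746, 0.821, n=770)
spec_ci = widen_for_sampling_error(0.958, 1.000, n=244)
print([round(v, 3) for v in sens_ci], [round(v, 3) for v in spec_ci])
```

Under these assumptions the widened limits fall close to the overall intervals quoted in the abstract. Small differences are to be expected if the original analysis used different denominators at each endpoint.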
The impact of selecting different shape parameters for the beta distributions that describe low, medium, and high confidence was also explored. First, narrower beta distributions were defined by using shape parameters a=20, b=80 for Pr(θL), a=80, b=80 for Pr(θM), and a=80, b=20 for Pr(θH). The median values of θL, θM, and θH were 0.198, 0.500, and 0.802, respectively. The span between the 10th and 90th percentiles was approximately 0.1, which may be compared to 0.2 for the distributions described previously. Second, wider beta distributions were defined by using shape parameters a=3, b=10 for Pr(θL), a=10, b=10 for Pr(θM), and a=10, b=3 for Pr(θH). The median values of θL, θM, and θH were 0.217, 0.500, and 0.783, respectively. The span between the 10th and 90th percentiles was approximately 0.29. The estimated values of sensitivity, specificity, and their 95% confidence intervals are shown in Table 2. Point estimates of the parameters showed very little variation, but the choice of narrower or wider beta distributions resulted in correspondingly narrower or wider confidence intervals.
The predictive values of the RT-PCR diagnostic test are shown in Figure 4 for prior probabilities ranging from 0 to 1. The two curves in the figure show the posterior probabilities of the presence of COVID-19 when test results are positive or negative. Additionally, Table 3 displays the number of confirmatory tests that are needed to establish presence or absence of COVID-19 at confidence levels of 90% and 95%. For example, if the prior probability of presence of COVID-19 in a patient is judged to be 0.6, a single negative RT-PCR test would reduce that probability to 0.253. A second negative test would reduce it further to 0.070, which would be sufficient if at least 90% confidence is required to establish absence of the disease. However, for the confidence level of 95%, a third negative test would be needed to lower the probability of COVID-19 below 0.05.
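The repeated-testing example can be traced with a few lines of Python. This sketch uses the point estimates of sensitivity (0.777) and specificity (0.988) and assumes that successive test results are conditionally independent given the true disease status, so that each posterior probability becomes the prior for the next test. That independence is an assumption of the illustration, and the function names are hypothetical.

```python
def posterior_after_negative(prior, sens=0.777, spec=0.988):
    """Probability of disease after one negative test, by the Bayes formula."""
    missed = (1 - sens) * prior        # diseased and test negative
    cleared = spec * (1 - prior)       # not diseased and test negative
    return missed / (missed + cleared)

def negatives_needed(prior, threshold, sens=0.777, spec=0.988, max_tests=20):
    """Number of consecutive negative tests needed to push the probability of
    disease below the threshold; None if max_tests is not enough."""
    p = prior
    for k in range(1, max_tests + 1):
        p = posterior_after_negative(p, sens, spec)
        if p < threshold:
            return k
    return None

for threshold in (0.10, 0.05):
    print(threshold, negatives_needed(0.6, threshold))
```

A threshold of 0.10 corresponds to the 90% confidence level for absence of disease, and 0.05 to the 95% level.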
Discussion
RT-PCR tests are commonly used for the diagnosis of many influenza viruses and coronaviruses, including the viruses responsible for the 2002-04 SARS coronavirus outbreak, the 2009 H1N1 influenza pandemic, and the 2012 MERS coronavirus outbreak. RT-PCR tests are often treated as the gold standard in comparisons of diagnostic methods, which has led to few sources of reliable data about their diagnostic accuracy in clinical practice. The virus culture process is considered a better standard, but it takes several days in comparison to the few hours needed for RT-PCR tests. In one such comparison (20), RT-PCR was found to have sensitivity greater than 96% relative to virus culture for the diagnosis of H1N1 influenza. High accuracy of RT-PCR has also been reported for MERS (21). On the other hand, low accuracy has been reported for detection of SARS with real-time RT-PCR (22,23), although rates of detection were improved with the refinement of laboratory methods (24).
In the current COVID-19 pandemic, it has been a great boon to have had the rapid development of several versions of RT-PCR diagnostic tests that target the detection of different genes from the viral RNA. Laboratory testing has shown that at least one version of the RT-PCR assay can detect viral loads as small as 3.2 RNA copies per reaction (4) and that it does not cross-react to other known coronaviruses, particularly when the primer for the assay is well-chosen (7). However, there is widespread doubt about how well the tests work in practice (12,13). One source of error arises from the uncertain distribution of the virus in the body at various times during the COVID-19 disease trajectory (11). Comparisons of specimens from nasal and throat swabs indicate better sensitivity in nasal swabs and diminished sensitivity in throat swabs, particularly after the first few days of disease onset (10). The variation in the severity of the viral infection between subjects is another source of error; milder infections are more likely to escape detection. Other sources of error include sample collection, storage and transportation errors, such as collecting a low volume of fluid in swabs and depletion of the sample. Laboratory errors during assay processing are possible too.
The sensitivity and specificity of diagnostic testing using RT-PCR for COVID-19 that were estimated in this study may be considered to provide the cumulative impact of the various possible sources of error. It is clear that the sensitivity of the test is its weakest aspect while the specificity appears to be very good. Between 15% and 29% of COVID-19 cases may have gone undetected by the RT-PCR diagnostic test designed by China CDC that was implemented with TaqMan One-Step RT-PCR kits. As far as the medical practitioner is concerned, the predictive values of the diagnostic test are of utmost importance. For COVID-19, if a medical practitioner suspected that there was a 50% chance that a patient had the disease, a subsequent positive RT-PCR test would increase that chance to 98.4%. On the other hand, after a negative RT-PCR test the patient still has 18.4% chance of the disease. A second confirmatory negative test would be needed to bring the chance of disease below 5%.
It is possible that some of the diagnostic errors were mitigated by the actions of medical professionals who might have taken a critical view of negative test results for symptomatic patients. Nevertheless, the false negative rate is still likely to be among the main reasons for the difficulty in controlling the outbreak in its early stages. The problem of false negatives implies that public health measures that rely on singling out and isolating the cases of COVID-19 are unlikely to be successful on large scales. For instance, if the true prevalence is 1% in a population, testing would miss approximately 22 cases of COVID-19 for every 10,000 people tested. A highly transmissible virus can continue to propagate through the misdiagnosed cases.
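The figure of about 22 missed cases follows from the point estimate of sensitivity. The arithmetic below assumes that each person is tested once and that the sensitivity of 0.777 applies uniformly across the tested population.

```latex
\underbrace{10{,}000 \times 0.01}_{\text{true cases}} \times \underbrace{(1 - 0.777)}_{\text{false negative rate}}
  = 100 \times 0.223 \approx 22
```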
The proportion of throat swab specimens that were positive for SARS-CoV-2 in RT-PCR tests conducted on patients with confirmed COVID-19 has been reported as a mere 32% in one study (11) and almost twice as much (60% in severe cases and 61% in mild cases) in another study (10). Neither of those values is equivalent to sensitivity as defined in this study because multiple specimens were drawn from a smaller set of participants in the mentioned studies. Nonetheless, it seems reasonable to conclude that the sensitivity estimated in this study is higher than what was suggested by the mentioned studies. A possible reason might be that viral loads may have been higher for data collected in the epicenter of the pandemic. On the other hand, it is worth noting that the sensitivity of RT-PCR for SARS-CoV-2 that was estimated in this study is in close alignment with the value of 0.80 reported for the sensitivity of detection of SARS with RT-PCR (24). Perhaps the similarity is not too surprising since the genomes of the two viruses have been reported to be 82% similar (25).
The primary limitation of this study is that the estimated sensitivity and specificity apply to the particular version of the RT-PCR test that was urgently created (3) and that was being used in Wuhan, China, during January and February, 2020. Laboratories around the world reacted rapidly to the pandemic and created their own versions of the RT-PCR test, as well as tests of other types. It may be expected that experimentation was done with protocols and procedures that resulted in changes in the performance of RT-PCR tests that were developed later. Another limitation of the study is that it is a retrospective study based on probabilistic knowledge of diagnostic errors. A study that is designed to compare the diagnostic accuracy of RT-PCR with a better gold standard method would be able to provide more definitive estimates and narrower confidence intervals. Data about the severity of infections of sampled patients and measures of viral load that were found in the RT-PCR tests, such as cycle threshold, were not available, which is another limiting factor of this study.
Conclusions
The diagnostic sensitivity and specificity of the RT-PCR test for COVID-19 were reconstructed from data on 1014 patients in Wuhan, China. Uncertainty that arose from incomplete knowledge of the joint distributions of test results and disease status was quantified with a modified Bayesian analysis, along with the quantification of uncertainty due to sampling error. The results indicated that the RT-PCR test administered via throat swabs had a conspicuous rate of false negative results, likely missing between 15% and 29% of patients with COVID-19. For any patient who is suspected to have COVID-19 with higher than a roughly 1-in-5 chance, at least two confirmatory negative RT-PCR tests would be necessary to reduce the likelihood of disease below 5%. The limitation of the study findings is that they apply to one version of the RT-PCR diagnostic test for COVID-19 that was developed and distributed urgently by China CDC. Study findings may not generalize to other versions of the RT-PCR test that are being used in diverse geographic regions.
Data Availability
Published data were used in this study.
Acknowledgements
I wish to thank Stanley Cron for his review and feedback on the first draft of this manuscript.