Abstract
Background Routine testing for SARS-CoV-2 in the community is essential for guiding key epidemiological decisions from the quarantine of individual patients to enrolling regional and national preventive measures. Yet, the primary testing tool, the RT-qPCR based testing, is notoriously known for its low sensitivity, i.e. high risk of missed detection of carriers. Quantifying the false-negative rate (FNR) of the RT-qPCR test at the community settings and its dependence on patient demographic and disease progression is therefore key in designing and refining strategies for disease spread prevention.
Methods Analyzing 843,917 test results of 521,696 patients, we identified false-negative (FN) and true-positive (TP) results as negative and positive results preceded by a COVID-19 diagnosis and followed by a later positive test. Regression analyses were used to determine associations of false-negative results with time of sampling after diagnosis, patient demographics and viral loads based on RT-qPCR Ct values of the next positive tests.
Findings The overall FNR was 22.8%, which is consistent with previous studies. Yet, this rate was much lower at the first 5 days following diagnosis (10.7%) and only increased in later dates. Furthermore, the FNR was strongly associated with demographics, with odds ratio of 1.74 (95% CI: 1.58-1.9) for women over men and 2.54 (95% CI: 2.39-2.69) for a 20 versus a 50 year old patient. Finally, FNR was associated with viral loads (p-value 0.002), with a difference of 1.1 (95% CI: 0.60-1.57) between the average Ct of the N gene in a positive test following a false-negative compared to a positive test following a true-positive.
Interpretation Our results show that in the first few days following diagnosis, when results are critical for quarantine decisions, RT-qPCR testing is more reliable than previously reported. Yet the reliability of the test result is reduced in later days as well as for women and younger patients, where the viral loads are typically lower.
Funding This research was supported by the ISRAEL SCIENCE FOUNDATION (grant No. 3633/19) within the KillCorona – Curbing Coronavirus Research Program.
Introduction
The ongoing COVID-19 pandemic has already infected more than 45 million people worldwide (https://coronavirus.jhu.edu/map.html, October 30th, 20201). A major tool in combating the pandemic is testing for viral carriage, which is used for both diagnostic and epidemiologic purposes. The most commonly used viral detection tests are based on the reverse transcription quantitative polymerase chain reaction of viral genes (RT-qPCR). This nucleic acid test is of high specificity, i.e. very low false-positive rate2–5. In contrast, a high false-negative rate was reported for these tests6–10. These high false-negative rates impede local and global efforts to slow down disease spread, as patients incorrectly diagnosed as non-carriers may subsequently infect additional people11. Systematically quantifying the rate of false-negative results and its dependencies on disease progression and patient demographics is critical for disease spread modeling, public health policy making and person-level quarantine decisions.
Various approaches have been taken to estimate the false-negative rate of COVID-19 RT-qPCR tests. Measuring the rate of false-negative results in a population of patients with highly specific pathologies (e.g. chest CT), has initially alerted physicians and epidemiologists of the high false-negative rate, estimated at approximately 30% 4–7,9. A meta-analysis of multiple such studies found that the reported rates were highly variable with a mean false-negative rate of 11%12. However, and as previously noted12,13, these meta-analysis studies were necessarily based on a combination of variable studies of non-uniform origins and methodologies, typically involving small groups of patients. A more recent systematic approach was based on ‘longitudinal testing’ in which the accuracy of each test is determined based on later tests of the same patient: a negative test which is directly followed by a positive one is deemed false negative. Application of this approach in a hospital setting resulted in an estimation of a false-negative rate of 17.8% 13. Systematic large scale studies at the community, critical for epidemiological disease control, have been lacking.
Beyond the average false negative rate, it is also important to understand whether and how the false-negative rate is associated with patient-specific and sample-specific attributes. Meta-analysis studies showed a strong association of false-negative results with time since exposure14 or time since onset of symptoms15. At the patient specific level, as viral load is associated with time since onset of symptoms, sex and age16–22, it has been proposed that false-negative rates might also depend on demographics, but current studies lacked statistical power for quantifying such dependencies23.
Here, we apply a longitudinal testing based approach to a large dataset of patient-level test series with linked demographics and electronic health records, to qunatify the false-negative rate of COVID-19 test results at the community and its associations with age, sex and time since diagnosis. Finally, we test whether the risk of false-negative results is associated with viral load at the single-patient level.
Methods
Data collection
Anonymized clinical records of SARS-CoV-2 RT-qPCR test results (test reports) were retrieved by Maccabi Healthcare Services (MHS) for the period between February 8th and September 24th 2020. Records of COVID-19 or COVID-19-related diagnoses by physicians (diagnosis reports) and referrals based on suspected exposure to the disease (epidemiological-based referrals) were retrieved for these patients. When available, fluorescence measurements data of the PCR test were retrieved for each test (RT-qPCR measurements). Randomly generated identifiers were used to link between test results and diagnosis codes.
Test results
MHS aggregates all test results for all its members, whether or not the test itself was performed by MHS laboratory. Test results data included, for each test: random patient number, sample number, sample execution date and test result. Test results were either “positive” (7.4%), “negative” (92%) or “borderline-positive” (0.6%, which we considered as positive in our analysis). Patients for whom two tests with different results were recorded on the same day were excluded from the analysis (274 patients, 0.05%).
Diagnosis reports
Diagnoses are routinely recorded in MHS database. For all patients with at least one positive test result, we retrieved any symptom-based COVID-19 diagnoses recorded prior to their first test. Diagnosis report data included: random patient number, diagnosis date, and diagnosis code (ICD9 and internal MHS codes).
Epidemiologically-based referrals
Since April 3rd 2020, an epidemiologically-based referral was filled by physicians referring a patient to a SARS-CoV-2 RT-qPCR test. Each referral report included: random patient number, referral number, referral date and referral cause.
RT-qPCR measurements
Fluorescence measurements data were retrieved from all 7 Bio-rad CFX96 RT-qPCR machines of MHS central laboratories. For each test, the following data were included: sample number, PCR machine number, test well number, test date, fluorescence measurements for 45 amplification cycles in 4 channels: FAM, Cal Red 610 Quasar 670 and HEX, corresponding to the measurements of E gene, RdRp gene, N gene and the internal control, respectively.
Assigning patient diagnosis date
For each patient, the earliest date of COVID-19 symptoms and/or epidemiologically-based referral was considered as “date of diagnosis”. When both symptom-based diagnosis and epidemiological-based referrals were available, they were usually recorded on the same day. For simplicity, we excluded a small number of patients for whom both a diagnosis and a referral were available, but were more than a day apart (5.2% of diagnosed patients).
Calculating the false negative rate
For any patient with at least one positive test, a ‘positive period’ was defined as the period between their date of diagnosis and their last positive test. Negative test reports during this period were regarded as false-negative (FN), while positive test reports during this period were regarded as true-positive (TP). False-negative rate (FNR) was calculated as
Logistic regression
Logistic regression of a false-negative versus true-positive result was performed using Python’s statsmodels library. The probability of a false-negative result was fitted to the test result (true-positive: 1, false-negative: 0) for all tests within the positive period.
Linear regression
Linear regression of Ct values for each fluorescence channel was performed using Python’s statsmodels library.
Calculating odds ratios from logistic regression
Odds ratios (OR) were calculated from the coefficients of the above logistic regression. For the binary variable sex (male: 1, female: 0), OR was defined as: ORsex = exp(Csex), while for the continuous variables age and day, ORs were defined as: exp(Cage ∗ ageolder − Cage ∗ ageyounger) younger versus older, exp(Cday ∗ ageearly − Cday ∗ agelate) early versus late, where Csex, Cday, Cage respectivley. are the coefficients for sex, day and age variables,
Curve fitting for FNR over time
Test results were grouped by day since diagnosis. FNR was calculated separately for each group. Data was fitted by , where t is time since diagnosis, and a, b, Rearly and Rlate are the fitted parameters, with the latter two standing for the false negative rates at the early and late phases, respectively.
Differences in FNR between age groups
According to patient age, test results were divided into two groups of similar size (<40, ≥40 years). FNR was calculated separately for each group. Statistical significance for differences in FNR between groups was tested using a two-sided Fisher’s exact test (SciPy in Python).
Calculating Ct
Fluorescence measurements of each channel for each well were normalized by the mean measurement of the first 5 PCR cycles. For each sample, Ct was defined for each gene as the PCR cycle in which the normalized fluorescence measurement crossed a set threshold (FAM: 1.1, Cal Red 610: 1.2, Quasar 670: 1.2 and HEX: 1.2).
Ethical approval
The study protocol was approved by the ethics committee of Maccabi Healthcare Services, Tel-Aviv, Israel.
Results
Among all ∼2 million MHS patients, we identified 843,917 recorded tests for 521,696 patients (table 1). Within this set, 51,499 had at least one positive result. As quarantine discharge policy was based on test results, patients were often repeatedly tested, resulting in a series of test results for each patient. In our analysis, we focused on 7,872 patients with well-defined test series, satisfying the following conditions: (1) had a defined diagnosis date; (2) had at least one positive sample within 14 days following the diagnostic date; (3) had a test series that ended with a negative result (table 1). The vast majority of these test series ended with 2 or more negative results, in agreement with the discharge policy (68% of patients, supplementary table 1).
False negative and true positive test results were defined based on their context within a longitudinal patient series of test results. We considered, for each patient, the series of test results following diagnosis (figure 1A,B). The median day since diagnosis until the first negative result, indicating recovery, was 18 (IQR: 13-24; table 1). For some patients a negative result came only after more than 50 days (2.4% of patients). For each patient, we then consider the period from diagnosis date to the last positive sample as a period in which the patient is carrying the virus, even if not at detectable loads yet. Taking an epidemiological stand, negative test results within this patient-specific prospectively-positive time period (“positive period”) were regarded as false-negative (FN) results. Similarly, positive test results within this period were regarded as true-positive (TP) ones (figure 1A). To avoid bias for true-positive results, the last positive result, used for marking the end of the positive period, was not counted towards TP. The rate of negative results increased over time, and at 20 days after diagnosis the number of negative tests first surpassed the number of positive or false-negative results (figure 1B,C). In total, we identified 1,982 test results defined as FN and 6,715 test results defined as TP, indicating an overall false-negative rate of 22.8% which is within the wide range of previously reported FNR13,23–28.
To identify personalized features associated with false-negative results, we performed multivariate logistic regression for the odds of a false-negative result (Methods: ‘Logistic regression’ and ‘Calculating odds ratio from logistic regression’). Patient age, sex and number of days from day of diagnosis were all associated with a false-negative result (supplementary table 2). Patient age was strongly and negatively associated with a false-negative result, with odds ratio of 2.54 (95% CI: 2.39-2.69) for a 10 versus a 40 year old patient. The number of days from the day of diagnosis was positively correlated with a false-negative result, with OR of 2.16 (95% CI: 1.96-2.38) for samples taken at day 15 compared to samples taken at day 0. Lastly, patient sex was also associated with false-negative results, with female to male odds ratio of 1.74 (95% CI: 1.58-1.9).
Following the observed association between time after diagnosis and false-negative result, we characterized the FNR during disease progression. Calculating FNR per day after diagnosis (Methods: ‘calculating FNR’), we found that FNR followed 3 distinct phases: at the first few days following diagnosis, it was fairly constant and low (10.7%, days 0-5). It then gradually increased over days 6-15, and finally it plateaued at high rates of about 39% (Methods: ‘Curve fitting for FNR over time’; figure 2A).
We then focused on the earlier days after diagnosis (days 0-5), in which FNR was relatively low, and in which a precise diagnosis is most critical for epidemiological patient-level quarantine control. Multivariate logistic regression analysis of test results for these days alone identified the association of false-negative results during these days with sex and age. A 10 years old versus 40 years old patient has an odds ratio of 3.79 (95% CI: 3.42-4.2), and female patients have an OR of 2.01 (95% CI: 1.69-2.39) compared with males (supplementary table 3). Dividing the patients into 2 age groups of similar size (<40 and ≥40, table 1), we found that FNR during this initial period was significantly higher for the younger age group (p-value 0.0002, Fisher’s exact test, OR=0.71, 95% CI: 0.58-0.87; figure 2B). Similarly, the FNR during the later period (days 6-24) also significantly decreased with age (p-value 0.02, Fisher’s exact test).
Based on previous reports of viral load differences along disease progression, between males and females and among age groups5,16–19,21,29–34, we hypothesized that differences in FNR across demographic factors and disease progression may stem from changes in viral load, which would be reflected in the measured Ct values. To test this hypothesis, we first tested for associations of Ct values of the three viral genes (N gene, E gene and RdRp) and the internal control gene (IC) with patient age and sex and number of days after diagnosis (figure 3, supplementary figure 1). Indeed, a linear regression model revealed positive correlation of the Ct of viral genes with the number of days after diagnosis, and negative correlation with age and sex (male; Methods: ‘Linear regression’, supplementary table 4). An opposite association was found with the IC gene, in agreement with within-tube competition for reagents between the multiplexed reactions35 (supplementary figure 1C). The viral load association with demographics and time, therefore, mirrored the associations found for the FNR. Finally, we tested more directly for association of false-negative rate with viral load at the individual patient level. Since Ct values are not available for false-negative results, we used as a proxy the Ct values of the next positive result. Comparing the distribution of Ct values of positive test results following false-negative tests with, as a control, the Ct values of positive test results following true-positive tests, we found that indeed false-negative results are associated with reduced viral load for all three viral genes (figure 4 and supplementary figure 2; Mann-Whitney U test; p-value of 1.4 ∗ 10−5, 1.5 ∗ 10−2, 1.8 ∗ 10−3 for N, E and RdRp genes, respectively).
Discussion
Our analysis of large dataset of electronic health records of COVID-19 patients showed that while on average the FNR is about 23%, consistent with past measurements, this rate varies strongly with age, sex and time after diagnosis. At the first few days following diagnosis, the FNR is only 10% on average and even lower for men and older patients. Combining these data with raw fluorescence measurements of RT-qPCR tests for the presence of SARS-CoV-2 genes provides evidence that false-negative rates stem from low viral loads at the single-patient level.
Our study has several limitations. First, we treat all positive tests as true positive. While errors may occur, the rate of false-positive results is very low2–5 and we do not expect it to significantly affect our results. Future studies can further improve the reliability of confirmation of positive cases by combining PCR test results with serology tests. Second, we treat negative results at the end of test series as ‘true-negative’, while it is possible that if the test series were continued new positive tests might have been detected. Again, we do not expect this to significantly affect our results: most series in our study end with two consecutive negative results, and the chances for two consecutive false-negative tests are very low. Moreover, this bias will mostly affect the calculated false-negative rate at later days after diagnosis. Third, as viral loads after infection first increase and only later decrease, it is possible that false-negative rates follow an opposite pattern: first decreasing and only later increasing. Analysing our cohort, we could only identify the later phase of increasing false-negative rate. However, it is possible that with different cohorts or inclusion criteria, both phases can be observed. Fourth, it will be interesting to see how changing the way Ct is calculated can fine-tune the way positive and negative results are determined based on conflicting results of the genes. Finally, we emphasize that most of the patients in the study cohort were symptomatic; therefore, our results may not represent the false-negative rate for asymptomatic patients.
Despite these limitations, our results provide important epidemiological and clinical input as to the patient specific sensitivity of tests, with important implications for epidemiological policy for patient-specific quarantine decisions and disease prevention and control. In particular, they underscore that the risk of false-negative at the very early days following diagnosis might be lower than previously thought, reinforcing the use of tests for disease prevention and individual quarantine assessments.
Data Availability
Data is available upon reasonable request.