Abstract
Suppressing SARS-CoV-2 will likely require the rapid identification and isolation of infected individuals, on an ongoing basis. RT-PCR (reverse transcription polymerase chain reaction) tests are accurate but costly, making regular testing of every individual expensive. The costs are a challenge for all countries and particularly for developing countries. Cost reductions can be achieved by combining samples and testing them in groups. We propose an algorithm for grouping subsamples, prior to testing, based on the geometry of a hypercube. At low prevalence, this testing procedure uniquely identifies infected individuals in a small number of tests. We discuss the optimal group size and explain why, given the highly infectious nature of the disease, parallel searches are preferred. We report proof of concept experiments in which a positive sample was detected even when diluted a hundred-fold with negative samples. Using these methods, the costs of mass testing could be reduced by a factor of ten to a hundred or more. If infected individuals are quickly and effectively quarantined, the prevalence will fall and so will the costs of regularly testing everyone. Such a strategy provides a possible pathway to the longterm elimination of SARS-CoV-2. Field trials of our approach are now under way in Rwanda and initial data from these are reported here.
SARS-CoV-2 represents a major threat to global health. Rapidly identifying and quarantining infected individuals is one of the most important strategies available to contain the virus. However, each diagnostic SARS-CoV-2 test costs 30-50 US dollars [1]. Therefore, testing every individual in a population regularly, as may be essential to eliminating the virus, is very expensive. The costs are unaffordable for most low-income countries with limited resources for massive SARS-CoV-2 testing. It is therefore important to ask: are there more efficient ways to find infected people?
The first step in testing, swab collection, is labour intensive but does not require expensive chemicals or equipment. It may therefore be feasible to collect swabs regularly from everyone. The next step involves RT-PCR machines [2]. These require expensive chemical reagents, currently in short supply, as well as skilled personnel. Reducing the cost requires that we minimize the number of tests. Testing rapidly is also vital because SARS-CoV-2 is so infectious. Any time wasted during testing results in a higher prevalence of the virus, which spreads quickly [3].
To find infected individuals, the naive approach is to test everyone. For a group of size N, this takes N tests. However, far fewer tests are actually needed, especially at low prevalence. It is much more efficient to pool (or combine) samples and test them together. Pooling samples has already been discussed as a way to efficiently estimate the prevalence of the virus [4, 5]. Our focus here is, instead, on how to use pooling of subsamples to identify infected individuals.
The idea of group testing dates back to a paper of Dorfman in 1943 [6] (for recent public advocacy, see [7]). Dorfman’s algorithm reduces the number of tests, per person, required to find all infected individuals to at low prevalence p (see Appendix A). We shall describe an algorithm which requires only ≈ ep ln(1/p) tests per person at low p, substantially improving on Dorfman’s. Our algorithm is largely parallel, so that finding an infected individual usually happens in a single step. There are algorithms which require fewer tests, such as binary searches. However, they are adaptive and hence serial. As the prevalence falls, the optimal group size N grows, and adaptive searches take longer. During this time, the disease continues to spread. Below, we show that, at low prevalence, this effect disfavours adaptive searches. Among parallel algorithms, ours scales similarly to the best ones known in the literature (see Appendix B).
Group testing is most obviously effective when there are no infected individuals at all. When the samples from a group are pooled and tested together, one test suffices to show that no-one is infected. Our algorithm takes full advantage of this powerful result. As we shall see, the optimal group size N is chosen so that the first test, conducted on the whole group, is usually negative.
Now consider the case where only one individual is infected. The idea behind our algorithm is geometrical: we pool subsamples, prior to testing, in the pattern suggested by a high-dimensional hypercube. The group of individuals to be tested is represented by a set of N points on a cubic lattice in D dimensions, organized in the form of a hypercube with L points on a side, so that
Instead of directly testing the sample taken from each individual, we first divide it into D equal subsamples. These DN subsamples are recombined as follows. Slice the hypercube into L planar slices, perpendicular to one of the principal directions on the lattice. Form such a set of slices in each of the D principal directions. Altogether, there are DL slices, each consisting of N/L = LD-1 points. We combine the LD-1 subsamples corresponding to each slice. If there is just one infected individual, then one slice out of the L slices, taken in each of the D directions, will yield a positive result. That slice directly indicates the coordinate of the point corresponding to the infected individual, along the associated principal direction.
Therefore the number of tests required to uniquely identify the infected individual is where we used (1). Treating D as a continuous variable, the right hand side of (2) diverges at both small and large D, possessing a minimum at corresponding to L=e and a total of e ln N trials. In reality, D and L must be integers, but using L = 3 achieves almost the same efficiency (in the total number of trials, e is replaced with 3/ ln 3 ≈ 2.73, less than half a per cent larger, whereas using L = 2 or 4 gives 2/ ln 2 = 4/ ln 4 ≈ 2.89, more than 5 per cent larger). With no further constraint, finding one infected person in a group of 106 individuals using L = 3 would require only 38 tests, performed in just one round of testing.
In practice, we are limited by the capacity of the testing machine. A typical swab yields 105 viral RNA molecules/ml [8]. For each slice of the hypercube, we combine N/L subsamples of the virus, each of volume v. If the volume of each combined sample, V = Nv/L, exceeds the capacity of the PCR we will have to only use a portion of it. We should also keep in mind that at least 1 viral RNA is needed for an unambiguous result, and we must remain well above this limit.
Setting L = 3 and N = 3D, we find
For example, if V/v = 100 then (4) yields D ≈ 5, from which (3) yields N = 243. If v is a microliter, then V is 100 microliters. In a positive combined sample, there would be 100 viral RNA molecules. Even if only 10 microliters are used in the PCR machine, it would contain 10 viral RNA molecules, sufficient for a positive result. The typical number of tests required to find the infected individual is then only 3ln N ≈ 17, an efficiency gain of 14.
As a proof of concept, using oropharyngeal swab specimens collected during COVID-19 surveillance in Rwanda, we have shown that positive specimens can still be detected even after they are diluted by up to 100-fold, i.e.,V/v = 100, see Fig. 2. (For recent experiments demonstrating positive sample detection after 30- and 32-fold dilution, see Refs. [9, 10]). Sample pooling is now used for cost-effective, large scale testing in Rwanda to understand the spatial spread of SARS-Cov-2 nationally and identify new infection hotspots to enable a rapid response by public health officials. For example, using the hypercube algorithm, we recently screened 1,280 individuals using only 64 tests, an efficiency gain of 20 (see Figures 5 and 6 in Appendix C).
Note that the viral load found in a swab specimen is relatively low if collected during the early stages of viral replication [8]. Therefore, swabs taken during this period may contain insufficient virus to yield a positive result. The sensitivity of the test is typically increased by testing specimens collected at sequential time points. Methods like the one we describe here facilitate such sequential testing on a massive scale by drastically reducing the associated costs. In view of the large potential efficiency gains, it is worth exploring whether testing machines could be engineered to accommodate larger test volumes V.
We have so far assumed there is just one infected individual in the group to be tested. However, we cannot know this ahead of time. What if there are 2, 3 or more infected people present? The number of infected individuals present in a group of size N, at prevalence p, should be described by Poisson statistics with mean λ = pN (for recent measurements of p, in Iceland, see Ref. [11]). For λ well below unity, the probability to find m infected individuals falls rapidly with m. As we shall see, this is the regime in which our algorithm is most efficient.
We summarize the algorithm as follows:
For a given initial estimate of the prevalence p, select the group size N to be the optimal power of 3, discussed below. Test the whole sample to see whether one or more infected individuals are present. A negative test indicates everyone in the group is clear of infection.
If the test is positive, run one round of testing according to the hypercube algorithm with L = 3. The distribution of results for the LD slices should, for large N, accurately indicate the number m of infected individuals, with many consistency checks including a new estimate of the prevalence. If m = 1, the infected individual is immediately identified.
If m = 2, select a principal direction in which two slices are positive. Take one of these, itself a hypercube of dimension D – 1, and run the hypercube algorithm again. The coordinates of the corresponding infected individual are then uniquely identified, and those of the second infected individual are inferred by elimination.
If m > 2, select a principal direction in which all three slices are positive. Taking one of these, run the hypercube algorithm again. If that slice contains one infected individual, it is immediately identified. If the slice contains 2 or 3 infected individuals, their coordinates are discovered in 1 or 2 additional hypercube runs. The last infected individual is found by elimination. So, for m = 1,2 or 3, only m runs are needed.
For m > 3, it is an exercise in combinatorics to work out the expected number of runs. This turns out to be a number slightly higher than m. The corrections to m for m = 4, 5, 6 are given in Appendix B. For reasons given there, these corrections turn out to be relatively unimportant.
Figure 3 shows the expected number of tests required per person, to identify all infected individuals, for each value of the viral prevalence p (see Appendix B). Also shown is the optimal group size when constrained to be an exact power of 3. In Appendix B we also derive the approximation that the optimal group size N ≈ 0.350/p, and the expected number of tests T per person 〈T/N ≈ ep ln(0.734/p), shown as the dotted black line in the Figure.
We now compare our search algorithm with those already considered, for diverse purposes, in computer science or mathematical statistics. Information theory sets a lower bound on the number of tests per person required to identify an infected individual (see Appendix B). At low p, this bound is ≈ p log2(1/p). Binary searches approach this limit by performing an iterated series of tests (see, e.g., Refs. [12, 13]). These algorithms require fewer tests than ours by a factor of e ln2 ≈ 1.88. However, when considering a rapidly spreading infectious disease like COVID-19, saving time is important because infected individuals who are still at large can infect others. A parallel, or largely parallel, search such as ours reduces this risk. In contrast, binary searches are adaptive and must be performed serially. At low prevalence, the optimal group size N scales as 1/p (see Appendix A). A binary search takes log2 N ~ log2(1/p) steps to find an infected individual [12]. Each PCR test takes several hours, to which must be added any time taken for subsample sorting and selection. So, multiple of rounds of testing would consume significant time. During this period, the prevalence p of the virus grows exponentially. The doubling time for the virus is somewhat uncertain but it has been recently estimated, using data from China [3], to be τ2 ≈2.4 days. If each PCR test takes τ days, the viral prevalence grows by during an adaptive search. It follows that, at fixed τ/τ2, the adaptive algorithms do worse at low p. For example, if we assume that at most 3 rounds of adaptive PCR tests may be performed in a working day, then τ = 1/3 day. For prevalences p below per cent, a parallel approach like ours is then preferred. Reducing the costs of staff and lab time also favours a largely parallel strategy.
Another key consideration is robustness to error. A search method such as ours allows for many consistency checks which will help to eliminate false positives or negatives. In contrast, binary searches rely on repeated testing of the positive sample, making errors harder to identify.
The most striking consequence of our approach is how quickly cost of testing the whole population falls as the prevalence declines. This should incentivize decision-makers to act firmly in the early phases of the pandemic. Although driving the prevalence down initially is costly, maintaining a low prevalence thereafter and, indeed, eliminating COVID-19 altogether, will become progressively more affordable.
Data Availability
Data presented in Appendix on Methods and Supplementary Information
Contributions
L.M., S.N. coordinated the experiments; P.N., T.N., E.N., M.S., C.M., D.N., M.N. contributed to patient recruitment and data collection from the community; P.N., Y.B., J.S., A.U., R.R., E.L.N., E.M., E.M., N.R., I.E.M., J.B.M., C.M.M. contributed to laboratory RT-PCR test validation, data analysis, and interpretation. W.N. and N.T. contributed to the theory.
Acknowledgments
We thank the Rwanda Ministry of Health through RBC for stimulating discussions and correspondence. Kendrick Smith and Corinne Squire provided valuable encouragement and helpful references. Research at AIMS is supported in part by the Carnegie Corporation of New York and by the Government of Canada through the International Development Research Centre and Global Affairs Canada. Research at Perimeter Institute is supported in part by the Government of Canada through the Department of Innovation, Science and Economic Development Canada and by the Province of Ontario through the Ministry of Colleges and Universities.
Appendix A: Comparison with Dorfman
In a landmark paper in 1943, R. Dorfman considered the problem of searching for infected individuals by grouping (or pooling) samples. His approach remains influential (see, e.g., Refs. [14-17]). Consider a population of n individuals, broken up into groups of N members each. If the probability that any individual is infected (the prevalence) is p, the probability that a group is free of infection is (1 — p)N. Conversely, the probability that at least one member is infected is p’ = 1 — (1 — p)N. Dorfman’s strategy was to test all groups, and then to test every member of every infected group. The expected number of tests is then and the number of tests required per person is
In the last step, we assumed that p ≪ 1. The number of tests per person is minimised when the group size . The expected number of tests per person, at the optimal group size, is approximately .
Let us compare these results with those obtained using our hypercube algorithm. Assuming Poisson statistics for the number of infected individuals m in a group of size N, with λ = pN, the expected number of tests per person is given by where rm denotes the number of runs of our hypercube algorithm needed to identify all infected individuals. Each run consists of e ln N tests. We shall ignore any changes in the argument of the logarithm due to fewer than N tests sometimes being needed. These changes are unimportant at large N. For m = 1,2 or 3, rm = m. For larger m, more than m runs may be needed. Setting rm = m + cm for m > 3, the cm denote the average excess number of runs. They are tedious but straightforward to compute. The first few are: . In any case, they turn out to be unimportant because the minimum of (A3) is dominated by contributions from low m.
The last expression in (A3) represents the approximation that the cm corrections are neglected. The first term diverges at small N, and the second diverges at large N. Thus, a minimum exists. It is located at λ = pN ≈ 0.350, independent of p. For such a small value of λ, contributions from higher m are strongly suppressed. One subtlety is that, at very low p, since N = λ/p, for the relevant values of λ the argument of the logarithm becomes very large, so its derivative with respect to λ becomes suppressed with respect to its value. This effect means that, for extremely small p, the derivative of the corrections can compete with the derivative of the logarithm. We have checked that only for p < 10−10 does this alter the minimum value of λ by more than 0.01. Hence, for all practical purposes, we can safely ignore the cm corrections.
The optimal group size and the corresponding minimal number of tests per person are thus given, to an excellent approximation, by
For a prevalence p of 1 per cent, Dorfman’s approach yields a group size of approximately 10 and approximately 0.2 tests per person, whereas ours yields a group size of 35 and 0.12 tests per person. For a prevalence of 0.1 per cent, Dorfman’s optimal group size is 32 whereas ours is 350. His approach requires 0.06 tests per person, whereas ours needs only 0.018. For all lower, but still reasonable, values of p our approach prevails by an increasing margin.
Further geometrical insight into our approach and its relation to Dorfman’s may be gained by considering its generalization to a hypercuboid. A set of n people to be tested may be represented as a rectangular volume in a D-dimensional cubic lattice with L1,L2,…,LD points on a side, constrained to obey L1L2 … LD = n. For simplicity, consider the case when there is exactly one infected individual to be found. Our approach is to group the points in slices taken along each the principal directions of the lattice, and to test each slice. The required number of tests is L1 + L2 + … + LD. This is minimized, for fixed n, when the Li are all equal, i.e., for a hypercube. One can think of Dorfman’s approach to the same problem as a reduced D = 2 version, with L1 the number of groups and L2 the group size so L1L2 = n. Dorfman’s approach is first to test the L1 groups, and then the L2 members of the positive group (this is the sense in which it is reduced: in the second step, he tests individuals not groups). The number of tests required is L1 + L2 which is minimized, at fixed n = L1L2, by taking so this is the optimal group size. Setting the prevalence p = 1/n, we recover the results noted for Dorfman’s algorithm above. The advantage of our approach over Dorfman’s is that of going to higher dimensions, and using group testing uniformly. Whereas testing procedures may be represented as matrices in Dorfman’s and other approaches [18], in our approach higher dimensional tensors are required.
Appendix B: Information theory bounds
Information theory sets a lower bound on the number of tests required to uniquely identify all infected individuals. The uncertainty in who is infected is associated with an entropy, where the sum is over all possible states and the pi are the corresponding probabilities. If a test outputs a zero or a one then for t tests the number of possible test outputs is 2t and the corresponding information gained is at most t ln 2. In order to learn everything about the system, one requires an information gain of at least S, hence
Consider a sample of size n, with m infected individuals chosen at random. The number of such states is . Therefore, from (B2), the minimum number of tests required is . Assuming a binomial distribution with prevalence p, and replacing m with its expectation p n, we find the expected number of tests per person is ~ p log2(1/p). Binary searches can approach this limit by performing an adaptive series of tests (see, e.g, Refs. [12, 13]). However, the number of tests in each search scales as log2 n ~ log2(1/p) for the optimal value of n, at low p. In contrast, the most efficient known parallel searches, called “noiseless, nonadaptive” tests in Ref. [18], require a factor of eln2 more tests (see their equations (2.8) and (2.10)), just as our algorithm does.
Appendix C: Methods and Supplementary Information
Observational study design
We conducted an experiment to evaluate the hypothesis that known SARS-CoV-2 positive oropharyngeal swab specimens collected during COVID-19 surveillance in Rwanda will test positive after they are combined with as many as 99 known SARS-CoV-2 negative specimens. This was followed by an observational study that aimed to apply our hypercube algorithm to increase the efficiency of community testing for COVID-19 in Rwanda. In the experiment, two different sets of sample pools were tested for SARS-CoV-2 using RT-PCR. Each set consisted of three sample pools containing one known SARS-CoV-2 positive sample diluted in ratios of 1:20, 1:50, and 1:100 by combining it with equivalent amounts of 19, 49, and 99 known SARS-CoV-2 negative samples, respectively (See Fig. 4). In the observational study, 1280 individuals selected from the community were tested for SARS-CoV-2 using RT-PCR. One third of the individuals were participants in a screening for Severe Acute Respiratory Infections (SARI) and Influenza Like Illness (ILI) conducted in 30 per cent of the health facilities found across the 30 districts of Rwanda. The remaining two thirds were from COVID-19 screening of at-risk groups in the capital city of Kigali. The latter group is comprised mainly of people (market vendors, bank agents, and supermarket agents) who remained active during the lockdown imposed by the Government of Rwanda to contain COVID-19. The positive fraction of RT-PCR tests for SARS-CoV-2 conducted in Rwanda in March 2020 suggests an upper-bound of 2 per cent for the virus prevalence in the country. Using p =2 per cent in the hypercube algorithm indicated an optimal sample group size of 17.5. For convenience, the 1280 individual samples were combined in 64 groups of 20 samples before testing for SARS-CoV-2 (See Figures 5 and 6).
We used two established experimental protocols for SARS-CoV-2 testing, namely 1) a protocol by DAAN Gene Co., Ltd., Sun Yat-sen University, which is available online [19], and is also under review by the WHO [20], and 2) another by Corman et al., [2] which is widely used by the scientific community.
Sample collection
Oropharyngeal swabs were collected by wiping the tonsils and posterior pharynx wall with two swabs, and the swab heads were immersed into Viral Transport Medium (VTM). Samples were transported in VTM to the Rwanda National Reference Laboratory (NRL) immediately after collection. Samples that had to be transported over a long distance were stored in dry ice.
RT-PCR for SARS-CoV-2
Total viral RNA was extracted from swab specimens using the QIAamp Viral RNA 91 Mini Kit (Qiagen, Hilden, Germany), according to the manufacturer’s instructions. RNA samples were screened for SARS-CoV-2 using a 2019-nCoV RNA RT-PCR test targeting two genes respectively encoding an open reading frame (denoted Orf1ab) and nucleocapsid protein (denoted N) (DAAN Gene Co., Ltd. Of Sun Yat-sen University, 19, Xi-angshan Road, Guangzhou Hi-Tech Industrial Development Zone, China). For Orf1ab, CCCT-GTGGGTTTTACACTTAA and ACGATTGTGCATCAGCTGA were used as forward and reverse primers, respectively, together with a 5’-VIC CCGTCTGCGGTATGTGGAAAGGTTATGG-BHQ1-3’ probe. For N, GGGGAACTTCTCCTGCTAGAAT and CAGACATTTTGCTCT-CAAGCTG were used as forward and reverse primers, respectively, together with a 5’-FAM-TTGCTGCTGCTTGACAGATT-TAMRA-3’ probe. The RT-PCR reaction was set up according to the manufacturer’s protocol, with a total volume of 25 μL. The reaction was run on the ABI Prism 7500 SDS Instrument (Applied Biosystems) at 50°C for 15 min for reverse transcription, denatured at 95°C for 15 min, followed by 45 PCR cycles of 94°C for 15 sec and 55°C for 45 sec. A threshold cycle (Ct value) <40 indicated a positive test, while Ct value >40 indicated a negative test. Positive controls for the reaction showed amplification as determined by curves for FAM and VIC detection channels, and a Ct value ≤ 32. Positive tests were confirmed using LightMix SarbecoV E-gene and LightMix Modular SARS-CoV-2 RdRp RT-PCRs targeting the envelope (E) and RNA directed RNA Polymerase (RdRp) genes, respectively, as described by the manufacturer (TIB MOLBIOL Syntheselabor GmbH, Eresburgstr. 22-23, D-12103 Berlin, Germany). Both the primers used and the RT-PCR reaction conditions were previously described [2].
Ethics approval
Ethics approval was obtained from the Rwanda National Ethics Committee (Ref: FWA Assurance No. 00001973 IRB 00001497 of IORG0001100/20March2020) and written informed consents were obtained from the patients.
Data availability
All data are available from the corresponding authors upon reasonable request.
Footnotes
↵* nturok{at}perimeterinstitute.ca