Combined multiplex panel test results are a poor estimate of disease prevalence without adjustment for test error
=================================================================================================================

* Robert Challen
* Anastasia Chatzilena
* George Qian
* Glenda Oben
* Rachel Kwiatkowska
* Catherine Hyams
* Adam Finn
* Krasimira Tsaneva-Atanasova
* Leon Danon

## Abstract

Multiplex panel tests identify many individual pathogens at once, using a set of component tests. In some panels the number of components can be large. If the panel is detecting causative pathogens for a single syndrome or disease then we might estimate the burden of that disease by combining the results of the panel, for example determining the prevalence of pneumococcal pneumonia as caused by many individual pneumococcal serotypes. When we are dealing with multiplex test panels with many components, test error in the individual components of a panel, even when present at very low levels, can cause significant overall error. Uncertainty in the sensitivity and specificity of the individual tests, and statistical fluctuations in the numbers of false positives and false negatives, will cause large uncertainty in the combined estimates of disease prevalence. In many cases this can be a source of significant bias. In this paper we develop a mathematical framework to characterise this issue, present novel statistical methods that adjust for this bias and quantify uncertainty, and use simulation to test these methods. As multiplex testing becomes more commonly used for screening in routine clinical practice, accumulation of test error due to the combination of large numbers of test results needs to be identified and corrected for.

**Author summary** During analysis of pneumococcal incidence data obtained from serotype specific multiplex urine antigen testing, we identified that despite excellent test sensitivity and specificity, the small error rate in each individual serotype test has the potential to compound and cause large uncertainty in the resulting estimates of pneumococcal prevalence, obtained by combining individual results. This limits the accuracy of estimates of the burden of disease caused by vaccine preventable pneumococcal serotypes, and in certain situations can produce marked bias.

## Introduction

Multiplex panel testing is a convenient and rapid diagnostic approach and is increasingly being used in clinical practice to differentiate between viral and bacterial causes of a range of disorders [1]. It has also been used in epidemiological studies to identify pneumococcal subtypes targeted by vaccines [2] or to monitor disease spread [3]. Multiplex panel tests have been developed for a wide range of clinical syndromes caused by different pathogens, or for specific diseases caused by different subtypes of the same pathogen [1], and may be based on immunological [4, 5] or genetic techniques [6–11]. The number of targets in each multiplex panel is increasing, and currently ranges from a handful up to 48 different causative agents [3]. In this paper we demonstrate that when large multiplex panels are used, even small errors in the component tests can cause significant compound error and potential bias if the results are combined, usually leading to an overestimate of the prevalence of the combined condition.

In the schematic in Fig 1, we distinguish between multiplex testing (subfigures A-D) and other types of multiple testing (subfigures E-H). Subfigures A-D show two component tests, each of which identifies one of two subtypes of disease. The disease subtypes are present independently of each other, and the disease super-type is present if any of the subtypes is present (B-C). In subfigure A a false positive in one component results in a false positive for the combined panel. In subfigure B one subtype is correctly detected, in C the other subtype, and in D a false positive result for one subtype and a false negative for the other give an overall result which is correct for the wrong reason. In all of subfigures A-D, the combined test result would be interpreted as positive. As described above, this test design is usually extended to many more than two subtypes to make a multiplex panel.

![Fig 1.](https://www.medrxiv.org/content/medrxiv/early/2023/12/15/2023.12.14.23299860/F1.medium.gif)


Fig 1. Two scenarios for multiple testing.
Panels A-D depict a multiplex panel test, which is the subject of this analysis: multiple tests are employed to detect multiple subtypes of disease, which may be present separately or together, and the results are combined to give an overall result such that if any component test is positive, the combination is positive. An alternative, shown in panels E-H and not in scope of this paper, concerns the situation where multiple tests are used to identify a single condition. In this case two interpretations of the multiple test results are possible, which maximise either test sensitivity or test specificity.

Fig 1 E-H show a different test design, more closely related to multiple modalities of testing [12]. In this situation, the multiple tests are looking for the same underlying cause of disease, which does not have subtypes. In Fig 1 E, both tests are true negatives and the overall result is also a true negative. The two test results can be interpreted in one of two ways: a) any single positive test implies disease, in which case all of subfigures F-H show positive combined results; or b) both tests must be positive to identify the disease, in which case only subfigure H represents a positive result. These are not regarded as multiplex tests.

In more formal language, we define a multiplex test as consisting of a set of independent components which test different independent hypotheses, the results of which are combined to give a panel result where a positive test result in any component implies a positive test result in the panel. From this point, only multiplex panel tests will be discussed.

If a condition is composed of many subtypes, then each individual subtype must account for a fraction of the overall condition prevalence. The more subtypes in a multiplex panel, the smaller that fraction will be on average. If the prevalence of each component is low, then each component test is operating at a level where its positive predictive value (i.e. the probability that a positive test result represents a true positive rather than a false positive) is also relatively low. This leads to a high probability of observing false positives in each component. We will also observe false negatives, depending on the sensitivity of the test, but if the prevalence of a subtype is low, there are fewer true positives to be missed.
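To make the positive predictive value argument concrete, the sketch below applies Bayes' theorem to a single component test with the 80% sensitivity and 99.75% specificity used later in this paper. The function name `ppv` and the three prevalence values are illustrative choices, not part of the testerror package.

```python
def ppv(prevalence: float, sens: float, spec: float) -> float:
    """Positive predictive value of a single test (Bayes' theorem)."""
    true_pos = sens * prevalence                 # P(test +, disease +)
    false_pos = (1 - spec) * (1 - prevalence)    # P(test +, disease -)
    return true_pos / (true_pos + false_pos)


for p in (0.10, 0.02, 0.005):
    print(f"prevalence {p:.1%}: PPV = {ppv(p, sens=0.80, spec=0.9975):.2f}")
# prevalence 10.0%: PPV = 0.97
# prevalence 2.0%: PPV = 0.87
# prevalence 0.5%: PPV = 0.62
```

At 0.5% prevalence, nearly 4 in 10 positive results from this otherwise excellent test are false positives.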

The effect of this can be seen in Fig 2, where we look at the theoretical distribution of false negatives and false positives in 1000 tests for three hypothetical disease subtypes, present at 2%, 0.5% and 0% prevalence, assuming a test with a high specificity of 99.75% and a moderate sensitivity of 80%. At 2% prevalence, the false positive test results are likely to be outnumbered by the false negatives (Fig 2 A), and test positivity is expected to be lower than 2%, the true prevalence in this simulation (Fig 2 B and C). When the prevalence of the subtype is lower, at 0.5%, this pattern is reversed: the false positives will tend to outweigh the false negatives (Fig 2 D), leading to a test positivity higher than the prevalence (Fig 2 E and F). In the 0% scenario (Fig 2 G, H and I) all positives are by definition false positives, and their number is highly variable, giving an expected test positivity above 0.
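The expected values behind Fig 2 can be checked with a few lines of arithmetic. This is a sketch assuming a fixed, known sensitivity and specificity; it reproduces only the expected counts and the binomial standard deviation of the false positives, not the full distributions plotted in Fig 2. The function name is illustrative.

```python
import math


def expected_errors(n_tests: int, prevalence: float, sens: float, spec: float):
    """Expected false positives, false negatives and test positivity for one test."""
    n_pos = n_tests * prevalence           # truly diseased individuals
    n_neg = n_tests - n_pos                # truly disease-free individuals
    false_neg = n_pos * (1 - sens)
    false_pos = n_neg * (1 - spec)
    positivity = (n_pos * sens + false_pos) / n_tests
    # The false positive count is Binomial(n_neg, 1 - spec); this is its SD
    fp_sd = math.sqrt(n_neg * (1 - spec) * spec)
    return false_pos, false_neg, positivity, fp_sd


for p in (0.02, 0.005, 0.0):
    fp, fn, ap, sd = expected_errors(1000, p, sens=0.80, spec=0.9975)
    print(f"prevalence {p:.1%}: E[FP]={fp:.2f} (SD {sd:.2f}), E[FN]={fn:.2f}, "
          f"expected positivity={ap:.3%}")
```

At 2% prevalence the expected 4 false negatives outnumber the expected 2.45 false positives, so expected positivity (1.845%) falls below 2%; at 0.5% and 0% the false positives dominate.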

![Fig 2.](https://www.medrxiv.org/content/medrxiv/early/2023/12/15/2023.12.14.23299860/F2.medium.gif)


Fig 2. Error distributions of test results in low pre-test probability settings.
Distribution of false positives (cyan bars, with expected value as a blue vertical line) and false negatives (orange bars, expected value red line) of 1000 hypothetical test results with 0.9975 specificity and 0.8 sensitivity at different prevalence levels. (A), (D) and (G) show the disaggregated distribution of false positives and false negatives and (B), (E) and (H) show the combined error distribution of test positive observations (grey bars), and expected test positivity (magenta line) compared to the true condition positives (black line).

If a multiplex panel consisting of 20 component tests is applied to a disease present at a prevalence of 10%, then it is reasonable to expect that the three patterns in Fig 2 will all be present in some combination. The components will have a mix of false positives and false negatives, in a manner dependent on the distribution of disease subtypes. In this particular scenario (20 highly specific tests at 10% prevalence) the balance will be towards false positives. Because any positive component results in a positive panel result, the component false positive errors compound in combination, so the panel result will contain more false positives than false negatives, and the resulting test positivity rate will be an overestimate of true prevalence.
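The compounding can be seen directly in the panel false positive rate. Assuming the component tests make errors independently and share the same specificity (a simplification; real panels will vary), a disease-free patient is panel-negative only if every component is negative, so the panel false positive rate is one minus the product of the component specificities. The function name below is illustrative.

```python
from math import prod


def panel_false_positive_rate(component_specs):
    """Probability a disease-free patient tests positive on at least one component."""
    # Panel-negative requires every component to be (true) negative
    return 1 - prod(component_specs)


for spec in (0.9975, 0.98):
    rate = panel_false_positive_rate([spec] * 20)
    print(f"20 components at {spec:.2%} specificity: panel FP rate = {rate:.1%}")
```

With 20 components, a 99.75% component specificity already gives a panel false positive rate of about 4.9%, and 98% gives about 33%.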

The compounding of error across numerous components is analogous to parallel testing of multiple statistical hypotheses. In that situation, a Bonferroni correction is often used to reduce the risk of over-interpreting the results of statistical tests of significance [13]. In a similar way, results from parallel testing of disease subtypes are at risk of being over-interpreted without a clear understanding of the nature of test errors.

In the remainder of this paper we quantify this risk, and summarise the mathematical properties of multiplex tests. We use a realistic simulation based on the example of pneumococcal serotypes to demonstrate the implications and study potential mitigation strategies. In S1 Appendix we provide the detail of the mathematical analysis, and validate our findings against a broad range of simulation scenarios. In S2 Appendix we provide specific detail on propagation of uncertainty associated with combined multiplex panel testing, and validate this against a set of realistic simulations. Supporting implementations of all methods described here are provided in S3 R package.

## Materials and methods

In this section we describe the mathematical analysis, the methods used to adjust for potential bias and uncertainty, and the simulations used to test and illustrate the problem. The majority of the detailed methods are found in S1 Appendix and S2 Appendix. The equations presented here are for ease of reference and are not essential to the remainder of the analysis presented in this summary paper.

### Mathematical analysis and validation

Given a set of $N$ multiplex panel component tests, the combined test result is defined as positive if any of the panel component tests are positive. For a specific patient $k$ this is represented by the following expression, where $I$ is an indicator function and $O_{n,k}$ is the observed result of component $n$ for patient $k$ (1 if positive, 0 if negative), so that $O_{N,k}$ is the observed panel result:

$$O_{N,k} = I\left(\sum_{n=1}^{N} O_{n,k} > 0\right) \tag{1}$$

The test positivity rate (or apparent prevalence, $AP_N$) for the panel result of $N$ tests in a group of $K$ patients is given by:

$$AP_N = \frac{1}{K}\sum_{k=1}^{K} O_{N,k} \tag{2}$$

A panel result is positive if any component result is positive, and in S1 Appendix we show that a true negative panel result can only be the result of a combination of true negative component results. From this, for independent components, we derive expressions for the specificity and sensitivity of the combined panel, shown below. In Eq 3 and 4, $AP_n$ is the apparent prevalence (test positivity rate) for component $n$; $sens_n$ and $sens_N$ are the sensitivities of the components and of the combined panel, and $spec_n$ and $spec_N$ are the corresponding specificities; $p_N$ is the true prevalence of the combined condition, given by Eq 5.

$$spec_N = \prod_{n=1}^{N} spec_n \tag{3}$$

$$sens_N = 1 - spec_N + \frac{spec_N - \prod_{n=1}^{N}\left(1 - AP_n\right)}{p_N} \tag{4}$$

From this, we use the Rogan-Gladen estimator of true prevalence [14] to derive an expression for the true prevalence of a combined panel based on the test positivity, sensitivity and specificity of the components:

$$p_N = 1 - \prod_{n=1}^{N}\left(1 - \frac{AP_n + spec_n - 1}{sens_n + spec_n - 1}\right) \tag{5}$$

In S1 Appendix these estimators are demonstrated to perform well in a broad range of scenarios based on randomly generated synthetic multiplex panels, and the behaviour of these estimators is analysed in detail.
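The panel estimators can be sketched in Python as follows, assuming independent components. This is an illustration written for this summary, not the testerror package's implementation: the function names are hypothetical, and raw Rogan-Gladen estimates can fall outside [0, 1] with real data, which this sketch does not handle.

```python
from math import prod


def panel_result(component_results):
    """A patient's panel result is positive if any component is positive."""
    return int(any(component_results))


def apparent_prevalence(panel_results):
    """Test positivity (apparent prevalence) over K patients."""
    return sum(panel_results) / len(panel_results)


def rogan_gladen(ap, sens, spec):
    """Rogan-Gladen true prevalence estimate for a single test (no clamping)."""
    return (ap + spec - 1) / (sens + spec - 1)


def panel_spec(specs):
    """Panel specificity: product of component specificities."""
    return prod(specs)


def panel_true_prevalence(aps, sens, specs):
    """Combine component Rogan-Gladen estimates into a panel prevalence."""
    return 1 - prod(1 - rogan_gladen(a, se, sp) for a, se, sp in zip(aps, sens, specs))


def panel_sens(aps, sens, specs):
    """Panel sensitivity implied by component apparent prevalences."""
    spec_panel = panel_spec(specs)
    p_panel = panel_true_prevalence(aps, sens, specs)
    return 1 - spec_panel + (spec_panel - prod(1 - a for a in aps)) / p_panel


# Hypothetical three-component example using expected component test positivity
true_p = [0.02, 0.005, 0.0]
sens = [0.80] * 3
specs = [0.9975] * 3
aps = [se * p + (1 - sp) * (1 - p) for p, se, sp in zip(true_p, sens, specs)]

print(panel_true_prevalence(aps, sens, specs))   # ~0.0249, i.e. 1 - 0.98 * 0.995
print(panel_sens(aps, sens, specs), panel_spec(specs))
```

Because the component test positivities here are exact expected values, the panel estimate recovers the true panel prevalence; with observed counts it would also carry sampling error.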

### Application to realistic situations

To illustrate the implications of multiplex test error for epidemiological studies, we have constructed a simulation based on pneumococcal serotypes, to demonstrate uncertainty and risk of bias that could occur in studies that investigate the overall burden of pneumococcal disease using multiplex testing.

We previously published the frequency of the 20 pneumococcal serotypes contained in the 20-valent pneumococcal conjugate vaccine (PCV20) that were identified in an invasive pneumococcal disease (IPD) cohort in Bristol between January 2021 and December 2022 [15]. This IPD distribution was scaled to give a realistic distribution of 20 subtypes in a hypothetical population with an overall PCV20-type pneumococcal prevalence of 10%. We simulate testing this population with a hypothetical multiplex panel which detects the 20 individual serotypes. For illustration purposes, we assume all component tests of the multiplex panel are moderately sensitive (80%) and highly specific (99.75%); these assumptions are loosely based on existing serotype-specific detection tests. The simulated test results for individual serotypes were aggregated into a PCV7 group (any positive of serotypes 4, 6B, 9V, 14, 18C, 19F, 23F), a PCV13 group (PCV7 serotypes plus 1, 3, 5, 6A, 7F, 19A), a PCV15 group (PCV13 plus 22F and 33F), and a PCV20 group (all serotypes). This allows us to compare “true” simulation prevalence to test positivity rates (apparent prevalence). Using the estimators for panel sensitivity and specificity above, we use the synthetic data set to estimate the true prevalence from test positivity, for both components and panels. With the same basic simulation we vary component test sensitivity and specificity, and investigate how the difference between “true” simulation prevalence (10%) and simulated test positivity rates (apparent prevalence) depends on test performance in a realistic scenario.
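Aggregating serotype results into vaccine groups applies the "any positive component" rule to a subset of components. In the sketch below each patient is represented, hypothetically, as the set of serotypes whose component test was positive. The PCV7, PCV13 and PCV15 sets follow the text above; the five serotypes added for PCV20 (8, 10A, 11A, 12F, 15B) are taken from the standard PCV20 composition rather than from this paper, and the patient data are invented for illustration.

```python
PCV7 = {"4", "6B", "9V", "14", "18C", "19F", "23F"}
PCV13 = PCV7 | {"1", "3", "5", "6A", "7F", "19A"}
PCV15 = PCV13 | {"22F", "33F"}
PCV20 = PCV15 | {"8", "10A", "11A", "12F", "15B"}
GROUPS = {"PCV7": PCV7, "PCV13": PCV13, "PCV15": PCV15, "PCV20": PCV20}


def group_positivity(patients, group):
    """Fraction of patients with at least one positive serotype in the group."""
    return sum(1 for positives in patients if positives & group) / len(patients)


# Hypothetical positive serotypes for four patients
patients = [{"3"}, set(), {"8"}, {"4", "22F"}]

for name, serotypes in GROUPS.items():
    print(f"{name}: test positivity {group_positivity(patients, serotypes):.2f}")
```

Because each group is a superset of the previous one, group test positivity can only stay the same or increase from PCV7 to PCV20, and so can the accumulated false positives.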

### Uncertainty propagation

Our mathematical analysis assumes precisely known values for the sensitivity and specificity of component tests. However, these quantities can only be estimated from control-group testing. Because individual subtypes are usually present at low levels when there are many subtypes, the number of positive disease controls for any given subtype is typically small [2]. This limits the precision of estimates of component test sensitivity, which in turn makes interpretation of test positivity in both components and panels challenging.
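To illustrate why small numbers of positive controls limit precision, the sketch below computes a Wilson score interval for a component sensitivity estimated from hypothetical positive controls. The Wilson interval is one standard choice for a binomial proportion, used here for illustration; it is not necessarily the method used in S2 Appendix.

```python
from statistics import NormalDist


def wilson_interval(successes: int, n: int, level: float = 0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * ((p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) ** 0.5) / denom
    return centre - half_width, centre + half_width


# Same observed sensitivity (80%) from 10 versus 100 hypothetical positive controls
for detected, controls in ((8, 10), (80, 100)):
    lo, hi = wilson_interval(detected, controls)
    print(f"{detected}/{controls} detected: 95% CI {lo:.2f} to {hi:.2f}")
```

With 10 positive controls the interval spans roughly 0.49 to 0.94, so a component test described as 80% sensitive could plausibly be anywhere from about half to over 90% sensitive.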

For single tests, there are approaches to estimating true prevalence from test positivity, which incorporate uncertainty in sensitivity and specificity, in both frequentist [16–18] and Bayesian frameworks [18–20]. In S2 Appendix we extend these two frameworks to account for multiplex testing, and implement a third resampling procedure combined with the Rogan-Gladen estimator to propagate uncertainty. We test this against a synthetic data set that is based on the IPD distribution scaled to an overall pneumococcal prevalence of 10% (further described in S2 Appendix). These methods are implemented as an R package “testerror” in S3 R package.
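A simple version of the resampling idea is sketched below: sensitivity, specificity and component test positivity are each drawn from Beta distributions reflecting their binomial counts, pushed through a clamped Rogan-Gladen estimator, combined across components as one minus the product of one minus each component prevalence, and summarised as percentiles. This is an illustrative parametric resampling scheme assuming independent components and uniform Beta(1, 1) priors; it is not the procedure in S2 Appendix or the testerror API, and all counts are hypothetical.

```python
import random
from math import prod


def rogan_gladen_clamped(ap, sens, spec):
    """Rogan-Gladen estimate clamped to [0, 1]; assumes sens + spec > 1."""
    est = (ap + spec - 1) / (sens + spec - 1)
    return min(max(est, 0.0), 1.0)


def resample_panel_prevalence(obs_pos, n_tested, sens_controls, spec_controls,
                              draws=2000, seed=1):
    """Percentile interval for panel true prevalence by parametric resampling.

    obs_pos: positive count per component among n_tested patients
    sens_controls: (detected, total) per component from disease-positive controls
    spec_controls: (negative, total) per component from disease-free controls
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(draws):
        component_p = []
        for pos, (tp, n_p), (tn, n_n) in zip(obs_pos, sens_controls, spec_controls):
            # Beta(x + 1, n - x + 1) draws represent uncertainty in each proportion
            ap = rng.betavariate(pos + 1, n_tested - pos + 1)
            sens = rng.betavariate(tp + 1, n_p - tp + 1)
            spec = rng.betavariate(tn + 1, n_n - tn + 1)
            component_p.append(rogan_gladen_clamped(ap, sens, spec))
        # Panel prevalence from independent component prevalences
        estimates.append(1 - prod(1 - p for p in component_p))
    estimates.sort()
    return (estimates[int(0.025 * draws)],
            estimates[draws // 2],
            estimates[int(0.975 * draws) - 1])


lo, med, hi = resample_panel_prevalence(
    obs_pos=[18, 6, 3], n_tested=1000,
    sens_controls=[(80, 100)] * 3, spec_controls=[(3990, 4000)] * 3)
print(f"panel prevalence {med:.3f} (95% interval {lo:.3f} to {hi:.3f})")
```

Clamping each component at zero keeps estimates valid but nudges the panel estimate upwards when many components have near-zero prevalence, one reason a fully Bayesian model may be preferable.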

## Results

In the illustrative simulation motivated by IPD serotype distributions, the serotypes range from having no observed cases to making up 25.6% of the total [15]. When this is scaled to a synthetic population with 10% overall prevalence, the component prevalence ranges from 0% to 3.8% and, as with the theoretical examples in Fig 2 D-I, the majority of serotypes fall into the category where the apparent prevalence is higher than the true prevalence due to false positives, despite assuming a highly specific test with 99.75% specificity (Fig 3 A). The bias towards overestimation due to false positives is strongest for subtypes with low, or zero, prevalence, whereas the underestimation due to low sensitivity is strongest for subtypes with higher prevalence (also demonstrated in Fig 2 A-C).

![Fig 3.](https://www.medrxiv.org/content/medrxiv/early/2023/12/15/2023.12.14.23299860/F3.medium.gif)


Fig 3. True versus apparent prevalence in multiplex test components and panel results.
The apparent prevalence as a function of true prevalence in a simulated realistic scenario with excellent test specificity (99.75%) and moderate test sensitivity (80%). (A) shows the individual component relationship and (B) shows the panel relationship when 20 components are combined. Black lines show the relationship and the grey transparent lines are a guide to the eye showing perfect agreement. Note that (A) and (B) are on very different scales.

In the synthetic but realistic scenario in Fig 3 A, with excellent test specificity (99.75%) and moderate test sensitivity (80%), the test positivity rate (apparent prevalence) is expected to be higher than true prevalence when true prevalence is below a threshold of 1.2%. When a set of 20 components that together result in a true panel prevalence of 10% are combined, the combined errors mean that the panel test positivity is higher than the true prevalence (Fig 3 B, dashed black lines). In Fig 1 D and S1 Appendix we show that a false positive in one component can balance out a false negative in another, and this makes panel test sensitivity a complex quantity that, counter-intuitively, depends on disease prevalence, component distribution, sensitivity and specificity. As a result, the relationship between true panel prevalence and apparent panel prevalence (test positivity) is non-linear (Fig 3 B), and in this particular simulation test positivity will be an overestimate of true prevalence until true prevalence exceeds 22%.
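The 1.2% threshold for a single component follows from setting the expected test positivity, sensitivity x p + (1 - specificity) x (1 - p), equal to the true prevalence p. A short sketch of that calculation, with an illustrative function name:

```python
def crossover_prevalence(sens: float, spec: float) -> float:
    """Prevalence below which a single test's expected positivity exceeds it.

    sens * p + (1 - spec) * (1 - p) > p  <=>  p < (1 - spec) / ((1 - spec) + (1 - sens))
    """
    fp_rate = 1 - spec
    fn_rate = 1 - sens
    return fp_rate / (fp_rate + fn_rate)


print(f"{crossover_prevalence(0.80, 0.9975):.2%}")  # 1.23%
```

This single-component threshold does not carry over directly to the panel, where the 22% crossover in Fig 3 B also depends on how prevalence is distributed across components.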

Component sensitivity and specificity determine the difference between true and apparent prevalence, as shown in Fig 4. This considers the same scenario of 10% prevalence, but shows the relative difference between true and apparent prevalence when varying sensitivity and specificity. The previous assumptions are marked as a blue cross in the figure, and at this high level of specificity (i.e. 99.75%, the right dotted vertical line in Fig 4) the ratio between apparent and true prevalence is mostly influenced by test sensitivity. If sensitivity is low enough (less than 50%), the false negative rate exceeds the combined false positive rate and apparent prevalence is smaller than true prevalence. Wherever specificity is lower, the balance of error is mostly determined by test specificity, and test sensitivity becomes much less important as a factor determining the difference between true and apparent prevalence. Even marginally lower values of test specificity result in test positivity being a gross overestimate of panel prevalence. If the component test specificity is only 98% (left dotted line), the 2% false positive rate of each of the 20 components combines to drive the overall panel test positivity to 4 times the true prevalence set in this simulation, regardless of the test sensitivity.
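The strong dependence on specificity can be reproduced in a simplified setting. The sketch below assumes, unlike the IPD-based simulation behind Fig 4, that the 20 components have equal prevalence and independent errors, chooses that component prevalence so the true panel prevalence is 10%, and returns the ratio of expected panel test positivity to true prevalence. Under this simplification the ratio comes out at about 1.26 at 99.75% specificity and just under 4 at 98%, similar to the values reported above. The function name is illustrative.

```python
def apparent_to_true_ratio(p_panel: float, n: int, sens: float, spec: float) -> float:
    """Expected panel test positivity divided by true panel prevalence.

    Assumes n independent components with equal true prevalence q, chosen so
    that 1 - (1 - q) ** n == p_panel.
    """
    q = 1 - (1 - p_panel) ** (1 / n)
    ap_component = sens * q + (1 - spec) * (1 - q)
    ap_panel = 1 - (1 - ap_component) ** n   # panel positive if any component positive
    return ap_panel / p_panel


for spec in (0.9975, 0.99, 0.98):
    ratio = apparent_to_true_ratio(0.10, 20, sens=0.80, spec=spec)
    print(f"specificity {spec:.2%}: apparent / true = {ratio:.2f}")
```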

![Fig 4.](https://www.medrxiv.org/content/medrxiv/early/2023/12/15/2023.12.14.23299860/F4.medium.gif)


Fig 4. Bias in apparent prevalence as an estimator for true prevalence.
A simulated scenario of 20 components realistically distributed following patterns seen in IPD, with a simulated true prevalence of 10%, and assuming the same sensitivity and specificity for each of the component tests. Expected test positivity rates are calculated for all combinations of sensitivity and specificity, and compared to the true prevalence (10%) as a ratio. At sensitivity of 80% and specificity of 99.75% (the blue cross) the test positivity rate will be about 1.26 times higher than true prevalence. Blue areas represent parameter space where test positivity is an underestimate of true prevalence due to excess of false negatives, and red areas where test positivity is an overestimate due to excess of false positives.

We have shown that even low false positive rates in component tests lead to overestimates of the prevalence of uncommon components. The converse is true for components with comparatively high prevalence, which tend to be underestimated because of false negatives. In the scenario we have been using as an example, despite the excellent specificity of the tests and 10% overall prevalence, the balance of the component estimates is such that test positivity will overestimate true prevalence. This is seen more clearly in Fig 5 (left subfigure), in which simulated true prevalence levels (blue) are lower than test positivity (red) for all but two of the components (serotypes 3 and 8). In the right subfigure we see the effect of combining these into groups of 7, 13, 15 and 20 components, representing combinations of serotypes targeted by vaccines. As predicted, overestimates of prevalence compound, and the size of each overestimate depends on both the number and the distribution of test components.

![Fig 5.](https://www.medrxiv.org/content/medrxiv/early/2023/12/15/2023.12.14.23299860/F5.medium.gif)


Fig 5. Correction of bias in a single IPD scenario.
The relative frequency of the 20 pneumococcal serotypes contained in PCV20, and identified in Bristol within the last 2 years, informed a simulation of a serotype distribution with an overall PCV20 pneumococcal prevalence of 10% (blue lines) in a sample of 1000 synthetic patients. Test positivity was simulated assuming each serotype test had a sensitivity of 80% and a specificity of 99.75% (red lines), resulting in underestimates of ‘true’ prevalence for serotypes 3 and 8, and overestimates for the rest. In the right subfigure, combined test positivity for each PCV group (red lines) overestimates true prevalence (blue lines) for this scenario. We estimate true prevalence from test positivity (red lines), incorporating uncertainty in component sensitivity and specificity using a Bayesian model described in S2 Appendix. These estimates are shown as point estimates and 95% credible intervals (black), which accurately estimate the true prevalence (blue lines).

In S2 Appendix we describe methods for correcting this bias in both frequentist and Bayesian frameworks using results from the mathematical analysis (S1 Appendix). In Fig 5 the Bayesian correction is applied and we are able to correctly predict the true prevalence (blue) allowing for uncertainty in our knowledge of test sensitivity and specificity. This is examined in a broader range of scenarios in S2 Appendix but in summary both Bayesian and Lang-Reiczigel (frequentist) approaches work well when we have good prior information about test sensitivity and specificity, but if these assumptions are very wrong, then we cannot expect either method to produce accurate estimates.

## Discussion

Combining multiplex test results into a panel commonly results in test positivity that significantly overestimates true prevalence. Multiplex testing simultaneously tests many hypotheses, and combining the results into a single panel result compounds the error. This error can be significant because of the low positive predictive value of individual component tests operating at low pre-test probability. The error is critically dependent on component test specificity, and very high specificity is essential in tests which are designed to be interpreted as a combined result.

Panel test sensitivity is difficult to characterise. When multiplex tests are combined, components with a larger pre-test probability will generate more false negatives. In panel tests, however, a false negative result in one component is overridden by any positive result in another component. The sensitivity of the overall panel test is therefore a complicated function of component test sensitivity, specificity and pre-test probability (component prevalence), leading to higher panel sensitivity at higher prevalence. This is counter-intuitive, as test sensitivity is usually regarded as independent of prevalence, and it makes it challenging to compare panel test positivity rates in populations with different prevalence.
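The prevalence dependence of panel sensitivity can be illustrated with a hypothetical set-up of 20 equal, independent components (not the IPD distribution). In the sketch below, panel specificity is fixed at the product of component specificities, while panel sensitivity is backed out from expected panel positivity; the function name is illustrative.

```python
def panel_performance(p_panel: float, n: int, sens: float, spec: float):
    """Panel (sensitivity, specificity) for n equal, independent components."""
    q = 1 - (1 - p_panel) ** (1 / n)             # equal component prevalence
    ap_component = sens * q + (1 - spec) * (1 - q)
    ap_panel = 1 - (1 - ap_component) ** n
    spec_panel = spec ** n                        # does not depend on prevalence
    # ap_panel = sens_panel * p + (1 - spec_panel) * (1 - p), solved for sens_panel
    sens_panel = (ap_panel - (1 - spec_panel) * (1 - p_panel)) / p_panel
    return sens_panel, spec_panel


for p in (0.05, 0.10, 0.30):
    s, sp = panel_performance(p, 20, sens=0.80, spec=0.9975)
    print(f"panel prevalence {p:.0%}: sensitivity {s:.3f}, specificity {sp:.3f}")
```

Sensitivity rises with prevalence here because truly positive patients increasingly carry more than one subtype, so a false negative in one component is more often rescued by a positive result in another.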

It remains possible to estimate true prevalence from test positivity, despite the complexities around panel test specificity and sensitivity. Positivity estimates generated by panel tests can be significantly biased, and test positivity is not binomially distributed around the true prevalence (as demonstrated in Fig 2), so we cannot infer confidence intervals directly from an observation. The raw test positivity (apparent prevalence) of a panel test is therefore very hard to interpret. We recommend use of the techniques described in this paper to produce modelled true prevalence estimates with confidence limits.

Sensitivity and specificity assumptions that incorporate uncertainty are critical in producing accurate modelled true prevalence estimates. Specificity estimates for multiplex testing usually rely on a disease-free control group, which may also be used to determine cut points to achieve set specificity levels, and can usually give a reasonable estimate of component test specificity. Determining the sensitivity of the components of a multiplex test is much harder, as it requires proven cases of disease with known subtype. These are difficult to find for rare disease subtypes, and gold standard identification of disease subtypes is not always available, or free from error [21, 22]. This results in a great deal of uncertainty in estimates of component test sensitivity. In some situations panel test sensitivity is estimated directly; however, as we saw above, panel test sensitivity depends on a range of factors including overall prevalence and component distribution, so any direct estimate of panel sensitivity is not generalisable outside the specific population tested. The methods presented here for modelling true prevalence from multiplex tests allow the uncertainty in sensitivity and specificity to be propagated appropriately. The accuracy of this correction, however, depends on the quality of the estimates of specificity and sensitivity (see S2 Appendix), and complete mis-specification of either quantity prevents correct estimation of true prevalence. To improve accuracy and narrow the confidence intervals of prevalence estimates, it is far more important to characterise the sensitivity and specificity of the test than to increase the sample size of testing. With a poorly understood test it is hard to draw any conclusions from the results.

The bias in panel test positivity is an inevitable consequence of combining multiple tests in settings with moderate to low prevalence. It can be mitigated in a number of ways: a) increasing the specificity of the component tests, b) performing second-line confirmatory testing, or c) applying the multiplex test only to populations with a very high overall disease prevalence. In the last case we may be able to use a multiplex test to determine which subtype of disease is causative, if we already know the patient has the disease from a different test or from specific clinical diagnostic criteria that select patients with a high probability of disease.

There are analogous situations where multiplex panel tests are used with similar potential risks. For example, the Biofire FilmArray™ respiratory panel 2.1 is one of a number of multiplex panels directed at respiratory pathogens [1]. It detects 19 viruses [21, 23]. We have trialled using this in Bristol to investigate co-infection with respiratory pathogens. There are multiple comparative evaluations of the Biofire FilmArray™ panel [7, 21, 22, 24–26], but there has not yet been a large-scale evaluation of the specificity of each individual panel component using disease-free controls. Estimating the frequency of co-infection with any of the 19 viruses in the panel requires a similar adjustment for the combined test uncertainty of all of the panel components.

## Conclusion

In this paper we have characterised the degree of uncertainty that results when multiplex panel test results are combined to give an overall result. The principal example of this is pneumococcal disease, in which specific component tests of a urine antigen detection (UAD) test identify up to 24 individual pneumococcal serotypes [2, 4]. This test is designed to be highly specific, with the specificity of individual serotype tests around 99.75%. The serotypes are generally grouped together by the vaccines that target them, to determine vaccine-preventable disease, or all together as an estimate of pneumococcal disease burden [15]. This use of multiplex UAD testing is susceptible to the uncertainty and biases described in this analysis. Even considering the highly specific nature of the UAD tests [4], as the number of components increases so does the risk of bias, and any seemingly minor decrease in test specificity is expected to have a large impact on estimates of disease burden. Despite excellent specificity, without correction the large number of tests in the panel creates uncertainty in prevalence estimates from UAD tests, and makes it difficult to compare results with those of other similar studies. In this analysis we present methods to correct for bias and quantify uncertainty in prevalence estimates from multiplex panels such as the UAD. These methods are a useful tool but critically rely on estimates of test sensitivity and specificity, and without these it is very hard to estimate disease burden using UAD results.

Uncertainty in test results due to lower sensitivity and specificity results in more noise at lower levels of prevalence [27, 28]. In vaccine effectiveness studies using a test-negative design, this phenomenon acts to mask the effect of a vaccine in the lower-prevalence vaccinated group. Hence test error always results in an underestimate of vaccine effectiveness [28], and the less sensitive the test, the greater this underestimate. For pneumococcal vaccination, the serotype of pneumococcal disease is determined using UAD test panels [2, 4]. Theory suggests that, because of the issues identified here, vaccine effectiveness estimates based on UAD tests are likely to be underestimates [28]. Because this underestimate of vaccine effectiveness acts in the opposite direction to the overestimate of disease burden caused by test error, the two biases may partly cancel, and the anticipated real-world impact of a vaccine may be relatively unaffected. Further work would be needed to formally assess this.

## Data Availability

All data and code produced are available online at [https://github.com/bristol-vaccine-centre/testerror](https://github.com/bristol-vaccine-centre/testerror)


## Supporting information

**S1 Appendix. Sensitivity and specificity of combined panel tests**. Derivation of the performance metrics and true prevalence adjustments for combination tests.

**S2 Appendix. Propagation of uncertainty of combined panel tests**. Bayesian and frequentist approaches to estimating the uncertainty of panel test results.

**S3 R package. testerror: Uncertainty in Multiplex Panel Testing**. R package providing methods to support the estimation of epidemiological parameters based on the results of multiplex panel tests, doi:10.5281/zenodo.7691196, [https://bristol-vaccine-centre.github.io/testerror/](https://bristol-vaccine-centre.github.io/testerror/).

## Funding

We would like to acknowledge the help and support of the JUNIPER partnership (MRC grant no MR/X018598/1), with which RC and LD are affiliated. KTA gratefully acknowledges the financial support of the Engineering and Physical Sciences Research Council (EPSRC) via grant EP/T017856/1. CH was funded by the National Institute for Health Research (NIHR) via an Academic Clinical Fellowship (ACF-2015-25-002). The views expressed are those of the authors. Funding for the AvonCAP study was provided by Pfizer; however, the manuscript development and the analysis that is the subject of this manuscript were conducted independently of Pfizer.

## Declarations

CH is Principal Investigator of the AvonCAP study which is an investigator-led University of Bristol study funded by Pfizer. AF is a member of the Joint Committee on Vaccination and Immunization (JCVI). He receives research funding from Pfizer as Chief Investigator of the AvonCAP study and he leads another project investigating transmission of respiratory bacteria in families jointly funded by Pfizer and the Gates Foundation. RC, AC, GQ, GO, RK, and LD receive research funding from Pfizer via the AvonCAP study.

## Contributions

RC and LD generated the research questions. RC, KTA, and LD performed the mathematical analysis and simulations, and RC created the supporting software package. LD and AF provided oversight of the research. All authors contributed to the preparation of the manuscript and its revision for publication and had responsibility for the decision to publish.

*   Received December 14, 2023.
*   Revision received December 14, 2023.
*   Accepted December 15, 2023.


*   © 2023, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NoDerivs 4.0 International), CC BY-ND 4.0, as described at [http://creativecommons.org/licenses/by-nd/4.0/](http://creativecommons.org/licenses/by-nd/4.0/)

## References

1. Ramanan P, Bryson AL, Binnicker MJ, Pritt BS, Patel R. Syndromic Panel-Based Testing in Clinical Microbiology. Clinical Microbiology Reviews. 2017;31(1). doi:10.1128/cmr.00024-17.

2. Bonten MJM, Huijts SM, Bolkenbaas M, Webber C, Patterson S, Gault S, et al. Polysaccharide Conjugate Vaccine against Pneumococcal Pneumonia in Adults. New England Journal of Medicine. 2015;372(12):1114–1125. doi:10.1056/NEJMoa1408544.

3. Henson SN, Elko EA, Swiderski PM, Liang Y, Engelbrektson AL, Piña A, et al. PepSeq: A Fully in Vitro Platform for Highly Multiplexed Serology Using Customizable DNA-barcoded Peptide Libraries. Nature Protocols. 2023;18(2):396–423. doi:10.1038/s41596-022-00766-8.

4. Pride MW, Huijts SM, Wu K, Souza V, Passador S, Tinder C, et al. Validation of an Immunodiagnostic Assay for Detection of 13 Streptococcus Pneumoniae Serotype-Specific Polysaccharides in Human Urine. Clinical and Vaccine Immunology. 2012;19(8):1131–1141. doi:10.1128/CVI.00064-12.

5. Kalina WV, Souza V, Wu K, Giardina P, McKeen A, Jiang Q, et al. Qualification and Clinical Validation of an Immunodiagnostic Assay for Detecting 11 Additional Streptococcus Pneumoniae Serotype-specific Polysaccharides in Human Urine. Clinical Infectious Diseases: An Official Publication of the Infectious Diseases Society of America. 2020;71(9):e430–e438. doi:10.1093/cid/ciaa158.

6. Mengelle C, Mansuy JM, Prere MF, Grouteau E, Claudet I, Kamar N, et al. Simultaneous Detection of Gastrointestinal Pathogens with a Multiplex Luminex-based Molecular Assay in Stool Samples from Diarrhoeic Patients. Clinical Microbiology and Infection. 2013;19(10):E458–E465. doi:10.1111/1469-0691.12255.

7. Murphy CN, Fowler R, Balada-Llasat JM, Carroll A, Stone H, Akerele O, et al. Multicenter Evaluation of the BioFire FilmArray Pneumonia/Pneumonia Plus Panel for Detection and Quantification of Agents of Lower Respiratory Tract Infection. Journal of Clinical Microbiology. 2020;58(7):10.1128/jcm.00128–20. doi:10.1128/jcm.00128-20.

8. Jääskeläinen AJ, Piiparinen H, Lappalainen M, Koskiniemi M, Vaheri A. Multiplex-PCR and Oligonucleotide Microarray for Detection of Eight Different Herpesviruses from Clinical Specimens. Journal of Clinical Virology. 2006;37(2):83–90. doi:10.1016/j.jcv.2006.05.010.

9. Jansen RR, Schinkel J, Koekkoek S, Pajkrt D, Beld M, de Jong MD, et al. Development and Evaluation of a Four-Tube Real Time Multiplex PCR Assay Covering Fourteen Respiratory Viruses, and Comparison to Its Corresponding Single Target Counterparts. Journal of Clinical Virology. 2011;51(3):179–185. doi:10.1016/j.jcv.2011.04.010.

10. Gröndahl B, Puppe W, Hoppe A, Kühne I, Weigl JAI, Schmitt HJ. Rapid Identification of Nine Microorganisms Causing Acute Respiratory Tract Infections by Single-Tube Multiplex Reverse Transcription-PCR: Feasibility Study. Journal of Clinical Microbiology. 1999;37(1):1–7. doi:10.1128/jcm.37.1.1-7.1999.

11. Hendolin PH, Markkanen A, Ylikoski J, Wahlfors JJ. Use of Multiplex PCR for Simultaneous Detection of Four Bacterial Species in Middle Ear Effusions. Journal of Clinical Microbiology. 1997;35(11):2854–2858. doi:10.1128/jcm.35.11.2854-2858.1997.

12. Weinstein S, Obuchowski NA, Lieber ML. Clinical Evaluation of Diagnostic Tests. American Journal of Roentgenology. 2005;184(1):14–19. doi:10.2214/ajr.184.1.01840014.

13. Shaffer JP. Multiple Hypothesis Testing. Annual Review of Psychology. 1995;46:561–585. doi:10.1146/annurev.ps.46.020195.003021.

14. Rogan WJ, Gladen B. Estimating Prevalence from the Results of a Screening Test. American Journal of Epidemiology. 1978;107(1):71–76. doi:10.1093/oxfordjournals.aje.a112510.

15. Hyams C, Challen R, Hettle D, Amin-Chowdhury Z, Grimes C, Ruffino G, et al. Serotype Distribution and Disease Severity in Adults Hospitalized with Streptococcus pneumoniae Infection, Bristol and Bath, UK, 2006–2022. Emerging Infectious Diseases. 2023;29(10). doi:10.3201/eid2910.230519.

16. Lang Z, Reiczigel J. Confidence Limits for Prevalence of Disease Adjusted for Estimated Sensitivity and Specificity. Preventive Veterinary Medicine. 2014;113(1):13–22. doi:10.1016/j.prevetmed.2013.09.015.

17. Thomas A, Shaheen NA, Hussein MA. An Efficient Confidence Interval Estimation for Prevalence Calculated from Misclassified Data. Biostatistics & Epidemiology. 2022:1–17. doi:10.1080/24709360.2022.2076530.

18. Flor M, Weiß M, Selhorst T, Müller-Graf C, Greiner M. Comparison of Bayesian and Frequentist Methods for Prevalence Estimation under Misclassification. BMC Public Health. 2020;20(1):1135. doi:10.1186/s12889-020-09177-4.

19. Gelman A, Carpenter B. Bayesian Analysis of Tests with Unknown Specificity and Sensitivity. Journal of the Royal Statistical Society Series C: Applied Statistics. 2020;69(5):1269–1283. doi:10.1111/rssc.12435.

20. Diggle PJ. Estimating Prevalence Using an Imperfect Test. Epidemiology Research International. 2011;2011:e608719. doi:10.1155/2011/608719.

21. Loeffelholz MJ, Pong DL, Pyles RB, Xiong Y, Miller AL, Bufton KK, et al. Comparison of the FilmArray Respiratory Panel and Prodesse Real-Time PCR Assays for Detection of Respiratory Pathogens. Journal of Clinical Microbiology. 2011;49(12):4083–4088. doi:10.1128/jcm.05010-11.

22. Leber AL, Everhart K, Daly JA, Hopper A, Harrington A, Schreckenberger P, et al. Multicenter Evaluation of BioFire FilmArray Respiratory Panel 2 for Detection of Viruses and Bacteria in Nasopharyngeal Swab Samples. Journal of Clinical Microbiology. 2018;56(6):10.1128/jcm.01945–17. doi:10.1128/jcm.01945-17.

23. Chang YC, Hsiao CT, Chen WL, Su YD, Hsueh PR. BioFire FilmArray Respiratory Panel RP2.1 for SARS-CoV-2 Detection: The Pitfalls. Journal of Infection. 2022;85(5):e149–e151. doi:10.1016/j.jinf.2022.07.030.

24. Popowitch EB, O’Neill SS, Miller MB. Comparison of the Biofire FilmArray RP, Genmark eSensor RVP, Luminex xTAG RVPv1, and Luminex xTAG RVP Fast Multiplex Assays for Detection of Respiratory Viruses. Journal of Clinical Microbiology. 2013;51(5):1528–1533. doi:10.1128/jcm.03368-12.

25. Babady NE. The FilmArray® Respiratory Panel: An Automated, Broadly Multiplexed Molecular Test for the Rapid and Accurate Detection of Respiratory Pathogens. Expert Review of Molecular Diagnostics. 2013;13(8):779–788. doi:10.1586/14737159.2013.848794.

26. Chan M, Koo SH, Jiang B, Lim PQ, Tan TY. Comparison of the Biofire FilmArray Respiratory Panel, Seegene AnyplexII RV16, and Argene for the Detection of Respiratory Viruses. Journal of Clinical Virology. 2018;106:13–17. doi:10.1016/j.jcv.2018.07.002.

27. Haile SR. Bias in (Sero)Prevalence Estimates; 2022.
    
28. Endo A, Funk S, Kucharski AJ. Bias Correction Methods for Test-Negative Designs in the Presence of Misclassification. Epidemiology and Infection. 2020;148:e216. doi:10.1017/S0950268820002058.
