Improving estimates of pertussis burden in Ontario, Canada 2010-2017 by combining validation and capture-recapture methodologies ================================================================================================================================ * Shilo H. McBurney * Jeffrey C. Kwong * Kevin A. Brown * Frank Rudzicz * Andrew Wilton * Natasha S. Crowcroft ## Abstract An underestimation of pertussis burden has impeded understanding of transmission and disallows effective policy and prevention to be prioritized and enacted. Capture-recapture analyses can improve burden estimates; however, uncertainty remains around incorporating health administrative data due to accuracy limitations. The aim of this study is to explore the impact of pertussis case definitions and data accuracy on capture-recapture estimates. We used a dataset from March 7, 2010 to December 31, 2017 comprised of pertussis case report, laboratory, and health administrative data. We compared Chao capture-recapture abundance estimates using prevalence, incidence, and adjusted false positive case definitions. The latter was developed by removing the proportion of false positive physician billing code-only case episodes after validation. We calculated sensitivity by dividing the number of observed cases by abundance. Abundance estimates demonstrated that a high proportion of cases were missed by all sources. Under the primary analysis, the highest sensitivity of 78.5% (95% CI 76.2-80.9%) for those less than one year of age was obtained using all sources after adjusting for false positives, which dropped to 43.1% (95% CI 42.4-43.8%) for those one year of age or older. Most code-only episodes were false positives (91.0%), leading to considerably lower abundance estimates and improvements in laboratory testing and case report sensitivity using this definition. Accuracy limitations can be accounted for in capture-recapture analyses using different case definitions and adjustment. The latter enhanced the validity of estimates, furthering the utility of capture-recapture methods to epidemiological research. Findings demonstrated that all sources consistently fail to detect pertussis cases. This is differential by age, suggesting ascertainment and testing bias. Results demonstrate the value of incorporating real time health administrative data into public health surveillance if accuracy limitations can be addressed. ## Introduction Pertussis remains one of the most common vaccine-preventable diseases in Canada [1]. Despite being a reportable disease, an underestimation of cases and deaths has impeded understanding of transmission. Ascertainment bias is a key concern, which occurs when atypical cases are underdiagnosed including older individuals experiencing mild disease [2, 3]. This issue is worsened by testing bias, with younger, severe cases more likely to be tested, have a positive result, and be reported [4, 5]. Burden estimates vary regionally based on interactions between case definitions, the type of surveillance and data available, practitioner knowledge, immunization programs, and the extent of local transmission [4, 6]. Underestimation may be attributed to failing to consider pertussis diagnostically, atypical presentations, infrequent diagnostic testing, suboptimal test accuracy, lack of uniformity in case definitions, and reporting issues [2, 5]. In combination with complicated epidemiological characteristics, pertussis surveillance is consequently challenging [4, 7]. However, monitoring burden is essential for informing and assessing the impact of immunization programs and policy [4, 6-9]. To improve pertussis burden estimates, one strategy is to supplement surveillance data with health administrative data [10]. When several sources are available, capture-recapture analyses can be used to better estimate burden [11]. This analytic approach has been recently used to assess completeness of contact-tracing for Ebola and detection of Covid-19 infections [12, 13]. For pertussis, capture-recapture has been used to estimate the number of deaths in England and the number of cases in Ontario [10, 14]. The latter estimated that 21-73% of cases have been missed using combined surveillance, laboratory, and health administrative data [10]. However, considerable uncertainty in estimation remained around the validity of using the latter, and particularly Ontario Health Insurance Plan (OHIP) physician billing diagnostic codes. The aim of this study is to explore the impact of using different pertussis case definitions and adjusting for data accuracy on capture-recapture results, with the goal of enhancing the utility of this method for improving burden estimates to inform public health surveillance, prevention, and policy. ## Methods The University of Toronto’s Health Sciences Research Ethics Board (37885) and the Public Health Ontario (PHO) Ethics Review Board (2019-006.02) approved this study. Data were linked and analyzed at ICES (formerly the Institute of Clinical Evaluative Sciences) using unique encoded identifiers. ### Data sources We obtained Public Health Information System (iPHIS) and PHO Laboratory Information System (Labware) data from a PHO linked dataset previously used to improve estimates of pertussis burden in Ontario [10]. iPHIS data contained confirmed, probable, and “does not meet” (DNM) pertussis case reports. We considered reports greater than 365 days apart a new case, with data available from April 1, 2006 to March 31, 2015. When duplicates occurred in the same year, we gave priority to the highest level of confirmation. Labware data included positive, indeterminate, and negative pertussis laboratory tests, with cases defined as at least one positive result by PCR or culture. We counted positive results more than 90 days apart as a new case, and data were available from December 7, 2009 to March 31, 2015. We updated the PHO dataset until March 31, 2018 and combined it with health administrative data from December 1, 2009 from three databases held at ICES: the Canadian Institute for Health Information (CIHI) Discharge Abstract Database (DAD); the CIHI National Ambulatory Care Reporting System (NACRS); and the Ontario Health Insurance Plan (OHIP) Claims Database (Fig 1). Collected data included ICD-10 codes A37.0 (whooping cough, *Bordetella pertussis)* and A37.9 (whooping cough, unspecified species) from hospitalizations and emergency room visits and OHIP diagnostic billing code 033 (whooping cough, *Bordetella pertussis*). We restricted OHIP claims to billings from homes, offices, and long-term care facilities. The Registered Persons Database (RPDB) was used to obtain data on patient sex, age, and date of death. We excluded health administrative data entries with same-day immunizations as they are unlikely to reflect a true pertussis case (Fig 1). We excluded all entries with no index date or a date of death prior to the index date, as well as participants with an invalid unique identifier or who were missing their sex or birth date (Fig 1). ![Fig 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/08/07/2022.08.05.22278478/F1.medium.gif) [Fig 1.](http://medrxiv.org/content/early/2022/08/07/2022.08.05.22278478/F1) Fig 1. Data flow chart. ϮOHIP = physician diagnostic billing codes, DAD = hospitalizations, NACRS = emergency room visits, iPHIS = reportable cases, Labware = laboratory tests, *using pertussis-containing immunization codes G840, G841, and G847 and general immunization codes G538 and G539. ### Case definitions and sensitivity analyses We developed three case definitions to separately input into capture-recapture models – period prevalence, incidence with exclusions, and false-positive adjusted (Fig 2). Period prevalence permitted an individual to have one entry per data source over the study period. We did not apply a time limit to recapture in other sources, leading to the best recapture scenario. For incidence with exclusions, we incorporated data entries into episodes using 90-day rules and ruled out administrative data-only episodes with a negative pertussis laboratory test within 28 days of the episode start (Fig 2 and S1 Fig). This structure required recapture in other sources to occur within the 90 days before or after the respective episode. Finally, for the last definition we eliminated the proportion of false positive OHIP code-only episodes from the incidence with exclusions case definition. To do so, we validated these episodes to obtain a positive predictive value (PPV) and took a random sample based on the estimated proportion of true positives. We conducted validation using a previously developed methodology and cohort from the Electronic Medical Record Primary Care (EMRPC) database (S1 Appendix) [15, 16]. The PPV was estimated as 8.99%, leading to the removal of 41,062 OHIP-code only entries for the primary analysis (Fig 2). ![Fig 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/08/07/2022.08.05.22278478/F2.medium.gif) [Fig 2.](http://medrxiv.org/content/early/2022/08/07/2022.08.05.22278478/F2) Fig 2. Flow diagram of case definitions. *Admin= OHIP physician diagnostic billing codes, DAD hospitalizations, NACRS emergency room visits, iPHIS = reportable cases, Labware = laboratory tests, Ϯstratified by age groups, < 1 year of age and 1+ years of age. To ensure that failure to recapture was not due to data missingness, we applied a study window based on data availability. We removed cases with a first date before March 7, 2010 and last date after December 31, 2017 to give a buffer of 90 days at each end to ensure an accurate episode start and end date, which we defined as the earliest and latest date for an episode in any data source. Finally, we applied three sensitivity analyses to the three data structures to explore further uncertainty in case definitions. This included incorporating iPHIS probable cases, iPHIS probable and DNM cases in addition to indeterminate laboratory tests, and excluding A37.9 codes with pertussis species unspecified. For the second sensitivity analysis, we ruled out episodes with a negative pertussis laboratory test within 28 days of the episode start if the episode only included administrative data or DNM entries. ### Capture-recapture analyses Capture-recapture estimates abundance using the cases identified in each source and their overlap to calculate the number missed by all [17]. We assumed the population was closed and that temporality was present [10, 18]. We combined all administrative data into a single source at the outset, assuming and thereby accounting for dependency [11]. We assessed other dependencies by calculating the probability of being captured in one source given being in another [10]. Additionally, we evaluated random detection using Pearson’s chi-squared tests of the observed versus expected number of cases in a pair of sources [10]. Both were assessed under the prevalence structure to ensure the assumption of independence for these tests was not violated. We used these results to select a dependency structure in combination with theoretical considerations [19]. We evaluated heterogeneity by assessing the linearity of heterogeneity graphs [18]. We used models that accounted for a closed population, temporality, and heterogeneity if determined to be present. The latter was expected as milder, older cases of pertussis are less likely to be tested, have a positive result, and be reported to surveillance [4, 5]. We explicitly built the selected dependency structure into models by including two-way interaction coefficients for sources hypothesized to be dependent [11]. We chose Chao’s lower bound estimator for total sample size for the capture-recapture models, which additionally accounts for dependency [11]. We compared model estimates to those from Darroch models, which further correct for heterogeneity [20]. We rounded estimates of abundance to the nearest whole number. We considered results statistically significant at alpha ≤ 0.05 and we used the Rcapture package in R [18, 21]. ### Estimated sensitivity We calculated sensitivity by dividing the number of cases identified by each data source by the estimated abundance [10]. For the incidence and adjusted false positive case definitions, there was concern that correlation (clustering) would arise from having multiple cases for some individuals, impacting the sensitivity point estimate and variance [22, 23]. To address this, sensitivity was additionally calculated under both definitions by selecting a random episode per person to remove the effect of clustering. We then compared these results to those including multiple episodes, with little difference used as evidence that estimates were robust. ## Results ### Capture-recapture results Under the prevalence case definition, all sources were dependent based on pair-wise probabilities and Pearson’s chi-squared test was highly significant (p < 0.00001), suggesting non-random detection. This provided support for the theorized dependency structure, and we used all two-way interaction terms to account for dependency between each source pair. To ensure comparability between case definitions, we used this structure for all capture-recapture models. A visual depiction of dependency is presented in Fig 3 using the degree of overlap between data sources under prevalence. Heterogeneity graphs lacked linearity, indicating heterogeneity was present and had to be accounted for in modelling (S2 Fig). ![Fig 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/08/07/2022.08.05.22278478/F3.medium.gif) [Fig 3.](http://medrxiv.org/content/early/2022/08/07/2022.08.05.22278478/F3) Fig 3. Example of overlap between data sourcesϮ, all age groups combined under the period prevalence primary analysis. ϮAdministrative data = OHIP physician diagnostic billing codes, DAD hospitalizations, NACRS emergency room visits, iPHIS = reportable cases, Labware = laboratory tests. Estimated abundance was similar for those less than one year of age under the definitions for period prevalence and incidence with exclusions, at 3810 (95% CI 2932-5707) and 3078 (95% CI 2362-4609) respectively (Table 1). Abundance was considerably lower with less variability, as measured by the width of the 95% CIs, for the adjusted false positive definition at 1151 (95% CI 964-1538). Overall, the one year or older age group had more variable estimates. Abundance for this age group was again similar under prevalence and incidence, at 114,135 (95% CI 87,228-155,391) and 132,528 (95% CI 100,384-181,775). The false positive definition had substantially lower estimates and variability at 20,490 (95% CI 15,998-27,319). Darroch models produced identical abundance estimates in scenarios with more than one two-way interaction term. View this table: [Table 1.](http://medrxiv.org/content/early/2022/08/07/2022.08.05.22278478/T1) Table 1. Capture-recapture model results with all two-way interactions between sources by analysis, age group, and case definition. ### Sensitivity analyses After the addition of probable iPHIS cases, there was little change to estimated abundance (Table 5.1). Including iPHIS probable and DNM cases and indeterminate laboratory tests produced the largest increase in abundance and introduced a considerable amount of variability. While this pattern occurred under all case definitions, the highest estimate of 252,872 (95% CI 209,058-308,891) was obtained under incidence for those one year of age or older. After removing A37.9 codes, abundance estimates increased and decreased without a reliable pattern. ### Estimated sensitivity Sensitivity estimates are available in Table 2. Results were comparable and generally within a few percentage points with and without multiple episodes per individual (S1 Table). As a result, clustering was determined to have a minimal effect and we reported sensitivity estimates that included multiple episodes per individual. Using all data sources consistently provided the highest sensitivity (Table 2). Labware had the lowest sensitivity while administrative data had the highest for a single source. Under the primary analysis and prevalence definition, the highest sensitivity for those less than one year of age was 69.2% (95% CI 67.7-70.7%), which dropped to 40.6% (95% CI 40.3-40.8%) for the older age group. Incidence sensitivity estimates were similar but slightly lower in comparison, excepting marginally higher estimates for iPHIS and Labware for the younger age group. Overall, using the adjusted false positive definition increased sensitivity, with all sources under the primary analysis producing a sensitivity of 78.5% (95% CI 76.2-80.9%) and 43.1% (95% CI 42.4-43.8%) for the younger and older age groups respectively. However, the largest increase to sensitivity occurred for iPHIS and Labware estimates, although sensitivity for these sources remained low for the older age group. View this table: [Table 2.](http://medrxiv.org/content/early/2022/08/07/2022.08.05.22278478/T2) Table 2. Estimated sensitivity by case definition, age group, analysis, and data source. Any change to sensitivity estimates after the addition of probable iPHIS cases was small, within 5%. After including iPHIS probable and DNM cases and indeterminate laboratory tests, the sensitivity estimates for all sources combined were more similar between age groups. Under prevalence, sensitivity was 29.7% (95% CI 28.7-30.6%) and 21.0% (95% CI 20.8-21.2%) for the younger and older groups respectively. Excluding A37.9 codes under the adjusted false positive definition led to the highest Labware and iPHIS sensitivity estimates for the younger age group, at close to 50% and 60%. ## Discussion Abundance estimates demonstrated that a high proportion of pertussis cases were missed by all sources. Results were similar when using prevalence and incidence, but after adjusting for physician billing code-only false positives abundance dropped considerably. While this occurred for both age groups, the effect was greater among those one year of age or older. The low estimated PPV of physician billing code-only episodes provides evidence that false positives have been inflating observed counts and abundance estimates, which decreased sensitivity for laboratory and case report data. As a result, the adjusted case definition is the most valid out of those tested, with the described approach useful for improving the utility of capture-recapture methods to epidemiological surveillance. Regardless of the case definition, health administrative data had the highest sensitivity for a single source, with all sources combined producing the best sensitivity. This establishes the value of incorporating health administrative data into pertussis surveillance if accuracy and timeliness limitations are addressed. Laboratory tests had the lowest sensitivity and particularly for the older age group, indicating testing bias is present. Public health case reports had slightly higher sensitivity but displayed a similar pattern. While sensitivity estimates improved for both after adjusting for false positives, this was primarily in the younger age group. Overall, sensitivity was substantially lower for the older group, suggesting ascertainment bias is present. A 2018 Ontario capture-recapture study used the same data sources but over fewer years. As a result, abundance is not directly comparable as the higher estimates from this study are expected due to the extra years of data. However, sensitivity for all sources with probable case reports included was estimated at 54% and 39% for the younger and older age groups respectively. This is comparable to sensitivity estimates for the older age group in this study after including iPHIS probable cases, but sensitivity was higher for the younger group (∼66%). This could be due to improved recapture in this subgroup using the case definitions or differences in how the 2018 study modelled dependencies [10]. The 2018 study reported considerable uncertainty persisting around physician billing code accuracy and investigated by using different proportions of true positives for all health administrative data, with the lowest at 25%. This dropped abundance estimates by 66% and 73% for the younger and older age groups [10]. We were able to address remaining uncertainty through validation of physician billing code-only episodes. Interestingly, applying the estimated PPV of 8.99% to these episodes reduced abundance estimates similarly to assuming a PPV of 25% for all health administrative data in the 2018 study, by 64% and 84%. The higher latter value indicates a greater proportion of code only-episodes in the older age group, leading to enhanced improvement in recapture once removed. Laboratory test and case report sensitivity considerably improved using this case definition. A modelling study based on pertussis incidence in southern Ontario from 1993-2004 estimated that five to 33,032 cases remain undetected per reported case depending on age [3]. It has been stated elsewhere that the true number of pertussis cases is at least three times higher than what is reported [2]. To compare, estimates from this study should be calculated using the false positive case definition to avoid inflating underdetection. For case report data, these values are 2.5 and 8.9 for the younger and older age groups under the primary analysis and 4.7 and 12 after including probable and DNM case reports and indeterminate laboratory tests. Using all data sources, 1.3 and 2.3 cases were missed per observed case for the younger and older age groups, which increased to 2.4 and 3.8 with the additional case reports and laboratory tests. The substantially lower upper limit compared to the modelling study is likely due to differences in age groups, pertussis incidence during the respective time periods, and diagnostic testing accuracy, with PCR introduced after 2004. In addition, the estimation approaches differ, with the modelling study using methods with considerable uncertainty as reflected by the wide range of values. While this study used the more conservative Chao’s lower bound estimator, similar results were obtained from Darroch models. Furthermore, a simulation study found that Chao’s methods estimated abundance within 75-82% of the total population size in most complicated scenarios [11]. Regardless, this study’s findings are in line with past estimates of underdetection, with reasonable explanations for remaining differences. The estimated PPV for physician billing code-only episodes of 8.99% (95% CI 1.59-16.39%) is comparable to the PPV of 13.6% (95% CI 9.28-17.9%) obtained for an OHIP physician billing code algorithm within the EMRPC using the same cohort [16]. The slightly higher PPV in the EMRPC can be explained by using prevalent cases, increasing the likelihood of concordance between codes and cases. Additionally, EMRPC billing codes were only collected from physician offices, potentially decreasing the number of false positives. OHIP pertussis cases billed at homes or long-term care facilities are unlikely to be documented in EMRPC’s primary care patient records. Further contributing to this issue is that visits outside office settings such as walk-in or specialist visits fail to be captured in the EMRPC, leading to about 15% of interactions being missed [24]. In addition, only two thirds of the laboratory tests in OHIP are documented in the EMRPC [24]. Missing any of these data in the EMRPC could artificially decrease the PPV of OHIP billing code-only episodes by increasing the number of false positives, although billings from outside office settings were uncommon in the OHIP database and unlikely to greatly affect validation. The EMRPC study only tested the accuracy of data available in the EMRPC, meaning sensitivity estimates for emergency room visits, hospitalizations, and case report data are not available. However, pertussis laboratory test sensitivity using prevalent cases was reported as 0.64% (95% CI 0.37-1.09%) across all ages [16]. After adjusting for false positives, 8.0% sensitivity (95% CI 7.62-8.38%) was obtained for the older age group (which only excludes infants) using prevalence. This difference can be explained by variation in ages, pertussis classification, and validation methods. Additionally, Labware has more comprehensive coverage, with the EMRPC noted to have decreased laboratory test sensitivity through incomplete documentation [16]. One limitation of this study is that we did not validate physician billing code-only episodes separately for the younger and older age groups. This was to preserve an adequate sample size, with the assumption that the PPV averages out across age groups and this is an appropriate strategy for taking a random sample based on a proportion. In addition, while demonstrating that sensitivity estimates differ by age group, it is unlikely that PPV would vary to the same extent. PPV evaluates the proportion of true positives out of test positives and is primarily affected by prevalence, not testing bias [25]. Although it may appear that younger individuals have a higher risk of pertussis infection, due to testing and ascertainment bias this may not actually be the case [4]. Older cases are hypothesized to be an important source of pertussis, which is evident through older relatives being key sources of transmission to infants [4, 26]. Additionally, in 2019, 62% of reported cases in Ontario occurred in those ten years of age or older [27]. However, it may be of interest to allow the PPV to vary separately by age group in future analyses. An additional limitation is that we had to remove EMRPC cases without dates during validation. We considered it preferable to avoid introducing misclassification rather than preserving the sample size, and there is no reason to suspect excluded cases are systematically different from the majority of those included. A further limitation of validation is that we were unable to adjust the PPV and sensitivity estimates for clustering under the incidence and false positive definitions due to the low sample size and study methodology respectively [22, 23]. To address this, we compared point estimates and variances to those using a single episode per individual to assess the effect of correlation, with little difference found between PPV estimates. While some sensitivity estimates were statistically significantly different, the absolute difference was small, and this was for the older age group where large sample sizes produced substantial precision. As a result, we concluded that these differences were unlikely to be clinically significant. While it is possible correlation still minorly affected the findings, it is a study strength to be able to report sensitivity estimates under different case definitions. Little validation research has incorporated repeated episodes, which is of interest for acute diseases. Finally, missing data may impact abundance and sensitivity estimates, but this is an inherent limitation of using secondary data and beyond the scope of this study. This includes failing to collect data on certain cases, such as those that are milder in nature who do not seek health care or may be misdiagnosed. While pertussis infectivity is related to severity, meaning that milder undetected cases are likely less infectious, it is possible that many are still important to transmission. While Labware data only covers pertussis laboratory testing for < 95% of Ontario, we assumed that the remaining tests would not considerably impact abundance estimates [10, 28]. To assess the effect of missing data, we conducted sensitivity analyses incorporating additional data to determine the robustness of capture-recapture abundance estimates. This produced little change to results, except when including probable and DNM cases in addition to indeterminate laboratory tests. Doing so decreased sensitivity and led to greater similarity in capture patterns between the younger and older age groups. This could suggest that older individuals are less likely to meet the confirmed case definition, or that milder infections are frequently missed in younger as well as older individuals. Alternatively, it could be due to increased uncertainty in abundance among both groups under this analysis, stemming from greater uncertainty in true pertussis status. ## Conclusions This study demonstrated how limited health data accuracy can be accounted for using capture-recapture analyses that employ different pertussis case definitions. The false-positive adjusted case definition helped address past uncertainty in burden estimation and produced results which align with the degree of underdetection reported in the literature; improved capture-recapture estimates can better inform public health policy and prevention. Findings consistently demonstrated that data sources are failing to detect pertussis cases, and particularly laboratory and case report data. The best sensitivity was obtained by using all sources together, with health administrative data having the highest sensitivity for a single source. This indicates the benefit of incorporating real time health administrative data into surveillance if misclassification can be addressed. The results provide further support that pertussis detection differs by age, indicating that ascertainment and testing bias is present in data. ## Data Availability The datasets from this study are held securely in coded form at ICES. While data sharing agreements with data providers prohibit ICES from making these data publicly available due to privacy concerns and risk of re-identification, access may be granted to those who meet pre-specified criteria for confidential access, available at [www.ices.on.ca/DAS](http://www.ices.on.ca/DAS). ## Supporting Information **S1 Fig. Rules for developing incident case episodes and establishing data re-capture** **S2 Fig. Heterogeneity graphs from the primary analysis by case definition and age group** **S1 Table. Estimated sensitivity by age group, analysis, and data source for incidence and adjusted false positive case definitions using a single random episode per person** **S1 Appendix. Validation of Ontario Health Insurance Plan (OHIP) billing diagnostic code-only case episodes**. ## Acknowledgements The authors acknowledge the assistance of Branson Chen and John Wang with data preparation and Arezou Saedi and Mohammad Ali Moinshaghaghi with record abstraction. * Received August 5, 2022. * Revision received August 5, 2022. * Accepted August 7, 2022. * © 2022, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/) ## References 1. 1.Public Health Agency of Canada. Reported cases from 1924 to 2019 in Canada – Notifiable diseases on-line; [updated 2021 July 20; cited 2022 March 3]. Database: Government of Canada [Internet]. Available from: [http://diseases.canada.ca/notifiable/charts?c=pl](http://diseases.canada.ca/notifiable/charts?c=pl). 2. 2.Kilgore PE, Salim AM, Zervos MJ, Schmitt H-J. Pertussis: Microbiology, disease, treatment, and prevention. Clin Microbiol Rev. 2016;29(3): 449–486. doi: 10.1128/CMR.00083-15. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiY21yIjtzOjU6InJlc2lkIjtzOjg6IjI5LzMvNDQ5IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjIvMDgvMDcvMjAyMi4wOC4wNS4yMjI3ODQ3OC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 3. 3.McGirr AA, Tuite AR, Fisman DN. Estimation of the underlying burden of pertussis in adolescents and adults in Southern Ontario, Canada. PLoS One. 2013;8(12): e83850. doi: 10.1371/journal.pone.0083850. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pone.0083850&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24376767&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F08%2F07%2F2022.08.05.22278478.atom) 4. 4.1. Rohani P, 2. Scarpino S Crowcroft N, Miller E. Pertussis epidemiology. In: Rohani P, Scarpino S, editors. Pertussis: Epidemiology, immunology, and evolution. Oxford, UK: Oxford University Press; 2019. pp. 66–86. 5. 5.Deeks S, De Serres G, Boulianne N, Duval B, Rochette L, Dery P, et al. Failure of physicians to consider the diagnosis of pertussis in children. Clin Infect Dis. 1999;28(4): 840–846. doi: 10.1086/515203. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1086/515203&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=10825048&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F08%2F07%2F2022.08.05.22278478.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000079611600028&link_type=ISI) 6. 6.1. Rohani P, 2. Scarpino S, editors Bolotin S, Quinn H, McIntyre P. Surveillance and diagnostics. In: Rohani P, Scarpino S, editors. Pertussis: Epidemiology, immunology, and evolution. Oxford, UK: Oxford University Press; 2019. pp. 193–210. 7. 7.World Health Organization. Pertussis: Vaccine-Preventable Diseases Surveillance Standards. [Internet]. WHO; 2018. Available from: [https://www.who.int/publications/m/item/vaccine-preventable-diseases-surveillance-standards-pertussis](https://www.who.int/publications/m/item/vaccine-preventable-diseases-surveillance-standards-pertussis). 8. 8.Barkoff AM, Grondahl-Yli-Hannuksela K, He Q. Seroprevalence studies of pertussis: What have we learned from different immunized populations. Pathog Dis. 2015;73(7): ftv050. doi: 10.1093/femspd/ftv050. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/femspd/ftv050&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26208655&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F08%2F07%2F2022.08.05.22278478.atom) 9. 9.Crowcroft NS, Stein C, Duclos P, Birmingham M. How best to estimate the global burden of pertussis? Lancet Infect Dis. 2003;3(7): 413–418. doi: 10.1016/s1473-3099(03)00669-8. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S1473-3099(03)00669-8&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12837346&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F08%2F07%2F2022.08.05.22278478.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000183895000022&link_type=ISI) 10. 10.Crowcroft NS, Johnson C, Chen C, Li Y, Marchand-Austin A, Bolotin S, et al. Under-reporting of pertussis in Ontario: A Canadian Immunization Research Network (CIRN) study using capture-recapture. PLoS One. 2018;13(5): e0195984. doi: 10.1371/journal.pone.0195984. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pone.0195984&link_type=DOI) 11. 11.Braeye T, Verheagen J, Mignon A, Flipse W, Pierard D, Huygen K, et al. Capture-recapture estimators in epidemiology with applications to pertussis and pneumococcal invasive disease surveillance. PLoS One. 2016;11(8): e0159832. doi: 10.1371/journal.pone.0159832. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pone.0159832&link_type=DOI) 12. 12.Böhning D, Rocchetti I, Maruotti A, Holling H. Estimating the undetected infections in the Covid-19 outbreak by harnessing capture-recapture methods. Int J Infect Dis. 2020;97: 197–201. doi: 10.1016/j.ijid.2020.06.009. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ijid.2020.06.009&link_type=DOI) 13. 13.Polonsky JA, Böhning D, Keita M, Ahuka-Mundeke S, Nsio-Mbeta J, Abedi AA, et al. Novel use of capture-recapture methods to estimate completeness of contact tracing during an Ebola outbreak, Democratic Republic of the Congo, 2018-2020. Emerg Infect Dis. 2021;27(12): 3063–3072. doi: 10.3201/eid2712.204958. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3201/eid2712.204958&link_type=DOI) 14. 14.Crowcroft NS, Andrews N, Rooney C, Brisson M, Miller E. Deaths from pertussis are underestimated in England. Arch Dis Child. 2002;86(5): 336–338. doi: 10.1136/adc.86.5.336. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MTI6ImFyY2hkaXNjaGlsZCI7czo1OiJyZXNpZCI7czo4OiI4Ni81LzMzNiI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIyLzA4LzA3LzIwMjIuMDguMDUuMjIyNzg0NzguYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 15. 15.McBurney SH, Kwong JC, Brown KA, Rudzicz F, Chen B, Candido E, et al. Using electronic medical records to develop a reference standard for low prevalence disease validation studies: A pertussis case study. SSRN [Preprint]. 2022. Available from: [https://ssrn.com/abstract=4148223](https://ssrn.com/abstract=4148223). doi: 10.2139/ssrn.4148223. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2139/ssrn.4148223&link_type=DOI) 16. 16.1. McBurney SH. McBurney SH, Kwong JC, Brown KA, Rudzicz F, Chen B, Candido E, et al. Validating pertussis data measures using electronic medical record data in Ontario, Canada 1986-2016. In: McBurney SH. The problem with pertussis: Finding undetected pertussis cases in Electronic Medical Record Primary Care to improve data accuracy and burden estimates. PhD. Thesis, The University of Toronto. 2022. Available from: [https://tspace.library.utoronto.ca/handle/1807/9945](https://tspace.library.utoronto.ca/handle/1807/9945). 17. 17.1. Rothman KJ, 2. Greenland S, 3. Lash TL, editors Buehler JW. Surveillance. In: Rothman KJ, Greenland S, Lash TL, editors. Modern epidemiology. 3rd ed. Philadelphia: Wolters Kluwer Health/Lippincott Williams & Wilkins; 2008. pp. 459–480. 18. 18.Baillargeon S, Rivest L-P. Rcapture: Loglinear models for capture-recapture in R. J Stat Softw. 2007;19(5): 1–31. doi: 10.18637/jss.v019.i05. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.18637/jss.v019.i05&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21494410&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F08%2F07%2F2022.08.05.22278478.atom) 19. 19.Jones HE, Hickman M, Welton NJ, De Angelis D, Harris RJ, Ades AE. Recapture or precapture? Fallibility of standard capture-recapture methods in the presence of referrals between sources. Am J Epidemiol. 2014;179(11): 1383–1393. doi: 10.1093/aje/kwu056. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/aje/kwu056&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24727806&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F08%2F07%2F2022.08.05.22278478.atom) 20. 20.Rivest L-P, Baillargeon S. Applications and extensions of Chao’s moment estimator for the size of a closed population. Biometrics. 2007;63(4): 999–1006. doi: 10.1111/j.1541-0420.2007.00779.x. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1541-0420.2007.00779.x&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17425635&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F08%2F07%2F2022.08.05.22278478.atom) 21. 21.R-Core-Team. R: A language and environment for statistical computing. Vienna (AT): R Foundation for Statistical Computing; 2013. 22. 22.Ying GS, Maguire MG, Glynn RJ, Rosner B. Calculating sensitivity, specificity, and predictive values for correlated eye data. Investig Ophthalmol Vis Sci. 2020;61(11): 29. doi: 10.1167/iovs.61.11.29. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1167/iovs.61.13.29&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F08%2F07%2F2022.08.05.22278478.atom) 23. 23.Genders TS, Spronk S, Stijnen T, Steyerberg EW, Lesaffre E, Hunink MG. Methods for calculating sensitivity and specificity of clustered data: A tutorial. Radiology. 2012;265(3): 910–916. doi: 10.1148/radiol.12120509. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1148/radiol.12120509&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23093680&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F08%2F07%2F2022.08.05.22278478.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000311420300029&link_type=ISI) 24. 24.Tu K, Mitiku T, Ivers NM, Guo H, Lu H, Jaakkimainen L. Evaluation of Electronic Medical Record Administrative data Linked Database (EMRALD). Am J Manag Care. 2014;20: e15–21. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24669409&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F08%2F07%2F2022.08.05.22278478.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000330599000010&link_type=ISI) 25. 25.Molinaro AM. Diagnostic tests: How to estimate the positive predictive value. Neurooncol Pract. 2015;2(4): 162–166. doi: 10.1093/nop/npv030. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/nop/npv030&link_type=DOI) 26. 26.Ministry of Health and Long-Term Care. Infectious Diseases Protocol - Appendix A: Disease-Specific Chapters, Pertussis (Whooping Cough). Government of Ontario; 2019. 27. 27.Ontario Agency for Health Protection and Promotion. Infectious Disease Trends in Ontario; 2021 [cited 2022 March 3]. Database: Public Health Ontario [Internet]. Available from: [https://www.publichealthontario.ca/en/data-and-analysis/infectious-disease/reportable-disease-trends-annually#/42](https://www.publichealthontario.ca/en/data-and-analysis/infectious-disease/reportable-disease-trends-annually#/42). 28. 28.Schwartz KL, Kwong JC, Deeks SL, Campitelli MA, Jamieson FB, Marchand-Austin A, et al. Effectiveness of pertussis vaccination and duration of immunity. Can Med Assoc J. 2016;188(16): E399–e406. doi: 10.1503/cmaj.160193. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoiY21haiI7czo1OiJyZXNpZCI7czoxMToiMTg4LzE2L0UzOTkiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMi8wOC8wNy8yMDIyLjA4LjA1LjIyMjc4NDc4LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==)