Abstract
Risk of COVID-19 infection in Wuhan has been estimated using imported case counts of international travelers, often under the assumption that all cases in travelers are ascertained. Recent work indicates variation among countries in detection capacity for imported cases. Singapore has historically had very strong epidemiological surveillance and contact-tracing capacity and has shown evidence of highly sensitive case detection during the COVID-19 epidemic. We therefore used a Bayesian modeling approach to estimate the relative imported case detection efficiency for other countries compared to that of Singapore. We estimate that the global ability to detect imported cases is 38% (95% HPDI 22% - 64%) of Singapore’s capacity. Equivalently, an estimated 2.8 (95% HPDI 1.5 - 4.4) times the current number of imported cases could have been detected if all countries had had the same detection capacity as Singapore. Using the second component of the Global Health Security index to stratify countries by likely detection capacity, we found that the ability to detect imported cases among high surveillance countries is 40% (95% HPDI 22% - 67%), among intermediate surveillance countries it is 37% (95% HPDI 18% - 68%), and among low surveillance countries it is 11% (95% HPDI 0% - 42%). We conclude that case counts in Wuhan estimated under the assumption of perfect detection in travelers may be too low by several fold, and severity correspondingly overestimated by several fold. Undetected cases are likely in countries around the world, with greater risk in countries of low detection capacity and high connectivity to the epicenter of the outbreak.
Introduction
During the outbreak of the new virus SARS-CoV-2 and its associated disease COVID-19, infection in travelers has been used to estimate the risk of infection in Wuhan, Hubei Province, China, the epicenter of the outbreak1. This approach is similar to that used for the 2009 influenza pandemic, where infections in tourists returning from Mexico were used to estimate the time-specific risk of infection (incidence or cumulative incidence) with the novel pandemic H1N1 influenza strain in Mexico (or parts thereof). The idea was that surveillance for the novel virus was not intense during the early days of the pandemic in Mexico, the source country, and that detection would be far more sensitive in travelers, who would be screened when returning home as a means of preventing introductions of cases into destination countries2,3. Reports that health systems in Wuhan are overwhelmed and that many cases are not being counted have led to the use of outgoing traveler data to estimate the time-specific risk of infection in Wuhan4. This estimate, in turn, has been used to estimate the cumulative incidence of infection by a certain date in Wuhan, and from there (assuming exponential growth and no appreciable depletion of susceptibles) the cumulative number of cases. An important assumption underlies this calculation: that detection of cases in the destination country has been 100% sensitive and specific, whether they are detected at the airport (prevalent cases with symptoms) or later after arrival at their destination (cases that were incubating during travel).
Here we consider the extent to which the assumption of 100% sensitivity is justified. We conclude that the assumption is strongly inconsistent with observed data, resulting in potentially substantial underestimates of prevalence in Hubei and corresponding overestimates of case-severity measures that are normalised by case counts.
We previously showed that there was variability between locations in the world in the relationship between the number of travelers from Wuhan to each location and the number of imported cases detected in that location. On average, for countries in the top quartile of the Global Health Security Index5 (indicating presumed high surveillance capacity), one imported case reported over the period 8th January to 4th February was associated with each additional 14 passengers/day of historical travel volume6. However, there was variation around this average. Among countries with substantial travel volume, Singapore showed the highest ratio of detected imported cases to daily travel volume, a ratio of one case per 5 daily travelers. Singapore is historically known for exceptionally sensitive detection of cases, for example in SARS7, and has had extremely detailed case reporting during the COVID-19 outbreak8.
Methods
We included n=191 locations, from a total of 195 worldwide locations, reflecting mainly countries without taking any position on territorial claims and excluding mainland China (the epicentre of the epidemic) and Hong Kong, Taiwan and Macau, locations where imported and locally transmitted cases were not disaggregated. Data on imported cases aggregated by location were obtained from the WHO technical report1 dated 4th February 2020 (a zero case count was assumed for all locations not listed). We used case counts up to the 4th February because after this date the number of exported cases from Hubei province drops rapidly1, likely due to the Hubei-wide lockdowns. We defined imported cases as those with known travel history from China (of those, 83% had travel history from Hubei province, and 17% from unknown locations in China1). Estimates of daily air travel volume were obtained from Lai et al.9. They are based on historical (February 2018) data from the International Air Transport Association and include estimates for the 27 locations that are most connected to Wuhan. They capture the daily average number of passengers traveling via direct and indirect flight itineraries from Wuhan to destinations outside of China. For all 164 locations not listed by Lai et al.9, we set the daily air travel volume to 1.5 passengers per day, which is one half of the minimum reported by Lai et al. Surveillance capacity was assessed using the Global Health Security Index5, which is an assessment of health security across 195 countries agreeing to the International Health Regulations (IHR [2005]). Specifically, we use the second category of the index, Early Detection and Reporting Epidemics of Potential International Concern, henceforth referred to simply as the GHS2 index.
We order the n locations according to their GHS2 index and classify locations with GHS2 index above the 80th percentile as high surveillance locations, those with GHS2 index below the 20th percentile as low surveillance locations, with all others classified as locations with intermediate surveillance.
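The percentile-based classification described above can be sketched in a few lines. This is a minimal illustration only: the function name `classify_surveillance`, the example scores, and the use of NumPy's default (linear interpolation) percentile definition are assumptions, not details taken from the analysis code.

```python
import numpy as np

def classify_surveillance(ghs2, low_pct=20.0, high_pct=80.0):
    """Label each location 'low', 'intermediate', or 'high' from its GHS2 score."""
    ghs2 = np.asarray(ghs2, dtype=float)
    # Cut-offs at the 20th and 80th percentiles of the observed scores.
    low_cut, high_cut = np.percentile(ghs2, [low_pct, high_pct])
    labels = np.full(ghs2.shape, "intermediate", dtype=object)
    labels[ghs2 < low_cut] = "low"    # strictly below the 20th percentile
    labels[ghs2 > high_cut] = "high"  # strictly above the 80th percentile
    return labels

# Hypothetical scores for ten locations (not real GHS2 values).
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
labels = classify_surveillance(scores)
```

With these illustrative scores, two locations fall in each of the low and high groups and the remaining six are intermediate.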
We consider the detection of 18 cases by 4th February 2020 in Singapore1 to be a gold standard of near-perfect detection, and estimate the probability of detection in other countries relative to Singapore according to the following model. We assume that across i = 1,…,n worldwide locations, where n=191, the observed case count follows a Poisson distribution, and that the expected case count is linearly proportional to the daily air travel volume and to a random variable, θlevel, reflecting the probability of detecting cases relative to Singapore:
$$Y_i \sim \mathrm{Poisson}(\lambda_i), \qquad \lambda_i = \beta \, x_i \, \theta_{\mathrm{level}}$$
where Yi denotes the reported case count in the i-th location, λi denotes the expected case count in the i-th location, β denotes the regression coefficient, xi denotes the daily air travel volume of the i-th location, and θlevel denotes the detection probability, relative to Singapore (indexed i=1), of the surveillance level to which the i-th location belongs. We assume that there are three different levels: low, medium and high. For each θlevel ∈ {θlow, θmed, θhigh} we assign a uniform prior over [0,1], and for log(β) we assign a weakly informative Normal prior with mean zero and standard deviation 50. Having fit the model (see details below), we approximate the distribution of the average detection probability, θglobal, by transforming draws from the posterior distributions of θlow, θmed, θhigh. Specifically, we take the weighted mean of the posterior draws of θlow, θmed, θhigh for i = 2,…,n, where weights are proportional to daily air travel volume, xi. Exclusion of Singapore (i=1) enables the estimation of the global detection probability relative to Singapore. Conversely, 1/θglobal is an estimate of the multiplier of the case count that could have been detected globally under a capacity equivalent to that of Singapore. We report the mean and 95% highest posterior density interval (HPDI) of the numerical approximation of the posterior distribution of θglobal, as well as the mean and 95% HPDI of the numerical approximation of the posterior distribution of 1/θglobal. Note that the latter two are not reciprocals of the former two because the inverse of a mean is not equal to the mean of the inverse, and similarly for the HPDIs.
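The travel-volume-weighted transformation of posterior draws into θglobal can be sketched as follows. Everything concrete here is an assumption made for illustration: the four example locations, their volumes and levels, and the Beta-distributed stand-in draws (real draws would come from the fitted model). The sketch also shows numerically why the mean of 1/θglobal is not the reciprocal of the mean of θglobal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical locations i = 2..n (Singapore, i = 1, is excluded).
# Volumes and levels are illustrative, not the study data.
volume = np.array([30.0, 12.0, 1.5, 1.5])
level = np.array(["high", "med", "med", "low"])

# Stand-in posterior draws for each level; in practice these would be
# the MCMC draws of theta_low, theta_med, theta_high.
n_draws = 5000
draws = {
    "high": rng.beta(4, 6, n_draws),
    "med": rng.beta(4, 7, n_draws),
    "low": rng.beta(2, 8, n_draws),
}

def theta_global(draws, volume, level):
    """Per-draw weighted mean of level-specific detection probabilities."""
    weights = volume / volume.sum()          # weights proportional to x_i
    per_location = np.stack([draws[l] for l in level], axis=1)
    return per_location @ weights            # shape: (n_draws,)

tg = theta_global(draws, volume, level)
multiplier = 1.0 / tg                        # draws of 1/theta_global

print("mean theta_global:", tg.mean())
print("mean 1/theta_global:", multiplier.mean())
print("1 / mean theta_global:", 1.0 / tg.mean())
```

Because the transformation is applied draw by draw, summaries of `tg` and of `multiplier` each reflect their own posterior distribution rather than being derived from one another.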
We fit this model using Stan software (v2.19.1)10 and we draw 80,000 samples from the posterior using four independent chains (20,000 samples each), each with a burn-in of 500. Diagnostic plots of the MCMC sampler for each inferred variable (θlow, θmed, θhigh and β) are shown in Supplementary Figure 1.
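For readers unfamiliar with highest posterior density intervals, the following is a minimal sketch of one common way to approximate an HPDI from posterior samples: take the narrowest window that contains the requested fraction of the sorted draws. The function name `hpdi` is hypothetical, and the approach assumes a unimodal posterior; it is not necessarily the routine used in the analysis code.

```python
import numpy as np

def hpdi(samples, prob=0.95):
    """Narrowest interval containing a fraction `prob` of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(prob * n))               # samples inside the interval
    if k < 1 or k > n:
        raise ValueError("prob must be in (0, 1] and samples non-empty")
    # Width of every window of k consecutive sorted samples.
    widths = x[k - 1:] - x[: n - k + 1]
    start = int(np.argmin(widths))           # left edge of narrowest window
    return x[start], x[start + k - 1]

# Illustrative use on a right-skewed sample (not model output).
rng = np.random.default_rng(1)
skewed = rng.gamma(shape=2.0, scale=1.0, size=20000)
lo, hi = hpdi(skewed, prob=0.95)
print("95% HPDI:", lo, hi)
```

For skewed distributions such as that of 1/θglobal, an interval of this kind is generally shifted towards the mode relative to an equal-tailed interval.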
All analyses are fully reproducible with the code available online (https://github.com/c2-d2/detect_prob_corona2019).
Results
We estimate that the global ability to detect imported cases is 38% (95% HPDI 22% - 64%) of Singapore’s capacity. Equivalently, an estimated 2.8 (95% HPDI 1.5 - 4.4) times the current number of imported cases could have been detected if all countries had had the same detection capacity as Singapore; this corresponds to 1.8 (95% CI 0.5 - 3.4) undetected cases per detected case. The ability to detect imported cases among high surveillance countries is 40% (95% HPDI 22% - 67%), among intermediate surveillance countries it is 37% (95% HPDI 18% - 68%), and among low surveillance countries it is 11% (95% HPDI 0% - 42%).
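The relationship between the case-count multiplier and the number of undetected cases per detected case is simple arithmetic, written out here for clarity. The symbols m (multiplier) and u (undetected cases per detected case) are introduced only for this illustration.

```latex
% If m times the detected count could have been detected, then for each
% detected case there are u = m - 1 additional, undetected cases.
\[
  u = m - 1, \qquad
  2.8 - 1 = 1.8, \qquad
  1.5 - 1 = 0.5, \qquad
  4.4 - 1 = 3.4 .
\]
```

The reported point estimate and interval bounds for undetected cases per detected case therefore match the multiplier estimates shifted down by one.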
Discussion
In this paper, we have aimed to test a critical assumption underlying the estimation of incidence at the epicentre of the SARS-CoV-2 outbreak: that the capacity for detection of international imported cases is 100% sensitive and specific across locations. While, to our knowledge, there is no reason to doubt the specificity of detection, we tested the assumption of perfect sensitivity. Specifically, we regressed cumulative cases on Wuhan-to-location air travel volume, considering Singapore to have the greatest detection capacity and estimating the relative underdetection compared to Singapore in the remaining locations indexed i=2,…,n.
We estimated that detection of exported cases from Wuhan worldwide is 38% (95% HPDI 22% - 64%) as sensitive as it has been in Singapore. Put another way, this implies that the true number of cases in travelers is at least 2.8x (95% HPDI 1.5x - 4.4x) the number that has been detected. Equivalently, for each detected case there are at least 1.8 (95% CI 0.5 - 3.4) undetected cases. If the model is correct, this is an upper bound on the detection frequency, for two reasons. (1) Singapore’s detection is probably not 100% efficient. As of 12 February 2020, Singapore had eight documented cases of COVID-19 transmission for which there were no known epidemiological links to China or other known cases11, implying that imported cases in Singapore may have gone undetected (although it is not certain that these imports came from Wuhan or China, and links may still be found). (2) Singapore’s detection, like that in other countries, has relied largely on symptoms and travel history, so the number of asymptomatic or low-severity cases missed by such a strategy is unknown.
These findings have two important implications for the public health response to SARS-CoV-2. First, they bear on approaches to case burden and severity estimation that use cases in travelers to impute cases in Wuhan, which are then compared (for severity estimation) against deaths in Wuhan. If the true number of cases in travelers is higher than previously thought, this implies more cases in Wuhan and a larger denominator, resulting in reduced estimates of severity compared to estimates assuming perfect detection in travelers. Future studies should account for our evolving understanding of detection capacity when estimating case numbers and severity in the source population on the basis of traveler case numbers. Second, the scenario where the virus has been imported from Wuhan and remained undetected in various worldwide locations is a plausible one, at least until the city lockdown (23rd January 2020), and one might speculate that detection capacity remained limited beyond this period as travelers infected elsewhere in China continued to leave China. Based on our model, the risk of undetected circulation correlates both with air travel connectivity and (inversely) with outbreak detection capacity, but such circulation could have occurred in virtually any location worldwide, leading to the potential risk of self-sustained transmission, which may be an early stage of a global pandemic.
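The first implication can be made concrete with a simplified worked relation. It assumes that the imputed number of cases in Wuhan scales linearly with the number of cases in travelers and that the death count is unaffected; the symbols D (deaths in Wuhan), Ĉ (cases imputed under perfect traveler detection) and m (case-count multiplier) are introduced only for this illustration.

```latex
% Severity measured as deaths per imputed case. Correcting the
% denominator by the multiplier m reduces the estimate by a factor 1/m.
\[
  \text{severity}_{\text{adjusted}}
    = \frac{D}{m\,\hat{C}}
    = \frac{1}{m}\,\text{severity}_{\text{naive}},
  \qquad
  m = 2.8 \;\Rightarrow\; \frac{1}{m} \approx 0.36 .
\]
```

Under these assumptions, a multiplier of 2.8 would reduce a severity estimate based on perfect traveler detection to roughly one third of its value, consistent with the several-fold overestimation described above.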
Data Availability
All analyses are fully reproducible with the code available online (https://github.com/c2-d2/detect_prob_corona2019).
Funding
This work was supported by Award Number U54GM088558 from the US National Institute Of General Medical Sciences. P.M.D was supported by the Fellowship Foundation Ramon Areces. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute Of General Medical Sciences or the National Institutes of Health.
Appendix
Supplementary Figures