ABSTRACT
288 cases have been confirmed out of China from January 3 to February 13, 2020. We collected and synthesized all available information on these cases from official sources and media. We analyzed importations that were successfully isolated and those leading to onward transmission. We modeled their number over time, in relation to the origin of travel (Hubei province, other Chinese provinces, other countries) and interventions. We characterized the timeline of importations to assess the rapidity of isolation, and epidemiologically linked clusters to estimate the rate of detection. We found a rapid exponential growth of importations from Hubei, combined with a slower growth from the other areas. We predicted a rebound of importations from South East Asia in the upcoming weeks. Time from travel to detection has considerably decreased since the first importation; however, 6 cases out of 10 were estimated to go undetected. Countries outside China should be prepared for the possible emergence of several undetected clusters of chains of local transmissions.
INTRODUCTION
Twenty-six countries worldwide have declared cases of the novel coronavirus, COVID-19, as of February 20, 20201. Only China has so far registered a widespread epidemic2, and authorities have implemented massive intervention measures to curtail it3. Outside China, affected countries are facing importations of cases and clusters of local transmission1,4,5. Border controls have been reinforced in many countries, and active surveillance has been intensified to rapidly detect and isolate importations, trace contacts and isolate suspect cases6,7.
The effectiveness of such measures, however, critically depends on COVID-19 epidemiology and natural history8,9, as well as the volume of importations6. The presence of an incubation period, during which infected individuals carry on their usual activities (including travel), is a major challenge for screening controls at airports8. Moreover, mild non-specific symptoms and transmission before the onset of clinical symptoms2,10 may compromise infection control measures for importations and onward transmissions9. There is concern that imported cases may have gone undetected and contribute unknowingly to the global spread of the disease11–15.
Here we systematically collected and analyzed data on 288 COVID-19 confirmed cases outside China. We analyzed importations that were successfully isolated and those leading to onward transmission, characterizing their case timeline. We developed a statistical model to nowcast trends in importations and quantify the proportion of undetected imported cases.
METHODS
Data collection and synthesis
We collected all international cases confirmed by official public health sources in the period from January 3 to February 13, 2020. Case history was reconstructed by searching the scientific literature, official public health sources, and news. Case history included: dates of travel and symptom onset, date of COVID-19 confirmation, date of hospital admission, date of case isolation, travel history, epidemiological link with other cases, hospitalization history. International cases included imported cases, secondary cases out of China, and repatriations. Cases from cruises were not considered here. Information was extracted by LDD and EO and checked by MM. The full database, along with the database describing clusters, was made publicly available16.
Descriptive analysis
For imported cases with full information on the timeline of events, we computed the average duration from travel to onset, from travel to hospitalization, and from hospitalization to reporting. We used analysis of variance to compare groups of imported cases that generated or did not generate local transmissions. We extended the analysis to all imported cases combining cases with full and partial information on the timeline. We used the analysis of variance and multiple imputation for the missing data. Results were combined using Rubin’s approach17.
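The pooling step can be illustrated with a minimal sketch of Rubin's rules, assuming each imputed dataset yields one point estimate and its squared standard error. The function name and the numerical values are hypothetical and are not taken from the study; degrees-of-freedom corrections are omitted.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical estimates (e.g. a difference in mean delay, in days) from 5 imputations
est = [1.4, 1.6, 1.5, 1.7, 1.3]
var = [0.25, 0.30, 0.28, 0.27, 0.26]
pooled, total_var = pool_rubin(est, var)
```

The total variance adds the between-imputation spread to the average within-imputation variance, so that uncertainty due to the missing data is reflected in the pooled standard error.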
Modeling and predicting importations
We modeled the total number of imported cases out of China over time accounting for date of travel, delay in reporting, and source areas.
We distinguished between three different sources: Hubei province (H), the rest of China (C), other countries (O). We modeled imported cases over time as a piecewise exponential function depending on the source and on travel restrictions in place. We assumed a different situation in Hubei province and the rest of the world due to the level of awareness in the different phases of the outbreak. Before travel restrictions, the expected number of cases imported from source $S$ on day $t$ grows as $e^{r_S t}$, up to a source-specific scale parameter, where $r_H$ is the growth rate of cases coming from Hubei, and $r_C$ and $r_O$, with $r_C = r_O = r_0$, are the growth rates of cases coming from the rest of China and other countries, respectively. Travel restrictions were modelled by assuming a discontinuity in the growth rate. For Hubei, we assumed the growth rate to change from $r_H$ to $r'_H$ after the travel ban of January 23, 2020 (indicated with $T_H$); for the rest of China, we assumed an analogous change from $r_0$ to $r'_C$ after January 29, 2020 ($T_C$), date of first flight cancellations18. No change was considered for the other countries ($r_0$ constant over time), as no restrictions of travel were established towards these countries. The scale parameters of the exponential functions were assumed to be different among the three sources, to account for different traveling volumes and dates of beginning of importations.
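As an illustration of the piecewise exponential form, the sketch below evaluates an expected importation rate that switches growth rate at the restriction date while remaining continuous there. The function name, argument names and parameter values are hypothetical and are not the fitted estimates.

```python
import math

def expected_importations(t, scale, rate_before, rate_after, t_restriction):
    """Piecewise exponential importation rate, continuous at t_restriction."""
    if t <= t_restriction:
        return scale * math.exp(rate_before * t)
    # Continuity: start the second exponential from the value reached at t_restriction
    at_change = scale * math.exp(rate_before * t_restriction)
    return at_change * math.exp(rate_after * (t - t_restriction))

# Hypothetical parameters: growth before day 18, decline afterwards
curve = [expected_importations(t, 0.1, 0.25, -0.3, 18) for t in range(1, 31)]
peak_day = 1 + curve.index(max(curve))
```

With a positive rate before the restriction and a negative rate after it, the curve peaks on the last day before the restriction, which is the qualitative behavior the model assumes for Hubei and the rest of China.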
We modelled the time τ from importation to detection of a case with a gamma distribution, $g_t(\tau)$, conditional on the date of case importation, t. $g_t(\tau)$ was assumed to have a constant coefficient of variation (SD/mean), achieved with a constant shape parameter and a rate parameter varying smoothly in time to capture changes in surveillance efficiency.
We used a Bayesian framework to fit the model to imported cases by origin, travel date, and confirmation date. Cases with partial information (e.g. missing date and/or origin of travel) were included by defining latent variables marginalized out during inference. The model was then used to nowcast imported cases two weeks in the future. All details of the analysis are reported in the Appendix.
Estimation of under-detection of imported cases
We analyzed clusters of transmission generated by imported cases (index case(s) in each cluster) to estimate undetected importations. A cluster can be seeded by more than one index case when local transmissions are epidemiologically linked to several cases traveling together (e.g. infected family members). We modelled the number of such ‘cluster seeds’, i.e. groups of index cases, with a multinomial distribution depending on the proportion of cluster seeds of size 1 or greater than 1 (for simplicity, the latter was taken as 2), on the probability of detection of a seed, and on the occurrence of secondary transmission. The likelihood was a function of: the number x1 of observed clusters with one index case; the number x2 of observed clusters with more than one index case; the number of detected index cases not leading to onward transmission; the number z of clusters whose index cases have not been identified; and the number w of undetected imported cases that did not generate any cluster. w can be estimated through likelihood maximization from the observed counts.
RESULTS
Timeline of travel-related cases
We collected 288 cases, including 163 imported cases, 109 local transmissions, 30 repatriations, and 1 case of unknown origin. Fifteen cases were classified as both imported and local transmissions, since they contracted the infection outside China and traveled to a different country once infected (ES01, ES02, GB03, GB04, GB05, GB06, GB07, GB08, KR12, KR16, KR17, KR19, MY09, TH20, TH21 in our database16).
Figure 1 summarizes the timeline of imported cases. Symptoms onset occurred after the travel to the destination country for almost all cases for which date of travel and of onset are available (68 out of 73, 93%). Complete information was available for 51 (31%) imported cases, with quality of information decreasing over time (Figure S1 of the Appendix).
Among imported cases with full information, the delay from travel to hospitalization was longer in cases that generated secondary transmissions (mean of 10 ± 0.97 days compared to 5.5 ± 0.67 days, p = 0.003). Overall, the duration from travel to first event (whether symptom onset, or hospitalization for asymptomatic cases) was also longer, although the difference was not statistically significant (5.0 ± 0.9 days vs. 3.7 ± 0.5 days, p = 0.08). Durations of hospitalization were instead comparable between the two groups of cases (1.5 ± 0.7 days vs. 2.6 ± 0.4 days for cases that generated or did not generate secondary transmissions, respectively). Including imported cases with missing information through imputation, we found the same trend, though smaller in magnitude and not statistically significant (delay from travel to hospitalization 9.8 ± 1.2 vs. 8.3 ± 0.5 days, p = 0.3; delay from travel to onset 5.8 ± 1.1 vs. 4.2 ± 0.5 days, p = 0.16, for cases that generated or did not generate secondary transmissions, respectively). This suggests that importations with missing information may be closer in characteristics to index cases leading to onward transmission.
The statistical model predicted a decrease in the average time from travel to detection from 14.5 ± 5.5 days on January 5, 2020 to 6 ± 3.5 days on February 1, 2020 (Figure 2).
Nowcasting travel-related cases
The model predicted a rapid exponential growth of importations from Hubei, with a growth rate corresponding to a doubling time of 2.8 days. In comparison, the exponential growth from other territories (rest of China and countries other than China) was slow. After the implementation of travel restrictions, a negative growth rate was estimated, signaling a decline in imported cases. The decline was sharp for Hubei and more gradual for the rest of China. All parameter estimates are provided in Table S4 of the Appendix.
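For reference, the relation between an exponential growth rate and a doubling time can be written out as below. The numerical growth rate is the value implied by the stated 2.8-day doubling time, not the authors' reported estimate; the symbols $N(t)$, $r$ and $T_d$ are introduced here only for illustration.

```latex
N(t) \propto e^{r t}, \qquad N(t + T_d) = 2\,N(t)
\;\Longrightarrow\; T_d = \frac{\ln 2}{r},
\qquad r = \frac{\ln 2}{2.8\ \text{days}} \approx 0.25\ \text{day}^{-1}.
```

The same relation with a negative rate gives a halving time of $\ln 2 / |r|$, which is how a post-restriction decline can be read.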
The predicted trend of all imported cases over time is shown in Figure 3, compared with the observed data. Reported importations are predicted to remain stationary in the second and third week of February and to rise again due to the effect of transmission clusters outside China. Imported cases after February 13, 2020 are in agreement with model predictions (Fig.3).
Transmission clusters outside China
Forty-two transmission clusters were identified outside China in the timeframe under study. Table 1 summarizes the size and country of each cluster. Clusters were grouped according to whether the index case was: (i) a traveling case identified prior to cluster detection; (ii) a traveling case not identified or identified retrospectively once the cluster was observed; (iii) completely unknown. Assuming that clusters of unknown origin were linked to one of the already observed imported cases - or, in other words, not linked to an undetected imported case - led to an estimate of 76 [49, 118] undetected imported cases. In this scenario, detected cases would amount to 65% of all imported cases. Assuming instead that all clusters of unknown origin were due to undetected imported cases increased the number of undetected cases to 255 [186, 369], i.e. detected cases would correspond to only 36% of the total.
DISCUSSION
As the COVID-19 epidemic in China shows effects of mitigation2, increasingly larger clusters of infections reported outside China are raising concern that other territories may start sustaining the outbreak4,5. To contain it globally, identification, rapid management of cases, and contact tracing are key. The success of these response measures depends critically on the volume of importations19 and the sensitivity of active surveillance13,15.
We reviewed here all confirmed cases out of China from January 3 to February 13, 2020 and gathered detailed information on case history and epidemiological links. We identified salient epidemiological features, and modeled the number of importations over time. International exportations from Hubei grew rapidly, fueled by the local epidemic, up to the closure of Wuhan airport preventing further travel of cases. Exportations from other Chinese provinces and other countries grew at a considerably slower pace. This is related to the difference in the increase of cases between Hubei province, origin of the outbreak, and the rest of the affected areas1. Such difference is likely an outcome of the implementation of containment measures in China3,20,21, and of the increased awareness at different phases of the outbreak22–26 (i.e. before and after containment measures) leading to self-isolation and quarantine.
The reduced volume of exported cases worldwide following the travel ban may have given countries the time to prepare and strengthen their surveillance systems, as signaled by a reduction of the interval from travel date to detection over time.
Our model predicts that exportations will likely rise from areas outside China. The number of local transmissions is rapidly increasing in the Republic of Korea, Japan, and Singapore27, and a few importations in Asia and Europe have already been registered among travelers from Japan and Singapore. For this reason, certain countries have updated the history of travel for the case definition of a suspect imported case to include additional countries in South Asia besides China28 or banned travelers from East Asian countries29. ECDC and WHO currently base their case definition on travel from China only30,31, but this may rapidly change in the coming days.
Before the likely rebound of exportations, identification and isolation of possible clusters outside China remain essential to contain local transmission. The increasing reporting of clusters outside China with no known epidemiological link1,14 raises important concerns about the possibility of containing the COVID-19 epidemic worldwide. Our estimates indicate that only 36% of imported cases were detected in countries outside China. This means that approximately 6 imported cases out of 10 have gone undetected. Previous estimates range from 27%13 to 38%13,15 detection rates, with variations across countries13,15. Ascertainment was estimated to be even lower (approximately 10%) when assessed on repatriations31. Here, we excluded from this analysis all repatriation events and cruises with outbreaks, as conditions for detection and identification may be different.
Underdetection may be due to several different factors including asymptomatic infections, infections with mild clinical symptoms, health-seeking behavior and declaration of travel history, case definition, and underdiagnosis. Underdetection of imported cases is likely to be higher than what we estimate here, as our analysis is conditional on the identification of clusters of cases. The current situation in Italy, with different clusters emerging within a few hours in different areas in the North of the country14, shows that clusters have gone undetected and epidemiological links with the index case are still missing. Countries outside China should be prepared for the possible emergence of several undetected clusters of chains of local transmissions. Surveillance efforts to track all suspect cases may become impractical if the number of cases increases too rapidly32. If that situation occurs, countries should be ready to step up their response and take preparatory steps for community interventions.
Data Availability
Data were collected by the authors and made publicly available online.
ACKNOWLEDGMENTS
This study is partially funded by: the ANR project DATAREDUX (ANR-19-CE46-0008-03); the EU grant MOOD (H2020-874850); the Municipality of Paris through the programme Emergence(s). We thank REACTing (https://reacting.inserm.fr/) for useful discussions.
APPENDIX
1. DATA
2. STATISTICAL METHODS
Modelling traveling cases and delay from arrival to detection
Dataset
The individual data consist of tuples (S, f, o), where:
S indicates place of departure as Hubei province (H), China other than Hubei (C), outside China (O);
f ∈ {1, …, T} is the day the case arrived at destination, counted from January 5th up to current date T;
o ∈ {1, …, T} is the day the case was confirmed, counted from January 5th.
Modelling the detection delay
The difference D = o − f corresponds to the time from arrival to confirmation. To account for changes in detection efficiency, we modelled D as a (discretized) gamma distribution with parameters changing with time. More precisely, the rate parameter of the distribution was $\beta_f = a\,e^{bf}$. The shape parameter k was constant, leading to a constant coefficient of variation (standard deviation/mean).
We truncated the distribution at $T_D = 25$ days and computed the probabilities that D was τ days, rescaled by K, a normalization constant accounting for the truncation at $T_D$.
We denote the corresponding cumulative distribution function of D by $G_f(\tau) = P(D \le \tau + 0.5 \mid \beta_f, k)$.
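A minimal sketch of one possible discretization consistent with the description above: the probability of a delay of τ days is taken as the gamma probability mass between τ − 0.5 and τ + 0.5, renormalized over 0 to 25 days. The exact formula used by the authors is not recoverable from this text, so the binning, the function names and the parameter values below are assumptions.

```python
import math

def gamma_cdf(x, shape, rate):
    """Gamma CDF via the power series of the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = rate * x
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))

def delay_pmf(arrival_day, a, b, shape, max_delay=25):
    """Discretized, truncated gamma delay for a case arriving on arrival_day."""
    rate = a * math.exp(b * arrival_day)  # beta_f = a * exp(b * f)
    raw = [gamma_cdf(tau + 0.5, shape, rate) - gamma_cdf(tau - 0.5, shape, rate)
           for tau in range(max_delay + 1)]
    norm = sum(raw)  # plays the role of 1/K
    return [p / norm for p in raw]

# Hypothetical parameters: detection speeds up for later arrivals (b > 0)
pmf_early = delay_pmf(1, a=0.2, b=0.03, shape=2.0)
pmf_late = delay_pmf(28, a=0.2, b=0.03, shape=2.0)
mean_early = sum(tau * p for tau, p in enumerate(pmf_early))
mean_late = sum(tau * p for tau, p in enumerate(pmf_late))
```

Because the shape is fixed and only the rate grows with the arrival day, the mean delay shortens over time while the coefficient of variation of the underlying gamma distribution stays constant.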
Modelling cases arrival
We computed $A_S = \{A_{S,t}\}_{t=1,\dots,T}$, the daily number of cases arriving from location S on date t that had been detected before time T, and $N_S = \sum_t A_{S,t}$, the total number of such cases arriving from location S.
Due to the time lag between arrival and confirmation, some cases arriving on day t from location S can be undetected as of time T. We denote by $U_{S,t}$ the number of such cases. Then, the total count of cases arriving on day t is given by $A_{S,t} + U_{S,t}$. We assumed a Poisson distribution for this count, $A_{S,t} + U_{S,t} \sim \mathrm{Poisson}(I_{S,t})$, where $I_{S,t}$ represents the expected number of imported cases from location S on day t.
We modelled $I_{S,t}$ as a piecewise exponential function in each location of origin S, the exponential growth parameter changing in Hubei after the ban instated on January 23rd and in the rest of China after flight cancellation by major airline companies on January 29th. $I_{S,t}$ was therefore:
$$I_{S,t} = \begin{cases} \alpha_S\, e^{r_S t}, & t \le T_S \\ \alpha'_S\, e^{r'_S t}, & t > T_S \end{cases}$$
where $T_S$ is the last day before the start of quarantine/travel restriction in location S; $\alpha_S$, $r_S$ and $r'_S$ are hyperparameters representing the scale and the growth rate of each exponential, and $\alpha'_S = \alpha_S\, e^{(r_S - r'_S) T_S}$ is determined by continuity of $I_{S,t}$ at $T_S$.
Outside China we assumed a single exponential function with the same growth rate as in China outside Hubei before travel restrictions were put in place and a different scale $\alpha_O$:
$$I_{O,t} = \alpha_O\, e^{r_C t}$$
49 confirmed cases had no information on date of arrival and/or origin of travel. These cases were described with latent variables as follows:
- the time series that accounts for case counts with unknown date of arrival;
- case counts with unknown travel origin;
- cases with both pieces of information missing.
The framework described above was extended to account for these cases, i.e. we considered to be the number of cases arriving from destination S on time
Likelihood function
The components of the estimated parameters θ and prior distributions are listed in Table S2.
The likelihood of the observations is given by the product $P(A, X, U \mid \theta)\,P(D \mid \theta)\,P(\theta)$, where:
- $P(A, X, U \mid \theta)$ is the term describing observed incidence according to the model, with $I_{S,t}$ the expected incidence in location S at day t described above;
- $P(D \mid \theta)$ is the term describing observed and unobserved durations between arrival and detection, with $D_{S,t,i}$ the individual times to detection of those travelling from location S on day t;
- $P(\theta)$ is the prior model for all parameters.
For ease of computation, the likelihood is marginalized over the latent variables corresponding to cases with missing information, so that data augmentation is unnecessary in the computation of the posterior distribution for the parameters. The final likelihood is:
Here we have defined for convenience $\mu_S(t)$ and $\mu(t) = \sum_S \mu_S(t)$, and introduced $M_S$, the number of cases travelling from source S with unknown date of arrival, $X_t$, the number of cases that arrived on day t from an unknown source, and $M_X$, the number of cases with unknown travel source and date of arrival.
Inference was performed by MCMC sampling using Stan. We used 3 chains with 6000 iterations and discarded the first 50% as burn-in.
We computed the median of the posterior distributions as well as credible intervals for each parameter in θ. Additionally, we computed predictive distribution statistics about the number of cases confirmed on day t, e.g. the average value as well as upper and lower quantiles, using a Poisson distribution with mean $\mu(t) = \sum_{S \in \{H, C, O\}} \mu_S(t)$.
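The definition of the per-source mean is not recoverable from this text. One reading consistent with the model is that the expected number of confirmations on day t is the expected arrivals on each earlier day, weighted by the probability of the corresponding detection delay. The sketch below implements that reading; the function, the toy delay distribution and the arrival counts are all hypothetical.

```python
def expected_confirmations(importations, delay_pmf_by_day, horizon):
    """Expected confirmed cases per day, summed over sources.

    importations: dict mapping a source label to expected arrivals per day
                  (list index 0 corresponds to day 1).
    delay_pmf_by_day: function f -> list p, with p[d] = P(delay = d | arrival day f).
    """
    mu = [0.0] * horizon
    for arrivals in importations.values():
        for f, expected in enumerate(arrivals, start=1):
            for d, p in enumerate(delay_pmf_by_day(f)):
                t = f + d  # confirmation day
                if t <= horizon:
                    mu[t - 1] += expected * p
    return mu

# Toy inputs for illustration only
toy_pmf = lambda f: [0.5, 0.3, 0.2]          # same delay distribution for every arrival day
toy_arrivals = {"H": [2.0, 4.0], "C": [1.0, 1.0]}
mu = expected_confirmations(toy_arrivals, toy_pmf, horizon=4)
```

Under this reading, predictive quantiles for day t would follow from a Poisson distribution with mean `mu[t - 1]`, and extending `horizon` beyond the last observed day gives the nowcast.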
Modelling index case detection probability
Dataset
We define as seed an imported case or a group of cases that could have started a cluster of local transmission outside China. We computed the number x1 of transmission clusters where a seed of size 1 was among the cases identified in the cluster and likewise x2 with seeds of size >1. We also computed the number of imported cases that did not start a transmission cluster and the number z of clusters for which a seed was not observed among the tested cases, i.e. clusters without a direct link to an imported case.
Modeling index case detection
We assumed that seeds could be of size 1 with probability λ or of size 2 with probability 1 − λ. A seed could be observed with probability π and started a cluster with probability φ.
The imported cases that did not start a cluster consist of y1 seeds of size 1 and y2 seeds of size 2, such that y1 + 2 y2 equals the number of these cases and y1 + y2 = y; however, the grouping of these cases is unknown. We computed y from the number of these cases using a plug-in estimate in which the mean of the ratio y1/y2 was λ/(1 − λ).
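The plug-in step can be worked out explicitly. Writing $n$ for the number of detected imported cases that did not start a cluster (the symbol $n$ is introduced here for illustration and may differ from the authors' notation), the two constraints stated above give:

```latex
y_1 + 2\,y_2 = n, \qquad \frac{y_1}{y_2} = \frac{\lambda}{1-\lambda}
\;\Longrightarrow\;
y_2 = \frac{(1-\lambda)\,n}{2-\lambda}, \quad
y_1 = \frac{\lambda\,n}{2-\lambda}, \quad
y = y_1 + y_2 = \frac{n}{2-\lambda}.
```

The denominator $2 - \lambda = \lambda \cdot 1 + (1-\lambda)\cdot 2$ is the mean seed size, so $y$ is simply the number of cases divided by the average number of cases per seed.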
We denote by w the number of seeds of any size that went undetected and did not start a cluster, which occurs with probability (1 − π)(1 − φ). w is latent and estimated together with λ, π and φ.
Likelihood function
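The likelihood is based on a multinomial distribution for x1, x2, y, z and w:
$$L(\theta, w) = \frac{N!}{x_1!\,x_2!\,y!\,z!\,w!}\,(\lambda\pi\varphi)^{x_1}\,\big((1-\lambda)\pi\varphi\big)^{x_2}\,\big(\pi(1-\varphi)\big)^{y}\,\big((1-\pi)\varphi\big)^{z}\,\big((1-\pi)(1-\varphi)\big)^{w}$$
where $\theta = (\lambda, \pi, \varphi)$ and $N = x_1 + x_2 + y + z + w$ is the total number of seeds.
Parameters can be estimated at maximum likelihood:
- Differentiating the likelihood function with respect to λ, π and φ:
$$\hat{\lambda} = \frac{x_1}{x_1 + x_2}, \qquad \hat{\pi} = \frac{x_1 + x_2 + y}{N}, \qquad \hat{\varphi} = \frac{x_1 + x_2 + z}{N}$$
- Approximating the maximum in w by looking for the value where L(θ, w) = L(θ, w − 1) (Pollock KH, Building Models of Capture-Recapture Experiments, The Statistician (1976); 25 (4) : 253-9). We then find:
$$w = N\,(1-\pi)(1-\varphi)$$
By replacing $\hat{\pi}$ and $\hat{\varphi}$ in the previous equation we find that the Maximum Likelihood estimator for w is given by:
$$\hat{w} = \frac{y\,z}{x_1 + x_2}$$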
Confidence intervals are computed using profile likelihood methods.
Finally, we estimate the number of unobserved cases that did not start a cluster as $\hat{w}\,(2 - \hat{\lambda})$, i.e. the estimated number of undetected seeds multiplied by the mean seed size. The confidence interval on this last quantity is computed by multiplying the confidence intervals of both factors.
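A minimal sketch of the point estimates, assuming the multinomial model described above with cell probabilities λπφ, (1 − λ)πφ, π(1 − φ), (1 − π)φ and (1 − π)(1 − φ). Under that assumption the estimator of w reduces to a Lincoln-Petersen-type closed form. The function name and the input counts are hypothetical, not the study data, and confidence intervals are not computed.

```python
def estimate_undetected(x1, x2, y, z):
    """Closed-form estimates under the assumed multinomial seed model.

    x1, x2: observed clusters seeded by 1 or by >1 index cases
    y:      observed seeds that did not start a cluster
    z:      clusters whose seed was not observed
    """
    lam = x1 / (x1 + x2)               # share of seeds of size 1
    w_hat = y * z / (x1 + x2)          # undetected seeds with no cluster
    n_total = x1 + x2 + y + z + w_hat  # all seeds, observed or not
    pi = (x1 + x2 + y) / n_total       # probability that a seed is observed
    phi = (x1 + x2 + z) / n_total      # probability that a seed starts a cluster
    undetected_cases = w_hat * (2 - lam)  # seeds times mean seed size
    return {"lambda": lam, "pi": pi, "phi": phi,
            "w": w_hat, "undetected_cases": undetected_cases}

# Hypothetical counts for illustration only
res = estimate_undetected(x1=10, x2=5, y=100, z=8)
```

The estimate scales linearly with z, which is why treating more clusters of unknown origin as seeded by undetected importations raises the estimated number of undetected cases proportionally.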
3. ADDITIONAL RESULTS
Dataset of international cases
We analyze in Figure S1 the proportion of traveling cases for which we have complete information regarding the timeline of events. Detailed information of the clusters of transmission is reported in Table S3.
Results of likelihood estimation
We provide in Table S4 all parameter estimates and their confidence intervals. The convergence of the MCMC and the posterior distribution of key parameters are shown in Figure S2.
Sensitivity analysis
On January 23, 2020, all trains, flights and public transport connecting Wuhan with the outside were suspended. We accounted for the possibility that this ban was initially not completely effective, e.g. people about to depart were still able to leave the area by private transport. We considered a sensitivity scenario in which the effects of the travel ban in Wuhan began on January 24, 2020, one day later. We found that growth rates changed only slightly with respect to the baseline case.
Analysis of imported clusters: summary of parameter estimates
Here we report Maximum Likelihood estimates of parameters in the analysis of imported clusters. We estimate the number of unobserved cases that did not start a cluster as $\hat{w}\,(2 - \hat{\lambda})$, i.e. the estimated number of undetected seeds multiplied by the mean seed size. The confidence interval on this last quantity is computed by multiplying the confidence intervals of both factors. For z = 8 and z = 27 we estimate 76 [49, 118] and 255 [186, 369] undetected cases, respectively. We then estimate the fraction of detected imported cases as the number of detected imported cases divided by the total number of detected and undetected imported cases, which yields 65% and 36% for z = 8 and z = 27, respectively.