ABSTRACT
Background The international flight network creates multiple routes by which pathogens can quickly spread across the globe. In the early stages of infectious disease outbreaks, analyses using flight passenger data to identify countries at risk of importing the pathogen are common and can help inform disease control efforts. A challenge faced in this modelling is that the latest aviation statistics (referred to as contemporary data) are typically not immediately available. Therefore, flight patterns from a previous year are often used (referred to as historical data). We explored the suitability of historical data for predicting the spatial spread of emerging epidemics.
Methods We analysed monthly flight passenger data from the International Air Transport Association to assess how baseline air travel patterns were affected in outbreaks of MERS, Zika, and SARS-CoV-2 over the past decade. We then used a stochastic discrete time SEIR metapopulation model to simulate global spread of different pathogens, comparing how epidemic dynamics differed in simulations based on historical and contemporary data.
Results We observed local, short-term disruptions to air travel from South Korea and Brazil for the MERS and Zika outbreaks we studied, whereas global and longer-term flight disruption occurred during the SARS-CoV-2 pandemic. For outbreak events that were accompanied by local, small, and short-term changes in air travel, epidemic models using historical flight data gave similar projections of timing and locations of disease spread as when using contemporary flight data. However, historical data were less reliable for modelling the spread of an atypical outbreak such as SARS-CoV-2, in which there were durable and extensive levels of global travel disruption.
Conclusions The use of historical flight data as a proxy in epidemic models is an acceptable practice except in rare, large epidemics that lead to substantial disruptions to international travel.
INTRODUCTION
Localised outbreaks of emerging and re-emerging pathogens are often followed by international spread to multiple countries and continents (1, 2), with human population movement one of the key factors facilitating this spread. The international flight network plays a part in this, connecting populations separated by large distances with short travel times. Understanding the volume and spatiotemporal patterns of flight passengers can therefore provide insights into the routes by which a pathogen can spread (1, 2).
Analyses using flight passenger volumes have answered critical questions in the early phases of previous infectious disease epidemics. Early in the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) pandemic, passenger data helped to identify the likely locations where the virus could be exported, assess the potential for travel restrictions to control spread, and estimate the true epidemic size in Wuhan based on cases identified among travellers to other countries (3–5). Similar studies were conducted for Ebola in West Africa (6, 7), and Zika (8, 9) and Yellow Fever (10) in the Americas.
Such studies can help control the spread of emerging epidemics through rapid communication of conclusions, increasing international awareness and aiding preparedness, surveillance, and response planning (5, 11, 12). As an outbreak unfolds in real-time, one challenge for spatiotemporal epidemic modelling is that current aviation statistics (referred to as ‘contemporary’ data throughout) are typically not immediately available. However, waiting for the data is not feasible in a rapidly growing epidemic. In addition, flight datasets are typically expensive to purchase. Consequently, the movement data used in models are often selected based on what is available, i.e. typically flight data from previous years.
For example, many of the studies evaluating the potential international spread of SARS-CoV-2 from China in early 2020 used passenger numbers from the corresponding months in 2019 (3, 4, 13–17), or occasionally 2018 (11, 18, 19). In a brief literature search of spatial epidemic models for SARS-CoV-2 including flight data, we found that only one of 10 studies attempted to characterise the actual 2020 flight patterns. That study scaled 2019 passenger data according to more up-to-date information on the numbers of planes departing from China (relative to the equivalent period in the year before) (14). The lack of up-to-date movement data is not unique to SARS-CoV-2 modelling analyses. Historical flight data were also used to model the international spread of Ebola, Zika, and Yellow Fever outbreaks because of data availability issues (6–10).
However, to our knowledge, no study has assessed whether historical datasets are a suitable proxy for contemporary flight patterns when modelling epidemic spatial spread. This is important given that epidemics can affect volumes and spatiotemporal patterns of travel due to public perception of risks or travel bans. In this paper, we explore the suitability of historical datasets for predicting the spatial spread of emerging epidemics. We assess whether implicit assumptions of consistent travel patterns over time are valid, and how these assumptions affect key outputs of spatial models of infectious disease spread. We aim to: i) identify the extent to which flight volumes were disrupted by previous epidemics; ii) assess whether the most popular destinations for travellers from a given country changed over time and during epidemics; and iii) simulate epidemics to compare epidemic model outputs when using historical versus contemporary movement data.
METHODS
An overview of the methods is provided below; further details of the methodology are available in Supplementary Section 1.
We focus on three past epidemics to explore our aims: a Middle East respiratory syndrome (MERS) outbreak in South Korea from May to July 2015 involving 186 reported cases with 38 deaths (20); the Zika epidemic in Brazil that was declared a Public Health Emergency of International Concern (PHEIC) in February 2016 (21); and the SARS-CoV-2 pandemic which emerged in China at the end of 2019 (22).
Flight passenger data
We used flight passenger data purchased from the International Air Transport Association (IATA) (23). The dataset contained the numbers of passengers that travelled between pairs of international airports each month from January 2012 to December 2020, which we aggregated to a country level.
To identify differences in passenger volumes during epidemics, we examined the monthly number of passengers departing from South Korea, Brazil, and China, and calculated changes in passenger numbers during the relevant epidemic period (defined in Supplementary Section 1.i) relative to the same month in the previous year. We also analysed the temporal variation in the flight destinations from each of these three countries, considered as the epidemic centres. We compared how the top 10 destinations (by monthly passenger volume) from the epidemic centres varied for a specified calendar month across the years 2012-2020 (we analysed the months at the beginning of the contemporary periods, see Table S1).
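The descriptive steps above (country-level aggregation, change relative to the same month in the previous year, and ranking of destinations) can be illustrated with a minimal Python sketch. This is not the code used in the study: the function names, the airport-to-country mapping, the exclusion of within-country journeys, and all passenger counts are assumptions invented for illustration.

```python
from collections import defaultdict


def aggregate_to_country(airport_flows, airport_country):
    """Sum airport-pair passenger counts into country-pair counts."""
    country_flows = defaultdict(int)
    for (src_airport, dst_airport), passengers in airport_flows.items():
        src, dst = airport_country[src_airport], airport_country[dst_airport]
        if src != dst:  # assumption: keep international journeys only
            country_flows[(src, dst)] += passengers
    return dict(country_flows)


def departures(country_flows, origin):
    """Total passengers leaving `origin` in one month."""
    return sum(p for (src, _), p in country_flows.items() if src == origin)


def percent_change(current, previous):
    """Percentage change relative to the same month in the previous year."""
    return 100.0 * (current - previous) / previous


def top_destinations(country_flows, origin, n=10):
    """The n destinations receiving the most passengers from `origin`."""
    totals = {dst: p for (src, dst), p in country_flows.items() if src == origin}
    return sorted(totals, key=lambda dst: (-totals[dst], dst))[:n]


# Toy monthly counts, invented for illustration (not IATA data).
airport_country = {"ICN": "KOR", "GMP": "KOR", "NRT": "JPN", "PEK": "CHN"}
june_2014 = {("ICN", "NRT"): 600, ("GMP", "NRT"): 100,
             ("ICN", "PEK"): 300, ("ICN", "GMP"): 50}
june_2015 = {("ICN", "NRT"): 500, ("GMP", "NRT"): 80, ("ICN", "PEK"): 270}

flows_2014 = aggregate_to_country(june_2014, airport_country)
flows_2015 = aggregate_to_country(june_2015, airport_country)
change = percent_change(departures(flows_2015, "KOR"),
                        departures(flows_2014, "KOR"))
print(change)                                   # -15.0
print(top_destinations(flows_2015, "KOR", n=1))  # ['JPN']
```

Comparing the sets returned by `top_destinations` for the same calendar month in different years gives the number of destinations that stay in the top 10.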
Epidemics simulation study
We conducted a simulation study to compare the characteristics of epidemics modelled using “historical” flight passenger data from the year before the disease emerged with models that used “contemporary” flight data from the epidemic period.
Epidemic model
We used a stochastic discrete time SEIR metapopulation model to simulate the global spread of a pathogen emerging in a single country, with the probability of movement between countries being informed by the IATA passenger data.
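To make the model structure concrete, the following Python sketch shows one possible daily update for a stochastic discrete-time SEIR metapopulation model. It is an illustration under stated assumptions, not the implementation used in this study (which is described in Supplementary Section 1): the parameter names `beta`, `sigma`, and `gamma`, the binomial transition scheme, the two hypothetical countries "A" and "B", and the daily per-person travel probabilities are all assumed, and travel probabilities out of any one country are assumed to sum to less than 1.

```python
import math
import random

COMPARTMENTS = ("S", "E", "I", "R")


def binomial(rng, n, p):
    """Binomial(n, p) draw by summing Bernoulli trials (adequate for small n)."""
    return sum(rng.random() < p for _ in range(n))


def seir_step(state, beta, sigma, gamma, move_prob, rng):
    """Advance all countries by one day: local transitions, then travel.

    state: {country: {"S": int, "E": int, "I": int, "R": int}}
    move_prob: {origin: {destination: daily per-person travel probability}}
    """
    # 1) Within-country transitions S -> E -> I -> R.
    for comp in state.values():
        n = sum(comp.values())
        p_inf = 1.0 - math.exp(-beta * comp["I"] / n) if n > 0 else 0.0
        new_e = binomial(rng, comp["S"], p_inf)
        new_i = binomial(rng, comp["E"], 1.0 - math.exp(-sigma))
        new_r = binomial(rng, comp["I"], 1.0 - math.exp(-gamma))
        comp["S"] -= new_e
        comp["E"] += new_e - new_i
        comp["I"] += new_i - new_r
        comp["R"] += new_r

    # 2) Travel: draw every move from the post-transition state, then apply
    #    them together. Sequential conditional binomials ensure a country
    #    never sends out more people than it holds.
    moves = []
    for origin, dests in move_prob.items():
        for c in COMPARTMENTS:
            remaining, p_left = state[origin][c], 1.0
            for dest, p in dests.items():
                k = binomial(rng, remaining, min(1.0, p / p_left))
                moves.append((origin, dest, c, k))
                remaining -= k
                p_left -= p
    for origin, dest, c, k in moves:
        state[origin][c] -= k
        state[dest][c] += k


# Toy example: 100 infectious cases seeded in hypothetical country "A".
rng = random.Random(1)
state = {
    "A": {"S": 900, "E": 0, "I": 100, "R": 0},
    "B": {"S": 1000, "E": 0, "I": 0, "R": 0},
}
move_prob = {"A": {"B": 0.01}, "B": {"A": 0.01}}
for day in range(30):
    seir_step(state, beta=0.5, sigma=0.2, gamma=0.2, move_prob=move_prob, rng=rng)
total = sum(sum(comp.values()) for comp in state.values())
```

In a setup like this, swapping historical for contemporary flight data would change only `move_prob`, which is why the comparison isolates the effect of the movement data.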
Simulation scenarios
We simulated epidemics for three flight scenarios that used data corresponding to the MERS, Zika, and SARS-CoV-2 epidemic periods. The models used either contemporary or historical passenger data. The contemporary and historical periods for each flight scenario are defined in Table S1.
Across all flight scenarios, we simulated epidemics of pathogens with natural history parameter values similar to MERS, Zika, and SARS-CoV-2 (Table S2). These examples explored different basic reproduction numbers (R0, the average number of secondary cases generated by a primary case in a susceptible population) and generation times (the time between the infection of a case and the infection of their infector). Simulations were initiated with 100 infectious cases in the epidemic centre (South Korea, Brazil, and China for MERS, Zika, and SARS-CoV-2 flight scenarios respectively), ran for one year, and assumed that the global population was initially fully susceptible to infection. For each natural history, we simulated 100 epidemics with contemporary flight data and 100 epidemics with historical data. Combining the flight and natural history scenarios gave nine overall scenarios in which we compared historical and contemporary flight data.
For each simulated epidemic we computed the following metrics:
- Number of invaded countries over time: the number of countries with at least 10 cumulative infections at each day.
- Invasion time in i: the time to country i experiencing its 10th cumulative infection.
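As an illustration of the two metrics defined above, the Python sketch below computes them from daily cumulative infection counts for one simulated epidemic. The function names, the day indexing (day 0 is the first simulated day), and the toy trajectories are assumptions made for this example.

```python
def invasion_time(cumulative, threshold=10):
    """First day on which cumulative infections reach the threshold, else None."""
    for day, total in enumerate(cumulative):
        if total >= threshold:
            return day
    return None


def invaded_countries_over_time(cumulative_by_country, threshold=10):
    """Number of countries at or above the threshold on each day."""
    n_days = len(next(iter(cumulative_by_country.values())))
    return [
        sum(series[day] >= threshold for series in cumulative_by_country.values())
        for day in range(n_days)
    ]


# Toy cumulative infection counts for three hypothetical countries.
trajectories = {
    "A": [100, 120, 150, 200],
    "B": [0, 4, 11, 30],
    "C": [0, 0, 2, 9],
}
times = {country: invasion_time(series) for country, series in trajectories.items()}
counts = invaded_countries_over_time(trajectories)
print(times)   # {'A': 0, 'B': 2, 'C': None}
print(counts)  # [1, 1, 2, 2]
```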
For the historical and contemporary simulations in each scenario, we summarised the distributions of each metric across all 100 simulations using the median, 2.5% and 97.5% quantiles. We ordered countries by their median invasion times to obtain the average invasion ranking. We identified the first n countries that were invaded with the contemporary flight data, and then calculated the percentage of those countries that were also invaded first when using historical data.
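The summary and ranking steps can be sketched as follows. This is a hedged illustration rather than the analysis code: the function names and toy invasion times are invented, every country is assumed to be invaded in every simulation, and the quantile interpolation rule (Python's "inclusive" method) is an assumption, as the rule used in the study is not stated here.

```python
from statistics import median, quantiles


def summarise(times):
    """Median with 2.5% and 97.5% quantiles of a metric across simulations."""
    cuts = quantiles(times, n=40, method="inclusive")  # cut points every 2.5%
    return median(times), cuts[0], cuts[-1]


def invasion_ranking(times_by_country):
    """Countries ordered by median invasion time (ties broken by name)."""
    return sorted(times_by_country,
                  key=lambda c: (median(times_by_country[c]), c))


def first_n_agreement(contemporary, historical, n):
    """Percentage of the first n countries invaded with contemporary data
    that are also among the first n invaded with historical data."""
    first_contemporary = set(invasion_ranking(contemporary)[:n])
    first_historical = set(invasion_ranking(historical)[:n])
    return 100.0 * len(first_contemporary & first_historical) / n


# Toy invasion times (days) from three simulations per hypothetical country.
contemporary = {"A": [5, 6, 7], "B": [10, 12, 11], "C": [20, 22, 21], "D": [30, 31, 29]}
historical = {"A": [4, 5, 6], "B": [15, 16, 14], "C": [9, 10, 11], "D": [25, 26, 27]}

print(invasion_ranking(contemporary))                    # ['A', 'B', 'C', 'D']
print(invasion_ranking(historical))                      # ['A', 'C', 'B', 'D']
print(first_n_agreement(contemporary, historical, n=2))  # 50.0
print(first_n_agreement(contemporary, historical, n=3))  # 100.0
```

In the toy data, countries B and C swap places between the two rankings, so agreement is 50% for the first two countries but 100% for the first three; agreement can increase with n in this way even when the exact order differs.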
For the simulations using SARS-CoV-2 flight data and natural history, we used the invasion rankings to validate the performance of our model against independent case data from the World Health Organisation for the SARS-CoV-2 pandemic (24). We compared the first 10 countries to report 10 SARS-CoV-2 cases (24) with the top 10 invasion rankings from our simulations. Simulations in this validation step were seeded in China in January 2020.
RESULTS
The number of flight passengers departing South Korea and Brazil showed an increasing trend over time (especially pronounced in South Korea), with some within-year seasonal variation (Figure 1A-B). However, epidemic events in those countries were accompanied by deviations from long-term passenger trends. The numbers of people flying from South Korea in the months after the MERS epidemic started (June-August 2015) were between 6.5% and 16.2% lower than the equivalent months in 2014 (Figure 1A). Similarly, passenger departures from Brazil in the months after the declaration of Zika virus as a PHEIC (March-July 2016) were between 3.3% and 10.2% lower than the previous year (Figure 1B). June 2016 had the fourth lowest monthly passenger departures between January 2012 and February 2020, with the months with fewer departures all occurring in 2012. South Korea and Brazil, as well as China (Figure 1C), experienced very large reductions in air travel during the SARS-CoV-2 pandemic. The largest reduction in monthly departures was in April 2020 when passenger numbers decreased by 98.6%, 97.9%, and 98.6% for South Korea, Brazil, and China respectively.
The most popular destinations for flights from the three countries were generally consistent across years prior to the SARS-CoV-2 pandemic (Figure 2). Among these top 10s, there was some variation in the order, but in general the shifts in ordering were small. In South Korea and China, nine countries appeared consistently in the top 10 flight destinations for each year from 2012-2019. Brazil experienced more variability, with only six countries consistently in the top 10 destinations over 2012-2019. However, there was very little change in top destinations during the Zika PHEIC: of the 10 top destinations in March 2015, nine remained in the list for March 2016 (when Zika was a PHEIC). In all three epidemic centres, the year-to-year changes in destination lists were greatest between 2019 and 2020, but still modest: each country had 2/10 new countries in the 2020 lists.
In simulated epidemics comparing historical and contemporary flight data from the MERS or Zika flight scenarios, we found very little difference in the rate at which the epidemics spread globally (Figure 3, second and third rows), irrespective of the pathogen natural histories. In contrast, in the SARS-CoV-2 flight scenario with extensive disruption to the global flight network, use of the historical flight data resulted in much earlier predicted spread than when using the contemporary flight data (Figure 3, first row). The differences were amplified by increasing the generation time or decreasing R0. In these SARS-CoV-2 flight scenario results, the difference in the median time to 50 countries being invaded was 25, 95, and 84 days for the SARS-CoV-2, MERS, and Zika natural history scenarios respectively. In all simulations, the SARS-CoV-2-like pathogen eventually spread to all countries, even with the extensive disruptions in the contemporary SARS-CoV-2 flight data. This was not the case when contemporary SARS-CoV-2 flight data were used for the other natural history scenarios, in which the pathogens spread more slowly due to either longer generation times or smaller R0.
We explored how the differing invasion dynamics were influenced by the relative changes to the number of departing passengers in the contemporary versus historical data (Figure 3, circles). In the MERS and Zika flight scenarios, we found relatively small, short-term reductions to flight departures from the epidemic centre. This contrasted with a small increase in overall global flight departures, which reflected the trend of increasing flight volumes over time (Figure S1). The magnitude of the local changes seemed to have little impact on the initial spread from the epidemic centre and the subsequent rate of global epidemic spread. On the other hand, the SARS-CoV-2 flight scenario showed concurrent, large and durable reductions in both Chinese and total global flight departures. Consequently, for the slower-growing MERS and Zika natural history scenarios, the epidemics remained localised at the epidemic centre until flight volumes from China recovered.
We found similar country invasion times when using contemporary and historical flight data for the MERS and Zika flight scenarios (Figure 4), across all three natural history scenarios. Differences in predicted invasion times were similar across countries invaded early and those invaded later in the epidemic (Figure 4). However, in the SARS-CoV-2 flight scenario, we found that using historical passenger data substantially underestimated the invasion times. Again, the invasion delay was amplified with longer generation times, with the median underestimation in invasion time ranging from 28 days (2.5% and 97.5% quantiles: 9, 55 days) for the SARS-CoV-2 natural history scenario to 93 days (54, 135 days) for the Zika natural history scenario. For the SARS-CoV-2 natural history scenario, the delays were more marked for countries invaded later, likely reflecting that early invasions occurred when there was less disruption to global travel, while the later countries to be invaded were in a period when there was increased disruption to passenger volumes (Figure 2, first row). Conversely, for the slower-growing MERS and Zika natural history scenarios, differences in invasion times were smaller for countries invaded later because their invasion occurred at times when there was relatively less disruption (compared to the period when the early countries were invaded) (Figure 2, first row).
Despite some underestimation of invasion times, there was generally good agreement in the first n invaded countries predicted using historical and contemporary flight data across all natural history scenarios (Figure 5). Across simulation scenarios, 60-100% of the first 10 countries invaded using historical flight data also featured in the first 10 countries invaded using contemporary data. This increased to 80-100% for the first 20 invaded countries.
Since our findings are based on simulations, we assessed the extent to which predicted invasion orders reflected reality by comparing our model predictions to independent data on the early spatial spread of COVID-19. Our model performed well in predicting the early countries to report SARS-CoV-2 cases (Table S3). Seven of the first eight countries invaded in the model matched the first eight countries to report cases.
DISCUSSION
Mathematical models of infectious disease spread relying on global flight data are often used in real-time to inform epidemic control efforts. Delayed publication of the latest flight passenger statistics means that models are often constrained to using historical data, typically from the previous year, and therefore do not capture changes to travel patterns and volumes that are caused by the outbreak. In this work, we showed that the standard practice of using historical data generally leads to similar projections of the timing and order of epidemic spread to other countries, compared to using contemporary flight data, for epidemic events with localised, relatively small, short-term mobility changes (such as those experienced during the MERS and Zika outbreaks). The consistency in the predicted order in which the epidemic reached countries is not surprising given our findings that the most common flight destinations were relatively stable over time.
Historical flight data were less suitable for modelling an atypical epidemic such as SARS-CoV-2 with durable, extensive levels of global travel disruption. Although the locations projected to be invaded early were consistent between historical and contemporary flight data, the projected invasion times were vastly underestimated when using historical data. This could lead to the dismissal of preventative interventions that are perceived as too slow for the projected speed of invasion (e.g. building emergency healthcare facilities). It may also reduce public trust in model outputs which could have implications such as decreasing compliance with interventions.
Our work suggests that correcting historical data to predict the contemporary spread of a pathogen would only be necessary for rare events with extensive travel disruptions. In such situations, a correction factor could be applied to historical flight data, as in the approach by Menkir et al (14). However, accurately predicting complex changes to travel in real-time is likely to be challenging.
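A simple proportional correction of this kind might look like the Python sketch below. It is an illustrative assumption, not a reproduction of the method in (14): the function name, the use of a single scaling factor for all routes out of one origin, and all numbers are invented for this example.

```python
def scale_historical_flows(historical_flows, origin, departures_now, departures_before):
    """Scale historical passenger counts out of `origin` by the ratio of
    current to previous-year departures (for example, counts of departing planes)."""
    factor = departures_now / departures_before
    return {
        (src, dst): passengers * factor if src == origin else passengers
        for (src, dst), passengers in historical_flows.items()
    }


# Toy values: departures from the hypothetical origin fell to 25% of last year's level.
historical = {("CHN", "JPN"): 1000, ("CHN", "KOR"): 800, ("KOR", "JPN"): 400}
scaled = scale_historical_flows(historical, "CHN", departures_now=25, departures_before=100)
print(scaled[("CHN", "JPN")])  # 250.0
print(scaled[("KOR", "JPN")])  # 400
```

A single factor per origin preserves the historical ranking of destinations and changes only volumes, so it cannot capture route-specific restrictions or the later global reductions.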
While our study focused on the robustness of using historical flight data in real-time epidemic models, our findings also provide insights on the potential impact of travel restrictions. Our simulations suggest that large, widespread mobility reductions are needed to substantially impact disease spread. In the MERS and Zika flight scenarios, local, small, and short-term changes in mobility had little impact on the global spread of a pathogen. In the SARS-CoV-2 flight scenario, a rapid decrease in the number of departing passengers from the epidemic centre was soon followed by similar decreases globally. Although this substantially delayed the international spread of the epidemic in our simulations, ultimately all countries were still infected as international travel recovered, and eventually experienced similar epidemic sizes and peak sizes (Figure S2).
Therefore, travel restrictions seem to be insufficient to interrupt transmission sustainably but could provide an opportunity to prepare for the arrival of a pathogen. However, the substantial economic and political costs of introducing travel restrictions (25–28) mean that restrictions will only be worthwhile if the delay they generate is used sensibly, such as for the development of diagnostic, pharmaceutical, and non-pharmaceutical tools, and the logistics of their delivery.
The reductions in air travel in the contemporary SARS-CoV-2 flight data resulted in median delays to invasion of 25 days across the first 50 countries for the SARS-CoV-2 natural history scenario, rising to 95 days for the MERS natural history simulations. For context, over 1.3 million people were vaccinated by day 26 of the UK COVID-19 vaccination rollout, increasing to over 23.3 million by day 95 (29). Although these statistics do not account for the time to develop, manufacture, and distribute vaccines, they provide an example of the speed at which response measures can be implemented.
Our work is limited as our model has not been extensively validated against epidemiological data. However, validation of our SARS-CoV-2 scenarios found that the first countries invaded in our model generally matched the first countries to report cases in early 2020 (24). Further validation is challenging due to variability in the reporting of early cases across countries, with reporting potentially reflecting a country’s capacity to detect and report cases effectively, rather than its true burden.
Future research could investigate whether predictions of epidemic dynamics are improved by combining flight data with real-time movement indicators, e.g. changes in movement around airports from platforms such as Google or Meta (30, 31). However, overall, we showed that using historical instead of contemporary flight data had limited impact on simulated epidemic dynamics for two flight scenarios (MERS and Zika) and a range of pathogen natural histories. Only for the extreme SARS-CoV-2 flight scenario, with an almost complete shutdown of international travel, were projections of invasion times significantly underestimated. We note that the ability to use historical flight passenger data depends on scientists having access to these data; it is essential that those involved in epidemic response have timely access to data and access is not prevented by financial barriers.
Funding acknowledgements
This study is partially funded by the National Institute for Health and Care Research (NIHR) Health Protection Research Unit in Modelling and Health Economics, a partnership between the UK Health Security Agency, Imperial College London and LSHTM (grant code NIHR200908); and acknowledges funding from the MRC Centre for Global Infectious Disease Analysis (reference MR/X020258/1), funded by the UK Medical Research Council (MRC). This UK funded award is carried out in the frame of the Global Health EDCTP3 Joint Undertaking; and acknowledges funding by Community Jameel. JW acknowledges research funding from the Wellcome Trust (grant 102169/Z/13/Z). AC was supported by the Academy of Medical Sciences Springboard scheme, funded by the AMS, Wellcome Trust, BEIS, the British Heart Foundation and Diabetes UK [REF:SBF005\1044]. Disclaimer: The views expressed are those of the author(s) and not necessarily those of the NIHR, UK Health Security Agency or the Department of Health and Social Care. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author Contributions
JW: Conceptualization, Formal analysis, Methodology, Visualization, Writing – original draft, Writing – review and editing.
SB: Conceptualization, Formal analysis, Methodology, Writing – review and editing.
AC: Conceptualization, Formal analysis, Methodology, Supervision, Writing – review and editing.
PN: Conceptualization, Formal analysis, Methodology, Supervision, Writing – review and editing.
Conflicts of Interest
AC has received payment from Pfizer for teaching of mathematical modelling of infectious diseases.