Gaussian Process Emulation for Modeling Dengue Outbreak Dynamics
================================================================

* Anna M. Langmüller
* Kiran A. Chandrasekher
* Benjamin C. Haller
* Samuel E. Champer
* Courtney C. Murdock
* Philipp W. Messer

## Abstract

Epidemiological models that aim for a high degree of biological realism by simulating every individual in a population are unavoidably complex, with many free parameters, which makes systematic explorations of their dynamics computationally challenging. This study investigates the potential of Gaussian Process emulation to overcome this obstacle. To simulate disease dynamics, we developed an individual-based model of dengue transmission that includes factors such as social structure, seasonality, and variation in human movement. We trained three Gaussian Process surrogate models on three outcomes: outbreak probability, maximum incidence, and epidemic duration. These models enable the rapid prediction of outcomes at any point in the eight-dimensional parameter space of the original model. Our analysis revealed that average infectivity and average human mobility are key drivers of these epidemiological metrics, while the seasonal timing of the first infection can influence the course of the epidemic outbreak. We use a dataset comprising more than 1,000 dengue epidemics observed over 12 years in Colombia to calibrate our Gaussian Process model and evaluate its predictive power. The calibrated Gaussian Process model identifies a subset of municipalities with consistently higher average infectivity estimates, highlighting them as promising areas for targeted public health interventions. Overall, this work underscores the potential of Gaussian Process emulation to enable the use of more complex individual-based models in epidemiology, allowing a higher degree of realism and accuracy that should increase our ability to control important diseases such as dengue.

Keywords
* individual-based modeling
* statistical emulation
* Gaussian Processes
* epidemiological modeling
* variance-based sensitivity analysis

## Introduction

Simulation models that describe individual organisms or agents have become a well-established research tool across numerous scientific fields1. These so-called individual-based models (IBMs) can allow researchers to explore how system-level characteristics emerge from individual behaviors, while also investigating the reciprocal influence of the system on individuals1. In the field of epidemiology, IBMs have provided valuable insights into the dynamics of pathogen and disease spread and have facilitated rigorous evaluation of planned intervention strategies, making them an integral part of modern epidemiological research2–6. Recent computational advances, combined with the development of comprehensive individual-based simulation frameworks2,7, have enabled the creation of epidemiological models with unprecedented realism. These models can capture details ranging from fine-scale human movement5,8 to specific larval breeding sites for organisms that spread vector-borne diseases3.

While enhanced biological realism has undeniably deepened our understanding of epidemiological processes, it also introduces increased complexity because of the level of detail being simulated, which comes at a substantial computational cost. As IBMs become more realistic, they also become more parameter-rich, making it increasingly difficult to identify the key drivers of disease dynamics. This is partly because parameters often interact in complex, non-linear ways, complicating efforts to quantify the contribution of any single factor to model outcomes. Global sensitivity analysis can help by quantifying the relative contribution of each parameter — as well as their interactions — to IBM outcomes. For example, the Sobol method9 is a global sensitivity analysis approach that is able to assess complex, non-linear parameter interactions by partitioning the observed variance in IBM output into relative contributions from single parameters as well as interactions between two or more parameters. This allows researchers to gain a deeper understanding of the key drivers of the disease dynamics observed in a simulation.

Global sensitivity analysis can provide valuable insights into model dynamics, but generating sufficient data for robust sensitivity analysis becomes increasingly computationally demanding with each additional parameter10. Even with IBMs optimized for runtime performance, comprehensively surveying high-dimensional parameter spaces can be daunting and time-consuming. This is particularly true when large populations are modeled, since the computational complexity of IBMs typically scales at least linearly with the number of individuals modeled. When more intricate behaviors or interactions are included, the computational burden can even increase quadratically (such as when every pairwise interaction between individuals must be simulated), further intensifying the challenge of parameter space exploration.

Statistical emulation11 is a powerful technique for analyzing IBMs with high-dimensional parameter spaces. It can potentially provide an efficient solution to this dimensionality problem (sometimes referred to as the “Curse of Dimensionality”10) by greatly reducing the number of simulation runs required to understand the behavior of the IBM across the parameter space of interest. This technique involves the creation of a surrogate model, based on a limited set of IBM simulations, that can rapidly predict the output of the IBM. Emulators can be developed using either statistical, or, more recently, machine learning approaches12. A key feature of a well-designed emulator is its ability to provide highly accurate predictions significantly faster than the IBM could — often orders of magnitude faster. In some cases, this can effectively reduce simulation times from days or weeks to mere seconds or minutes, paving the way for comprehensive sensitivity analysis and broader model exploration.

Gaussian Processes (GPs), first introduced in the 1960s within the field of geostatistics13,14, are flexible statistical emulators that have been successfully applied across a range of disciplines15–17. GPs are non-parametric models that define a distribution over functions based on observed data. A key advantage of GPs over other machine learning techniques, such as conventional support vector machines or neural networks, lies in their Bayesian foundations, which allow GPs to provide confidence intervals alongside their predictions. This uncertainty quantification enables efficient sampling of additional training data from regions with the greatest uncertainty, facilitating active learning18 that can quickly produce highly accurate emulation. Furthermore, with the availability of advanced software packages supporting GPU acceleration, the computational efficiency of GPs has improved at an astonishing pace in recent years, making them an increasingly attractive research tool19.

In epidemiology, the ability of GPs to efficiently extrapolate between sparse data points is often utilized for estimating disease incidence counts in areas where data is missing or unobserved20,21. Beyond interpolation, GPs serve as valuable forecasting tools17,22, and are key components of early-warning systems23. Furthermore, as emulators of complex, computationally intensive IBMs, GPs facilitate the calibration of these IBMs to empirical data by helping to select parameter values for which the IBM’s outputs closely match observed real-world data24–26. However, the potential of GPs for enabling comprehensive sensitivity analysis — particularly to identify key drivers of disease dynamics in an IBM — remains underexplored. Notable examples of using GPs as emulators to better understand complex IBMs include recent studies that applied GP emulation to the OpenMalaria model26 (an advanced IBM developed to simulate malaria transmission and control) to explore key drivers of the spread of drug-resistant *Plasmodium falciparum*27, and to assess the effectiveness of various intervention strategies28,29. However, these previous studies focused exclusively on malaria.

In this study, we introduce the use of GP-based sensitivity analysis to dengue transmission modeling, offering a refined perspective on the factors that drive epidemic patterns. We employed GP surrogate models to efficiently predict three key metrics from a dengue-inspired individual-based disease transmission model: outbreak probability, maximum incidence, and duration. Dengue poses a growing global health threat30, with cases rapidly increasing due to urbanization31 and climate change that has expanded the habitat of *Aedes* mosquitoes, the primary vectors of the dengue virus32,33. Our IBM allows us to simulate disease transmission while accounting for social structure, human movement, and variation in infection probability. We demonstrate that our GPs explore the parameter space with impressive speed, enabling a more comprehensive sensitivity analysis than previous studies have undertaken. We conduct a global sensitivity analysis and find that average infectivity and average human mobility are primary drivers of outbreak dynamics, while the interactions between seasonality strength and initial infection timing can critically influence the course of epidemic outbreaks. To determine whether insights from our sensitivity analysis could inform our understanding of real-world epidemics, we investigate weekly dengue incidence data at the municipality level across Colombia over more than a decade34,35, identifying municipalities that could serve as potential candidates for targeted interventions or in-depth studies.

## Results

### Individual-based model

We implemented an IBM in C++ that simulates and tracks transmission dynamics of dengue to study how disease dynamics are influenced by infection probability, human movement, and social structure. The model is designed not to replicate a specific empirical system, but to illustrate the relative importance of these parameters and their interactions in influencing the course of epidemic outbreaks. A detailed description of the IBM can be found in the Materials & Methods section. All model parameters are summarized in Table 1. Briefly, each simulation begins by generating 10,000 locations that are each home to a group of susceptible individuals. The number of individuals per location is sampled from a negative binomial distribution fitted to the demography of Iquitos, Peru — a well-studied dengue transmission hotspot36,37. These locations are then randomly organized into non-overlapping family clusters, with the “family cluster size” parameter controlling the number of locations per cluster. Social structure, controlled by the “social structure” parameter, influences the likelihood that individuals interact within their family cluster. Human movement — the number of visits to locations per day — is sampled from a negative binomial distribution, defined by the “average mobility” and “mobility skewness” parameters. The social structure of the model is depicted in Supplemental Figure S1.

View this table:
[Table 1.](http://medrxiv.org/content/early/2024/11/29/2024.11.28.24318136/T1)

Table 1. Parameters of the individual-based disease transmission model

The disease is introduced by infecting a single randomly chosen individual. Infected individuals remain contagious for a number of days specified by the “infectious period” parameter, after which they recover and gain lasting immunity (and thus cannot become reinfected). When a susceptible individual visits a location that was visited by infectious individuals the day before, the likelihood of infection from each previous infectious visitor is determined by the infection probability, which accounts for seasonal fluctuations in infection risk due to changes in mosquito abundance. We chose not to model individual mosquitoes explicitly. This choice was driven by the very limited dispersal ability of *Aedes aegypti* 38, the primary vector for dengue transmission in the Americas, which predominantly bites during daylight hours39. Consequently, human movement patterns tend to be more influential than mosquito movement in shaping dengue dynamics36,40,41.

The infection probability is defined by a cosine function with three parameters: (i) the “average infectivity” parameter (α), representing the average infection probability over the course of a year (365 days); (ii) the “seasonality strength” parameter (α*season*), controlling the magnitude of seasonal variation in infection probability; and (iii) the “first case timing” parameter (*t**first*), defining the horizontal shift of the cosine function and thus the timing of the first case relative to the peak infection probability due to seasonality. Together, these parameters define the infection probability at any given day *t* in the year: ![Formula][1]</img> The IBM progresses by daily timesteps and continues until there are no infectious individuals left. The output consists of daily counts of individuals in each infection state (susceptible, exposed, infectious, and recovered). For each combination of parameters, we use 100 replicate simulation runs to calculate three metrics of the simulated epidemics: (i) outbreak probability, defined as the proportion of simulation runs in which more than 0.1% of the population becomes infected; (ii) maximum disease incidence (*imax*), defined as the highest proportion of infectious individuals seen in any timestep; and (iii) outbreak duration, defined as the timespan in days from the first infectious case to the recovery of the last infectious individual. To calculate *imax* and outbreak duration, we average across 100 simulation runs where an outbreak occurred (doing additional runs as needed to obtain 100 such runs for each parameter combination), thereby minimizing confounding effects from stochastic losses of the disease.

We systematically varied the eight parameters outlined above to explore how the simulated epidemics change across the parameter space. Across the full range of parameters (Table 1), the three metrics exhibit significant variability: the average outbreak probability is 0.79, ranging from 0 to 1; the average *imax* is 0.67, ranging from 0.0003 to 0.99; and the average duration is 63.88 days, ranging from 19.65 to 424.15 days.

### Gaussian Process training & performance

Even for relatively simple IBMs, generating the necessary data for a comprehensive sensitivity analysis can become computationally prohibitive, especially when the parameter space is high-dimensional and parameters interact with one another in complex ways. To address this, we implemented GP surrogate models as statistical emulation tools11,12, and trained them on input–output data pairs from our IBM (Materials & Methods section). We trained three independent GPs to predict outbreak probability, *imax*, and outbreak duration. The Bayesian nature of GPs allows for uncertainty quantification of single predictions18, which we utilized in an active learning loop to adaptively select additional training data points (Figure 1A).

![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/11/29/2024.11.28.24318136/F1.medium.gif)

[Figure 1.](http://medrxiv.org/content/early/2024/11/29/2024.11.28.24318136/F1)

Figure 1.
(A) Gaussian Process (GP) training loop18. The GP training begins with an initial training dataset consisting of a Latin hypercube sample (LHS) of 5,000 data points generated from the input domain (Table 1) using the individual-based simulation model (IBM). During training, the GP is continuously evaluated against a validation dataset of 10,000 data points to prevent overfitting. After each training cycle, 107 potential new data points are scored based on a policy that considers their predicted value and 95% confidence interval. In each iteration of the training loop, 1,000 additional data points are sampled from those 107 candidate points, where the probability of being sampled is proportional to their respective scores. The newly selected data points are then simulated using the IBM, added to the training dataset, and the next training iteration begins. (B) Usage of the trained GP. After training, the GP is tested using an independent dataset of 10,000 LHS data points to evaluate its performance. The trained GP can then be used for rapid predictions, enabling large-scale global sensitivity analyses.

Thanks to the optimization techniques and GPU acceleration implemented in GPyTorch19, GP training remained computationally manageable. On our local machine (i5-12600K CPU, GeForce RTX 4090 GPU), one training round with 30,000 iterations took approximately 15 minutes to 2 hours, depending on the size of the training data (5,000 – 20,000 points, Figure 1A: Step 1). The majority of time in the overall process was spent generating additional training data with the IBM during the active learning rounds (Figure 1A: Step 3). This was more time-intensive (∼10 hours on our local machine) for the GPs modeling *imax* and outbreak duration than for the GP modeling outbreak probability, because the training data for the latter also included many simulation runs where no outbreaks occurred (which are typically very fast).

One of the primary goals of statistical emulation is to reduce computational time12. GPs are particularly efficient for this purpose because they provide closed-form solutions for predictions, and computationally expensive matrix inversions are performed during training, not during prediction18,42. Thus, once trained, GPs can predict new data points rapidly. In our case, we observed a significant speed-up: with our local machine we were able to predict the 10,000 data points needed for Figure 3C in only 0.1 seconds. This is equivalent to performing 106 IBM simulation runs, which, depending on available computational resources, would typically take several hours in a computing cluster environment. Additionally, the runtime for GP predictions is deterministic and constant due to the closed-form solution, whereas the runtime of IBM simulations varies depending on the duration of the disease outbreak.

Model performance was evaluated using root mean square error (RMSE) — both for continuous checks against a validation dataset during training to avoid overfitting (Figure 1A), and for assessing the accuracy of the GPs after training was completed (Figure 1B). During training, we observed that the first few adaptive training rounds tended to lead to the most significant improvements in model performance, whereas additional rounds later in the process yielded diminishing returns (Figure S1). The final GPs achieved RMSE values of 0.057 for outbreak probability, 0.042 for *imax*, and 0.068 for duration (Figure S2).

We observed greater variance in the model’s predictive accuracy for weaker epidemics (i.e., lower *imax* values, Figure 2B) and longer epidemics (i.e., larger duration, Figure 2C). In such cases, stochasticity plays a larger role, making predictions more challenging. For the *imax* GP, there were instances where the intensity of the epidemic outbreak was severely overpredicted (Figure 2B). This might have resulted from neighboring data points with very different properties. Although our kernel choice allows for rather abrupt changes in function values, the interpolation might not fully capture the true dynamics of the underlying model if it does not happen exactly at the midpoint between the two points.

![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/11/29/2024.11.28.24318136/F2.medium.gif)

[Figure 2.](http://medrxiv.org/content/early/2024/11/29/2024.11.28.24318136/F2)

Figure 2.
Gaussian Process performance evaluation. Comparison of observed versus predicted values for 500 randomly sampled test data points. The yellow line represents the identity line (*x = y*) for (A) outbreak probability (B) maximum incidence (*imax*), and (C) log10-transformed duration.

### Sensitivity analysis & specific model outcomes

The computational efficiency of the GPs allowed us to perform a comprehensive variance-based sensitivity analysis of our model. For this, we used the Sobol method9, a variance-based approach that quantifies the contribution of each variable — both independently and in interaction with others — to the overall variance of the model output. We estimated first-order effects (the influence of single parameters independent of others), second-order effects (the contribution of pairwise interactions between parameters), and total-order effects (which include first-order effects as well as all interactions of any order).

This analysis revealed that average infectivity and average mobility are the primary drivers in our model, shaping all three epidemiological metrics: outbreak probability, *imax*, and outbreak duration. Since there is no correlation between the number of visits sampled for a given individual over time, our model does not include systematic super-spreading behavior. As a result, we expected the sensitivity index for mobility skewness to be low across all three metrics, in contrast to a stronger impact of average mobility. Our findings confirm this expectation, with mobility skewness showing no influence on the epidemiological metrics (*imax*: Figure 3A, outbreak probability: Figure S3A, outbreak duration: Figure S4A).

![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/11/29/2024.11.28.24318136/F3.medium.gif)

[Figure 3.](http://medrxiv.org/content/early/2024/11/29/2024.11.28.24318136/F3)

Figure 3.
Sobol sensitivity analysis, maximum incidence (*imax*). (A) First-order and total effects across the entire input domain (Table 1). The first-order effect describes the impact of a single parameter on the model output (*imax*), while the total effect of a parameter accounts for both its first-order effect and all interactions with other parameters. Error bars represent the 95% confidence intervals of the sensitivity index estimates. A total of 9,437,184 points were evaluated for the sensitivity analysis. (B) Second-order effects across the entire input domain (Table 1). A second-order effect captures the pairwise interaction between two parameters. Sobol indices with a 95% confidence interval that does not overlap zero are highlighted with a pink border. The largest second-order effect is emphasized with a bold pink border. (C) *imax* predictions with varying "seasonality strength" and "first case timing" parameters (i.e., the two parameters with the largest second-order effect, see panel B). Other parameters were fixed at default values (Table 1). Corresponding sobol sensitivity analysis plots for outbreak probability and outbreak duration can be found in the supplementary figure section (Figures S3 and S4).

The first-order effect estimates for average infectivity are nearly identical across all three metrics: outbreak probability, *imax*, and duration (0.52, 0.53, and 0.53, respectively; Figures 3, S3–S4). However, the total effect of average infectivity is notably higher for outbreak probability (0.69; Figure S3) compared to *imax* and duration (both 0.58; Figure 3, S4). The higher total-order effect for outbreak probability is primarily driven by the interaction between average infectivity and average mobility (Figure S3B–C). This makes intuitive sense, as highly infectious diseases can still trigger outbreaks even when individual movement is limited. By contrast, the spread of less infectious diseases relies more heavily on sufficient individual mobility to compensate for a lower per-contact infection probability. Accordingly, the first-order effects of average mobility are smaller for outbreak probability compared to *imax* and duration (0.12 vs. 0.26 and 0.27). However, the total-order effects of average mobility are similar across all three metrics (0.27, 0.29, and 0.3, respectively, Figures 3, S3–S4).

The reduced importance of the interaction between average infectivity and average mobility for *imax* and duration is due to the fact that these metrics are calculated only across simulations in which an actual outbreak occured. For these two metrics, the largest second-order effect is the interaction between the seasonality strength and the timing of the first infectious case (Figure 3B, Figure S4B). In our model, the infection probability fluctuates seasonally, following a cosine pattern, and thus the initial infection timing relative to the seasonal cycle is crucial. Introducing the disease during a low-risk season (i.e., when the value of the first case timing parameter falls between 0.25 and 0.75) can lead to prolonged epidemics with lower *imax* (Figure 3C, Figure S4C). Since the infection probability depends on the interaction between the average infectivity, the seasonality strength, and the first case timing, it is encouraging that sensitivity analyses of the GP surrogate models effectively uncovered these pairwise interactions (Figure 3B, Figure S3–S4B).

We also observed that family cluster size has only a minor effect on the outcomes, which is mainly driven by interactions with the social structure parameter (Figure 3B, Figure S3–4B). Since individuals return home each day, they are more likely to interact with others within their home location and family cluster (as long as the social structure parameter is > 0). Family cluster size determines how many individuals, on average, live within each family cluster, and in combination with social structure, it influences the extent of interaction within those family clusters, thus having some effect on the investigated model outputs.

As we pointed out earlier, the results of our sensitivity analysis depend on the variance present in the system. When some parameters explain a disproportionately large amount of variance in the model output across the entire input domain, the contribution of other parameters can be hard to detect. To address this, one can fix the former parameters at specific values and then conduct a “conditional” sensitivity analysis that only varies the latter parameters to assess their relative effects in this particular subdomain of the parameter space.

We used this approach to analyze the effect of seasonality strength and first case timing on outbreak probability across various average infectivity and average mobility scenarios, since these parameters were previously identified as crucial drivers (Figures 3, S3–4). Interestingly, we observed a sharp increase in the first-order indices for the first case timing when the average infectivity was fixed at a low value, followed by a gradual decline as the average infectivity increased (Figure 4A). This pattern corresponds to a shift in the system: moving from a state where epidemic outbreaks are rare (Figure 4B: 1st panel) to one where outbreaks occur in the majority of simulated scenarios (Figure 4B: 4th panel). Under conditions that are generally unfavorable for disease transmission, outbreaks are rare and occur only when all parameters align to support an outbreak. Consequently, the first-order sensitivity indices for the first case timing parameter tend to be low, because this metric reflects only the independent effects. With increasing average infectivity, the system enters a state where the outbreak probability is primarily driven by the first case timing, creating a hit-or-miss dynamic (Figure 4B: 3rd panel). Here, the strength of seasonality plays a minimal role; the key factor determining whether an outbreak occurs is whether the first case is introduced when infection probabilities are above or below the average infectivity. As average infectivity continues to increase, we observe a U-shape on the outbreak probability heatmap, especially when first case timing coincides with conditions unfavorable for disease transmission (Figure 4B: 4th panel). This reflects how, at this stage, seasonality strength again becomes a critical contributor to outbreak probability. As the average infectivity increases further, the proportion of unlikely outbreak scenarios shrinks, causing the sensitivity index of the first case timing parameter to gradually decrease again. We confirmed that these patterns are not artifacts of the GP model by validating them with the IBM, which showed the same trend (Figure 4C).

![Figure 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/11/29/2024.11.28.24318136/F4.medium.gif)

[Figure 4.](http://medrxiv.org/content/early/2024/11/29/2024.11.28.24318136/F4)

Figure 4.
Summary of model outcomes related to outbreak probability. (A) First-order sensitivity index estimates for the first case timing parameter across varying average infectivity and average mobility values. A total of 294,912 points were evaluated for the sensitivity analysis per parameter combination. All other parameters vary across their full ranges (Table 1). The first-order effect measures the influence of a single parameter on the model output (outbreak probability). Yellow stars mark parameter combinations associated with specific model outcomes shown in (B). (B) Predicted outbreak probabilities using the Gaussian Process surrogate model with varying seasonality strength and first case timing values. Panels represent different average infectivities. All other parameters are fixed at default values (Table 1), except for average mobility, which is set to 1.5. (C) Outbreak probabilities inferred from the individual-based model, with varying seasonality strength and first case timing values. Panels represent different average infectivities. As in (B), the remaining parameters are fixed at default values (Table 1), except for average mobility which was set to 1.5. (B) and (C) thus represent model outcomes for the same model parameters, but conducted with the Gaussian Process surrogate model (B) versus the original individual-based model (C), allowing a direct comparison between the two.

### Application to empirical dengue incidence data

To parameterize our model and corroborate its findings with real-world data, we analyzed over a decade of weekly dengue incidence data from Colombia34, along with municipality-level processed demographic and environmental data from Siraj *et al.* (2018)35. We retrieved weekly dengue incidence data for Colombia from the OpenDengue database34, covering January 1st, 2007, to December 31st, 2019, resulting in 163,279 entries. We selected this cutoff date to avoid confounding effects from the COVID-19 pandemic. To account for potential under-reporting and asymptomatic cases, we adjusted reported dengue incidences by a correction factor of 2543,44. We focused on 211 municipalities with populations of at least 30,000 individuals and dengue maximum incidence rates of at least 0.1%. We defined outbreaks as periods of at least four consecutive weeks exceeding the median dengue incidence rate, resulting in the identification of 1,211 epidemic outbreaks.

On this data set, we first tested our model’s prediction of a potential inverse relationship between the average infectivity and average human mobility (Figure S3C). Because we lacked direct fine-scale data, we used mosquito abundance probabilities35,45 as a proxy for average infectivity and the inverse of mean travel time as a proxy for average human mobility35,46. Mean travel time reflects the average time it takes to reach a settlement with at least 50,000 people (the settlement might not necessarily be within the municipality itself). Shorter travel times suggest more urbanized areas, potentially resulting in greater human mobility. If this relationship holds, we would expect real epidemics to exhibit a positive correlation between mean travel time and mosquito abundance probabilities. However, no significant positive correlation was found, even when restricting the analysis to municipalities with a travel time at the 85th percentile or above (i.e., very rural areas, Spearman’s rank correlation test: *ρ* = 0.1, *S* = 917,546, *p* = 0.085).

We then tested our model’s prediction that the timing of the first infectious case should play a crucial role in shaping outbreak dynamics whenever seasonality is important. However, empirical data only allows us to observe outbreaks that have been measurable in the population. We do not know when or how the first infectious case was introduced, nor do we have information about instances where an infectious case was introduced but did not result in an epidemic. When testing across all observed outbreaks, binned by week, we find that the distribution of epidemics is not uniform throughout the year (Chi-squared test: *χ*2 = 117.85, df = 52, *p* < 0.001), which suggests a seasonal effect, and matches our expectations of dengue outbreak dynamics43,47.

Next, we used the GP predicting *imax* to systematically explore how different parameter combinations in our IBM can effectively recapitulate the empirical dengue incidence data from Colombia. To account for heterogeneity in dengue risk across regions, we incorporated municipality-specific average infectivities while keeping other parameters constant. We analyzed municipalities with at least three epidemic outbreaks, resulting in a dataset of 1,186 epidemics across 173 municipalities. For each municipality, we used 67% of the data for calibration, and withheld 33% for testing. Calibration was done across all municipalities simultaneously. Withholding a portion of the data allowed us to assess the predictive performance of the calibrated GP in estimating *imax* for real-world epidemics.

The best-fitting parameter combination, which minimized the RMSE on the calibration data, revealed an unexpectedly high degree of social structure in our IBM (seasonality strength = 0.16; first case timing = 0.58; infectious period = 4.67; average mobility = 4.39; mobility skewness = 0.47; social structure = 0.99; family cluster size = 4.54; scaling factor = 0.03). However, across the top 0.1 % (250 out of 25,000) parameter combinations with the lowest RMSE, we observed a wide range of values (Table S1), often spanning the entire parameter range. This suggests that in our IBM, multiple distinct parameter combinations can generate similar epidemic patterns. We also observed variation in average infectivity estimates across municipalities, though certain municipalities consistently exhibited higher average infectivity estimates (Figure 5A). Notably, these municipalities had a significantly higher Gross Cell Product (a measure of economic activity66), compared to the remaining municipalities, suggesting a possible link between economic activity and dengue incidence (Figure 5B; *Wilcoxon rank sum test: W* = 1,037, *p* = 0.021). Municipalities with higher average infectivity estimates might merit targeted, in-depth studies to inform public health interventions.

![Figure 5.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/11/29/2024.11.28.24318136/F5.medium.gif)

[Figure 5.](http://medrxiv.org/content/early/2024/11/29/2024.11.28.24318136/F5)

Figure 5.
(A) Distribution of municipality-specific average infectivity estimates for the 250 parameter combinations with the lowest root mean square errors. The top 5% of municipalities as sorted by median average infectivity estimates are highlighted in yellow. (B) Average Gross Cell Product (GCP) — a measure of economic activity66 where higher values represent greater economic activity — distributions, as reported by Siraj *et al.* (2018), for the municipalities depicted in (A), with the municipalities with the largest average infectivity estimates grouped separately.

Finally, we evaluated the predictive power of the GP using the best-fit parameters to predict *imax* of epidemics across municipalities for the withheld test data. While the model achieved an RMSE of 0.006, the normalized RMSE (RMSE, scaled by the mean of the observed data) was 1.02, indicating that the model struggled to capture the full complexity of the system and that the model’s predictions were not highly accurate (Figure S5A). The rank correlation coefficient between observed and predicted values was 0.458, with permutation tests placing the model in the top percentile, confirming it has some predictive power. However, these results highlight the limitations of our approach when applied to large-scale, heterogeneous epidemic data.

## Discussion

In this paper, we demonstrated the potential of statistical emulation for studying the dynamics of epidemiological IBMs. Specifically, we implemented a dengue-inspired individual-based disease transmission model in C++ and trained Gaussian Process (GP) emulators of that model on three key metrics: outbreak probability, maximum incidence (*imax*), and outbreak duration. Due to their fast prediction speed, these GPs facilitated highly efficient exploration of the model’s eight-dimensional parameter space, allowing us to conduct comprehensive sensitivity analyses that would otherwise have been computationally prohibitive. Our results show that average infectivity and average mobility have large first-order effects and influence all three epidemiological metrics. The most important pairwise parameter interaction varies by model outcome: the interaction between average infectivity and the average human mobility primarily influences outbreak probability, whereas the timing of the first infectious case, combined with seasonality strength, can shape both *imax* and the duration of epidemics. Although our IBM and the trained GPs were not able to capture the full heterogeneity of complex empirical epidemiological data, they still proved able to identify regions in Colombia that might merit targeted public health surveillance and intervention efforts as potential dengue hotspots.

Our IBM’s design is fairly flexible, allowing us to explore a wide range of scenarios, but it nevertheless makes several simplifying assumptions that could limit its accuracy. For example, we modeled an initially completely susceptible population in our simulations, neglecting any preexisting immunities at the onset of the epidemic. This ignores the fact that dengue is caused by four distinct viral serotypes (DENV–1 to DENV–4), and while infection with one strain provides long-lasting immunity against that specific strain, immunity to other strains lasts only a short time48. Moreover, a second infection with a different serotype can trigger antibody-dependent enhancement, significantly increasing the risk of severe (and symptomatic) dengue48. In hyperendemic countries such as Colombia43, where multiple dengue virus serotypes are simultaneously circulating within the population, this can cause complex immunity dynamics. Unfortunately, strain-specific sequencing data and antibody measurements that could be used to accurately estimate the proportion of immune individuals are scarce43.

Furthermore, while our IBM incorporates key aspects of dengue epidemiology, such as infectious duration33 and the role of human movement36,40, we chose not to explicitly model mosquito vectors, which could be important in some scenarios5. Combining host models with detailed vector models that account for factors such as habitat availability and selection pressures across mosquito life stages could significantly enhance the realism of epidemiological simulations2,49, albeit at a substantial cost in model complexity, number of parameters, and simulation runtime.

Another simplifying assumption in our IBM is in the human mobility model. While the “family cluster size” and “social structure” parameters allow us to model populations with varying levels of social interconnectivity, locations are not spatially explicit, meaning that the distance between them is not defined. Thus, the likelihood of a person visiting a location is solely determined by parameters affecting social population structure and human mobility. Real-world human movement patterns, on the other hand, are known to exhibit strong spatial regularity8,50,51. Moreover, in reality, human populations are rarely closed systems like the one we modeled here. Migration and a variety of factors — economic shifts, environmental changes, large public events — often lead to interactions beyond regular social circles, increasing the risk of disease introduction into areas that were previously unaffected52.

While we decided to train our GPs on outbreak probability, *imax*, and duration, a GP could instead be trained on other outputs from the IBM. For example, a GP could be trained on the total epidemic size or the time to the epidemic peak, if relevant to addressing the research question at hand. It would also be possible to use a so-called multi-task GP53, which allows the simultaneous prediction of multiple outputs, and is capable of capturing correlations between them. This could improve the efficiency of the training process, especially when the outputs are highly correlated, because multi-task GPs can leverage shared information between the prediction tasks to enhance accuracy and reduce computational costs. Our choice of separate GPs was guided by two factors. First, the outbreak probability GP was trained on the proportion of simulation runs with observed outbreaks, while the *imax* and duration GPs were exclusively trained on simulations with observed outbreaks, which made choosing a consistent set of training points across all three metrics challenging. Second, we had no clear expectations regarding the correlation between *imax* and outbreak duration: simulations with shorter durations might result from severe epidemics where most individuals are infected rapidly (high *imax*), or from scenarios where the disease quickly dies out (low *imax*). These complex dynamics made separate GPs a simpler, more practical choice.

A key factor in implementing a GP is the choice of an appropriate kernel18,42. We used the Matérn kernel because of its flexibility in modeling different levels of smoothness in the data. For this kernel, we chose a smoothness parameter *v* = 0.5, which can be beneficial for capturing model behavior in which small changes in parameters can result in abrupt changes in model outputs, as seen here and in a previous study16. Preliminary testing, as well as our trained GPs, showed satisfactory performance with the Matérn kernel, so we did not pursue alternative kernels. Whether the accuracy of our GPs could be improved even further with more customized or composite kernels tailored to specific features of the data remains to be explored.

One key advantage of GPs is their Bayesian nature, which allows for uncertainty quantification. This property is particularly useful in active learning, wherein the uncertainty measurements can be leveraged to choose the most informative points to add to the training data. During GP training, we selected half of the new points based on the confidence interval widths, while the other half was selected using the product of the confidence interval widths and a function of the predicted mean. Specifically, we weighted the confidence interval widths based on how close the predicted mean is to its most extreme possible values, assigning the highest weights to intermediate predictions. This approach encourages the GPs to move away from the edges of the parameter space, where uncertainties are naturally higher and predicted means often become extreme. These extremes occur either due to expected model behavior at the parameter boundaries (extreme parameter values cause extreme model behavior), or because data is sparse in these regions, causing the GP to revert to its prior (a constant mean of 0 in our case, which is an extreme value relative to the average predicted value).18. However, this approach might overlook regions that the GP does not determine to be highly uncertain but which could provide valuable information if explored. Alternative sampling strategies, such as expected improvement, could help identify points that boost model performance, even if their initial uncertainty is lower. Moreover, tools like BoTorch54 provide libraries to implement advanced batch optimization techniques, allowing the selection of sets of data points that are chosen together to maximize their combined impact on improving GP performance. While more advanced techniques like expected improvement scores and batch optimization could potentially enhance GP performance, they would require further model tuning and validation, which is beyond the scope of this study.

The fast prediction speed of the trained GPs allowed us to conduct comprehensive variance-based sensitivity analyses. However, this approach could be confounded by potential discrepancies between the GP surrogate model and the original IBM. While the GPs generally predicted epidemiological metrics inferred from the IBM well, the width of the sensitivity analysis confidence intervals should be interpreted cautiously. Furthermore, average infectivity and average mobility emerged as the dominant contributors to variance in the epidemiological metrics, making it harder to detect the influence of the other parameters. This scaling effect can obscure smaller, but still relevant, factors. To address this, we also performed sensitivity analyses in targeted regions of the parameter space for which average infectivity and average mobility were fixed, revealing state changes within the model’s dynamics — sudden transitions from rare epidemic outbreaks to frequent outbreaks — which were confirmed by simulating selected points directly with the IBM. However, it is important to note that while a sensitivity analysis captures the variance in model outputs due to parameter changes, it does not fully capture the underlying dynamics of the model, such as state transitions or the mechanistic interactions between single parameters that drive these changes. Specifically, the sensitivity analysis highlights which parameters contribute most to the output variance, but it does not reveal why certain parameter combinations lead to changes in the model behavior.

We observed the largest second-order effects between average infectivity and average mobility for the outbreak probability metric, and between seasonality strength and first case timing for *imax* and duration. However, our analysis of over a decade of dengue incidence data from Colombia did not yield the correlation that we expected between our proxies for average human mobility and average infectivity in actual outbreaks. This aligns with previous findings that mean travel time is a broad measure of accessibility rather than a precise indicator of actual human mobility35. Furthermore, we used mosquito abundance probabilities35,45 as proxies for the average infectivity, implicitly assuming a constant mosquito biting rate such that a higher abundance or likelihood of mosquitoes corresponds to an increased disease transmission risk. However, in reality, this relationship is more complex. Mosquito abundances can be significantly influenced by human alterations to the environment, such as vector control measures or urbanization55. Additionally, infection risk depends on other factors, such as human-mosquito contact rates56 and the behavior of individual mosquitoes57. While our analysis suggests — in line with previous studies43,47 — a seasonal distribution of dengue epidemics, we assumed that each epidemic outbreak is independent. This means that we did not account for potential correlations between outbreaks occurring within the same municipality over time, or spatial correlations across neighboring municipalities. As a result, the actual *p*-value of our test should be interpreted with caution.

When comparing the predictions of our model with real-world outbreaks, the only empirical parameter that actually varied within each municipality was the timing of the epidemic’s onset. This limited the GP’s flexibility to generate diverse predictions. In fact, a simple linear mixed-effects model that predicts log10-transformed *imax* values based on the onset timing of an epidemic, while accounting for the municipality-level variations with random effects, performed similarly to the GP model on withheld test data (Spearman’s *ρ* = 0.54). This suggests that both the abstract IBM and the GP emulators might be too generalized to effectively predict real-world outbreak data across multiple municipalities. To achieve more accurate predictions, the IBM would need to be more complex, incorporating municipality-specific characteristics such as outbreak histories, population immunity levels, and finer-scale human movement patterns, which might be critical for capturing the nuanced dynamics of local outbreaks.

Despite the GP struggling to predict complex empirical data, we observed that a subset of municipalities consistently had higher average infectivity estimates than the other municipalities. Several of these outlier municipalities are notable for their economic or geographic importance. For example, Puerto López, which had the highest average infectivity estimate, is a key river port, while Leticia, another outlier, is located at the intersection of the borders of Colombia, Brazil, and Peru, serving as a major port on the Amazon River. Tourist destinations like Melgar and La Mesa also exhibited elevated average infectivity estimates, possibly because of increased connectivity due to tourism and travel, which might contribute to higher dengue transmission rates in these areas. This observation supports the idea that human movement and economic activity could play a significant role in shaping dengue dynamics in these municipalities58. Furthermore, the higher economic activity (Gross Cell Product) in these regions might indicate that a larger proportion of the population has access to healthcare, which could affect our model’s assumptions. Specifically, better healthcare access could lead to higher detection and reporting of dengue cases, thereby violating our assumption of constant reporting rates across municipalities.

In conclusion, we explored the utility of statistical emulation to efficiently analyze epidemiological IBMs. The use of GPs allowed valuable insights into the key drivers of our simulated disease dynamics, revealing critical interactions between average infectivity, human mobility, and seasonality. When analyzing empirical dengue outbreaks, our calibrated GP highlighted municipalities that could serve as candidates for targeted interventions or in-depth studies. Our work demonstrates both the potential and the challenges of using statistical emulation to explore complex epidemiological systems, providing a foundation for future efforts that could incorporate additional model complexity and realism while maintaining computational efficiency.

## Methods

### Individual-based model

This detailed individual-based model (IBM) description is motivated by previous studies using the ODD protocol for describing IBMs1,5,59. The IBM was implemented in C++.

#### 1. Purpose and patterns

The purpose of our IBM is to explore how human movement, social population structure, and seasonal variation in infection probability influence infectious disease dynamics. Rather than replicating a specific empirical system, the model is intended to highlight the relative importance of these parameters and their interactions in shaping disease spread. Its performance is evaluated by examining how the absolute number of infectious cases and the change in infectious cases over time are affected by IBM parameter changes. The model is abstract, but is designed to loosely replicate dengue transmission dynamics.

#### 2. Entities, state variables, and scales

The model operates in daily timesteps and includes two primary entities: humans and locations. Each location represents a residential home with a group of residents, who are collectively considered a ‘family’. There are no distinct non-home locations such as workplaces, schools, or other public spaces. The number of humans in a given family — the number of humans residing at a given home location — is drawn from a negative binomial distribution with *μ* = 6.2 *and θ* = 9.07, following the household size distribution observed in Iquitos, Peru36,37.

Each individual human has the following state variables: a home location, an infection status (susceptible, exposed, infected, or recovered), the number of remaining days in their current infection status (how much longer they will stay in their current state), and the number of elapsed days in their current infection status (how long they have already been in that state).

The basic spatial unit of the model is the location. Simulations are initialized by generating 10,000 locations. Each location has the following state variables: the number of infectious humans visiting it in the current timestep of the model, its per-contact infection probability (the probability that an infectious individual that visited the location at time *t-1* transmits the disease to a susceptible individual that visits the same location at time *t*), a history of the number of infectious visitors it has had, and a history of its infection probability over time. All locations are randomly grouped into family clusters of a user-specified size. Each location is assigned to only one family cluster, ensuring that no locations are shared between clusters. All family members of a single home location belong to the same family cluster, but not all members of a given family cluster reside in the same home location. Figure S1 provides a conceptual overview of family clusters (and human movement) as implemented in our IBM Conceptually, then, a family cluster is a group of locations, the members of which tend to socialize together; an individual in a given family cluster is more likely to visit a location inside the family cluster than to visit a location outside it. This grouping introduces social structure into the simulation (please refer to the next section for further details). Note that “family clusters” are conceptually equivalent to the “social groups” defined by Reiner *et al.* (2014)37.

#### 3. Process overview and scheduling

The two core processes of the model are human movement and infection dynamics.

##### Human movement

For the process of human movement, during each timestep the model iterates through all individuals to determine which locations they will visit. Each individual visits its home location at least once every timestep. For each individual, an additional number of visits by the individual is drawn from a negative binomial distribution at the beginning of each day. The use of a negative binomial distribution allows for heavy-tailed human mobility distributions where some individuals are highly mobile, visiting a large number of locations per day. However, there is no correlation between the various sampled values for a given individual over time, meaning that we did not model systematic super-spreaders in our IBM. Given that a large fraction of empirical dengue infections is asymptomatic60, the infection status does not affect human movement in our IBM: infectious individuals visit, on average, the same number of locations as susceptible or recovered individuals.

For each visit, whether the visit is to a location inside the individual’s family cluster or to a location outside that cluster is determined probabilistically, with the social structure parameter being the probability that a particular visit happens within the family cluster (Table 1). Locations to be visited are then randomly selected from the set of locations inside or outside the individual’s family cluster, as appropriate. Multiple visits to the same location are allowed. Figure S1 provides a conceptual overview of human movement patterns as implemented in our IBM.

##### Infection dynamics

In our model, each human can have one of four infectious states: susceptible, exposed, infectious, and recovered. The model tracks the number of days each simulated human will remain in its current infection state, decreasing this count at the end of each day.

At the start of the simulation, all humans are susceptible. The disease is then introduced into the population by randomly selecting one individual and immediately changing their infection status from susceptible to exposed. This exposed status indicates that the individual has contracted the disease but is not yet infectious. Unless specified otherwise, exposed individuals become infectious at the end of each day, effectively reducing the model to a Susceptible-Infectious-Recovered model, where the length of the infectious period is specified by the user. At the end of an individual’s infectious period, the individual’s infection status changes to recovered the next day. The model assumes lasting immunity, so once individuals recover, they cannot be reinfected.

If a susceptible human visits a location that had *N* infectious visitors the day before, the probability of contracting the disease and immediately entering the exposed state is 1 − (1 − *p**infection*)*N* where *p**infection* is the infection probability per contact. (This is simply the probability that a binomial draw B(*N*, *p**infection*(*t*)) ≥ 1, indicating that infection occurred from at least one previous visitor.) The infection probability *p**infection* follows a cosine function that is determined by three parameters: the average infectivity (α), the seasonality strength (α*season*), and the first case timing (*t**first*), and is calculated as follows: ![Formula][2]</img> Each location is assigned the same infection probability for a given day, as determined by the cosine function above. Variations in the overall likelihood of infection, from location to location, arise from the differing numbers of infectious individuals visiting each location.

If a visiting human is already infectious, the model increments the count of infectious visits for the current day at each location visited by that human, which will make those locations infectious in the following timestep as just described.

Since humans only change their infection status at the end of each day, and the likelihood of infection for susceptible individuals is determined by the number of infectious visitors from the previous day, the order in which individuals are processed is inconsequential. This ensures that the model remains asynchronous and order-independent during each day. Indeed, this design would allow the model to be parallelized to run across multiple processing cores, although runtimes were fast enough that we did not deem that necessary.

#### 4. Design concepts

##### Basic principles

The model aims to study abstract disease dynamics within human populations exhibiting varying levels of social structure. It does not focus on the realistic modeling of a specific city, or on the biological details of dengue transmission dynamics.

##### Emergence

The number of infectious individuals each day — the central output of this model, for our purposes — is an emergent property, not predefined within the model. Stochasticity plays a major role in introducing uncertainty into these patterns.

##### Adaptation, objectives, learning, prediction, and sensing

None of the individuals in the model have the ability to adjust their behaviors. There are no adaptive behaviors, learning abilities, predictive capabilities, or sensing capabilities in the model.

##### Interaction

Humans interact by potentially infecting other humans who visit the same location the next day.

##### Stochasticity

This IBM incorporates stochasticity in the family sizes, the daily number of visits per human, and the probabilistic infection dynamics. For detailed descriptions, please refer to Sections 2 and 3.

##### Collectives

Each human is assigned a home location and a family cluster, making them members of a family and a collection of locations. However, no special properties are attributed to sharing a home location or family cluster, except for the general tendency to interact more frequently due to the human movement rules described in Section 3.

##### Observations

The model outputs a table of the counts of susceptible, exposed, infectious, and recovered individuals for each day of the simulation, across the entire population.

#### 5. Initialization

Please refer to Sections 2 and 3 regarding the initialization of the model.

#### 6. Input

The model has eight parameters that can be specified by the user with command-line arguments (Table 1).

Out of these eight parameters, three parameters collectively influence the infection probability, as described by the equation in Section 3: the average infectivity, the seasonality strength, and the first case timing. Additionally, the user must define:

* - The length of the infectious period, after which infectious individuals transition to being recovered (Section 3: *infection dynamics*).

* - Parameters for the negative binomial distribution describing human movement (Section 3: *human movement*).

* - The proportion of visits that occur within the family cluster of an individual (Section 3: *human movement*).

* - The number of locations per family cluster (Section 2).

### Gaussian Processes

Statistical emulation involves replacing an individual-based simulation framework with a statistical surrogate model, or machine learning model trained on input-output pairs from the original framework12. For this task, we used GPs, which are non-parametric models that define a distribution over functions. This allows GPs to efficiently interpolate between scarcely sampled observations. GPs are an attractive choice, not only due to their mathematical tractability, but also because their Bayesian nature allows for the quantification of model uncertainty. We incorporated these uncertainty estimates into our policy for scoring potential additional data points (Figure 1A). We implemented GPs in Python (v3.10.6) using the GPyTorch library (v1.11)19 for efficient GP modeling, and the torch.cuda module from the PyTorch package (v2.0.1)61 for GPU acceleration with NVIDIA GPUs. The implementation was inspired by a GP surrogate model previously used to study the efficiency of gene drives in rat populations16.

We trained a separate GP model for each of the three key characteristics described in the previous section: outbreak probability (proportion of simulation runs in which more than 0.1% of the population becomes infected), *imax* (highest proportion of infectious individuals at any day) and epidemic duration (timespan from the first infectious case to the recovery of the last infectious individual). While the IBM outputs for outbreak probability and *imax* are bounded between 0 and 1, epidemic duration spans a much wider range: In the initial training dataset (N = 5,000 data points), the observed duration of epidemics ranged from 19.65 to 424.15 days. To manage the variance in epidemic duration and improve the GP’s ability to predict longer epidemics, we applied a logarithmic transformation to the outbreak duration.

The response variable for each GP is calculated from 100 simulation runs per parameter combination. While outbreak probability is defined as the proportion of simulation runs with observed outbreaks, *imax* and outbreak duration are exclusively calculated from simulations with epidemic outbreaks. This approach helps minimize unrepresentative input-output pairs. For example, if a specific parameter combination results in an *imax* of 10%, but only 50% of the simulation runs result in an outbreak, averaging across all simulations would result in an *imax* estimate of about 5%, not accurately representing the dynamics of the IBM.

The covariance function — or kernel — of a GP determines how much the response values of different input points covary42. Thus, the choice of kernel is crucial in shaping the GP’s predictions. We selected a Matérn kernel with *v* = 0.5, which corresponds to the exponential kernel. This kernel is capable of capturing abrupt changes in function values18. We applied the same kernel type across all three GPs.

#### Gaussian Process training loop

One key advantage of GPs over other machine learning methods is their Bayesian nature, which allows us to identify regions in the parameter space where the model’s predictions are highly uncertain. This allows us to strategically select new training data from areas where the GP is least confident, thereby enhancing prediction accuracy. This approach forms the basis of the GP’s active learning loop (Figure 1A), which consists of three steps: training the GP, scoring potential new training data, and then updating the training data with selected new data points18.

**Step 1: GP training** We trained each GP for 16 rounds: one initial round with a training dataset consisting of a Latin hypercube sample (LHS) of 5,000 data points from the entire input domain (Table 1), followed by 15 active training rounds (Figure 1A). For training, we utilized the Adam optimizer from PyTorch61 with a learning rate of 0.01. In each training round, the GP was trained for 30,000 iterations, with a model snapshot saved every 1,000 iterations. To avoid overfitting, we evaluated all 30 snapshots against a separate validation dataset consisting of 10,000 LHS points. We selected the snapshot with the lowest RMSE on the validation dataset for step 2 in the training loop.

**Step 2: Data scoring** In each active learning round, we scored 107 LHS points using two distinct policies16. These scores are used as probability weights to select 1,000 new data points to expand the training data. Policy 1 is based solely on model uncertainty. In this policy, the probability *pi* that a data point *i* is selected is proportional to the width of the 95% confidence interval for that point (*wi*), normalized by the total width of all potential data points: ![Formula][3]</img> Policy 1 assigns larger weights to data points with greater uncertainties. However, regions with large uncertainties are often clustered near the edges of the observed parameter space, where the GP must extrapolate far beyond observed training data42. However, while the uncertainty bounds of these points might be relatively high, the degree of improvement the GP can gain from sampling points from the edges of the parameter space can be limited. To avoid oversampling these areas, we developed policy 2. Policy 2 reduces the likelihood of sampling points with extreme predicted values. Specifically, the 95% confidence intervals from policy 1 are further weighted by the GP’s prediction. The probability *pi* of selecting a data point *i* is given by: ![Formula][4]</img> where:

* - *wi* is the 95% confidence interval width for point *i*

* - *mi* is the GP’s predicted value for point *i*

* - *mmax* is the maximum predicted value (3 for duration, 1 otherwise)

* - *n* is the total number of potential data points

This formulation ensures that points with high uncertainty yet with predicted values near the midpoint of the range are assigned the highest weights. The GP’s predictions were clipped to the range [0, 1] for outbreak probability and *imax*, and to [0, 3] for epidemic duration (since the GP predicts log10-transformed durations, this range corresponds to durations between 1 and 1,000 days).

For our adaptive sampling strategy, 50% of the points are selected using policy 1, and the remaining 50% are selected using policy 2.

**Step 3: Update training data** As mentioned earlier, the data points for *imax* and duration are based solely on simulation runs where epidemic outbreaks occurred. If a selected data point did not result in 100 outbreaks after 2,000 simulation attempts, we chose a new data point. For the initial training dataset, where no GP predictions were available, this selection was done randomly from 107 LHS samples. In the active learning rounds, we chose all of the new data points as described in step 2. After successfully simulating all selected points, the new results are added to the training dataset, and a new GP training cycle begins (Figure 1A).

#### Gaussian Process usage

We evaluated the accuracy of the trained GP using an independent test dataset of 10,000 LHS points and calculated the RMSE. For visualization purposes, such as in heatmaps (e.g., Figure 3C), we clipped the predictions to the range [0, 1] for epidemic probability and *imax*, and to [0, 3] for the log10-transformed epidemic duration.

### Sensitivity analysis

To explore how changes in parameters affect the epidemic metrics, we conducted a series of variance-based sensitivity analyses62 in Python using the Sobol method implemented in the SALib library (v1.4.7)63. We performed sensitivity analyses on the output of GP models rather than the IBM due to the GP’s faster prediction capabilities.

The Sobol method quantifies the contribution of single parameters and their interactions to the variance of a model’s output. Since these variance components are often not analytically tractable, the Sobol method approximates them using a Monte Carlo method. The resulting sensitivity indices — first, second, and total order — provide a measure of each parameter’s influence. First-order effects measure single parameter contribution, second-order effects measure the interactions of two parameters, and total order effects capture the combined impact of each parameter, including all interactions with other parameters of any order.

To perform the sensitivity analysis, the number of model evaluations is proportional to *n * (2d + 2)*, where *n* is the base sample size and *d* is the dimensionality of the parameter space (*d* = 8; Table 1)63. The accuracy of the Sobol indices improves with a larger *n*, leading to smaller confidence intervals. For the sensitivity analysis of the entire input domain, where all parameters vary across their full range (Table 1, Figure 3A–B), we selected *n* = 219. To investigate the first-order effect of the first case timing with the two most influential parameters (average infectivity and average mobility) held constant, we conducted a sensitivity analysis with *n* = 214 for each combination of these parameters (Figure 4A). We calculated 95% confidence intervals of the sobol indices using the bootstrapping method provided by SALib63.

### Empirical data

We retrieved weekly dengue incidence data at the municipality level for Colombia from the OpenDengue database, an open-access platform that provides detailed epidemiological data on dengue34. The selected dataset spans from January 1st, 2007, when weekly resolution data became consistently available, to December 31st, 2019, comprising 163,279 entries. We chose to end in 2019 to avoid the potential confounding effects of the COVID-19 pandemic64. We chose Colombia for this study because it is one of the countries most affected by dengue in the Americas43 and offers exceptionally well-documented time series data on dengue incidence34. To adjust dengue incidences for potential under-reporting and asymptomatic cases, reported dengue incidences were corrected by a factor of 2543,44. Estimates for under-reporting factors in dengue typically range from 10 to 27, depending on the region and study44. While we recognize that using a correction factor of 25 for Colombia, which has well-documented dengue incidence records34, might be cautious, we chose it to include a broad range of municipalities.

We obtained municipality-level processed data from Siraj *et al.* (2018), which provides a global, high-resolution dataset of potential environmental drivers for Zika transmission in Colombia between January 1st, 2014 and October 1st, 2016. Although the data published by Siraj *et al.* (2018) focused on Zika, it is relevant to dengue because both viruses share a primary vector, *Ae. aegypti*, which is responsible for the majority of dengue transmission in Colombia65. Specifically, we used four metrics from Siraj *et. al* (2018) (i) the population count (ii) the weekly occurrence probabilities of *Ae. aegypti* 45(iii) the Gross Cell Product, which measures economic activity at a fine spatial scale66, and (iv) the mean travel time to the nearest city. Please refer to Table 1 in Siraj *et al.* (2018) for additional information on the municipality-specific data.

We matched the records from Siraj *et al.* (2018) and Clarke *et al.* (2024) based on the names of municipalities and their respective departments. To improve the matching, we standardized the municipality and department names by converting them to lowercase and applying a latin-ascii transformation to remove any accents or special characters. In cases where mismatches occurred, either at the municipality or departmental level, we followed a similar approach to Clarke *et al.* (2024), manually reviewing the records and checking the geographic boundaries using shapefiles. While we were able to obtain the original shapefiles from Clarke *et al.* (2024) (Oliver Brady, personal communication), the original shapefiles for Siraj *et al.* (2018) were not accessible at the time of our study. As a substitute, we used shapefiles from the OCHA database67. Despite this limitation, we successfully matched 95% (1,009 out of 1,063) of the municipalities present in the raw dengue incidence data. Our goal was not to achieve a perfect match, but rather to secure a sufficient number of high-quality matches to proceed with our analysis.

We focused on 211 municipalities that aligned with our IBM in terms of population size (i.e., at least 30,000 individuals) and had a maximum dengue incidence rate of at least 0.1% over the entire study period. To detect epidemics, we fitted a smoothing spline using the ss() function from the npreg R-package (λ =10-10)68. An epidemic outbreak was defined as a period of at least four consecutive weeks in which the spline function exceeded the median dengue incidence rate. Using this method, we identified 1,211 potential epidemic outbreaks with an *imax* of at least 0.1% which were included in the analysis. On average, each municipality had 6.34 outbreaks. The average duration per outbreak was 195 days, with an average *imax* of 0.6%.

#### Parameter exploration with Gaussian Processes

The speed of the GP predictions allows for an efficient exploration of which parameter combinations in our epidemiological IBM provide a best fit to the dengue incidence data from Colombia. This also allows us to assess whether our GPs — and consequently, our IBM — can provide insights into real-world epidemic outbreaks. To capture the heterogeneity in dengue transmission potential, we incorporated municipality-specific average infectivities while assuming that other model parameters remained constant across all municipalities. We focused on municipalities with at least three outbreaks, resulting in a total of 1,186 epidemics across 173 municipalities. For each municipality, we randomly split the data into 67% for parameter fitting (N = 737) and 33% for testing (N = 449), using the test data to evaluate the GP’s predictive performance for *imax*.

We generated 25,000 LHS parameter combinations from the full parameter space (Table 1), excluding the average infectivity parameter. Additionally, we introduced a scaling parameter (ranging from 0 to 0.1) to adjust for discrepancies between the simulated and observed dengue incidence rates, which might arise due to the abstraction in our model as compared to real-world data. This scaling factor applies uniformly to all incidence values, while preserving the relative differences between single outbreaks. For each of the 25,000 LHS samples, we tested 50 evenly spaced average infectivities over the range [0, 0.03], resulting in 1.25 million predicted data points for the 737 epidemics used for calibration. For each epidemic, we accounted for the epidemic’s start time by adjusting the first case timing parameter, while using municipality-specific infection probabilities. The predictions were constrained to the range [0, 1] before calculating the RMSE between observed and predicted *imax*. We then selected the average infectivity that minimized the RMSE for each of the 25,000 LHS in each municipality. The 25,000 parameter combinations were ranked by summing the RMSE across all municipalities, using the best-fit (i.e., lowest RMSE) infection probabilities for each. To further investigate infection probabilities across municipalities, we examined the 250 LHS combinations with the lowest RMSE sums.

Finally, we evaluated the GP’s predictive performance on the withheld test data by calculating both the RMSE and Spearman’s rank correlation coefficient (*ρ*). To test the significance of Spearman’s *ρ*, we conducted 1,000 permutation tests, where the start time and municipality for each epidemic in the test set were randomly shuffled.

### Statistical analysis

Unless stated otherwise, statistical analyses were performed using the R statistical computing environment (v4.2.1)69. Significance is declared at an alpha cut-off of 5%.

## Data Availability

The source code of the individual-based disease transmission model implemented in C++ is available on GitHub at [https://github.com/AnnaMariaL/DengueSim](https://github.com/AnnaMariaL/DengueSim). Simulated data, pre-trained GPs, and Jupyter notebooks demonstrating the usage of the GPs are also available on GitHub at [https://github.com/AnnaMariaL/DengueSim-GP](https://github.com/AnnaMariaL/DengueSim-GP).

[https://opendengue.org/](https://opendengue.org/)

## Code availability

## Supplementary information

### Supplementary figures

![Figure S1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/11/29/2024.11.28.24318136/F6.medium.gif)

[Figure S1.](http://medrxiv.org/content/early/2024/11/29/2024.11.28.24318136/F6)

Figure S1.
Schematic overview of human movement in the individual-based model. Each colored frame represents a unique, non-overlapping family cluster, with each cluster containing multiple family homes. Individuals can make visits within their own family cluster (solid arrows) or to other clusters (dashed arrows). The likelihood of visits occurring inside the family cluster is determined by the social structure parameter (Table 1). Each individual visits their home at least once per day and moves independently of others in the same family (individuals A and B). Multiple visits to the same location are allowed (individual D). Visits to other family clusters occur randomly and are not restricted to any specific cluster (individual C).

![Figure S2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/11/29/2024.11.28.24318136/F7.medium.gif)

[Figure S2.](http://medrxiv.org/content/early/2024/11/29/2024.11.28.24318136/F7)

Figure S2.
The Root Mean Square Error (RMSE) between the Gaussian Process predictions and the individual-based model results in the validation dataset (N = 10,000 data points). The RMSE decreased as the size of the dataset used to train the Gaussian Processes increased (x-axis). The RMSE between the predictions of the final GP model and the test data (N = 10,000 data points) is indicated by a yellow square. (A) outbreak probability (B) maximum incidence (*imax*), (C) log10-transformed duration.

![Figure S3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/11/29/2024.11.28.24318136/F8.medium.gif)

[Figure S3.](http://medrxiv.org/content/early/2024/11/29/2024.11.28.24318136/F8)

Figure S3.
Sobol sensitivity analysis, outbreak probability. (A) First-order and total effects across the entire input domain (Table 1). The first-order effect describes the impact of a single parameter on the model output (outbreak probability), while the total effect accounts for all interactions involving one or more parameters. Error bars represent the 95% confidence intervals of the sensitivity index estimates. A total of 9,437,184 points were evaluated for the sensitivity analysis. (B) Second-order effects across the entire input domain (Table 1). A second-order effect captures the pairwise interaction between two variables. Sobol indices with a 95% confidence interval that does not overlap zero are highlighted with a pink border. The largest second-order effect is emphasized with a bold pink border. (C) Predicted outbreak probabilities with varying "average infectivity" and "average mobility" parameters (i.e., the two parameters with the largest second-order effect, see panel B). Other parameters were fixed at default values (Table 1).

![Figure S4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/11/29/2024.11.28.24318136/F9.medium.gif)

[Figure S4.](http://medrxiv.org/content/early/2024/11/29/2024.11.28.24318136/F9)

Figure S4.
Sobol sensitivity analysis, log10-transformed duration. (A) First-order and total effects across the entire input domain (Table 1). The first-order effect describes the impact of a single parameter on the model output (log10(duration)), while the total effect accounts for all interactions involving one or more parameters. Error bars represent the 95% confidence intervals of the sensitivity index estimates. A total of 9,437,184 points were evaluated for the sensitivity analysis. (B) Second-order effects across the entire input domain (Table 1). A second-order effect captures the pairwise interaction between two variables. Sobol indices with a 95% confidence interval that does not overlap zero are highlighted with a pink border. The largest second-order effect is emphasized with a bold pink border. (C) log10(duration) predictions with varying "seasonality strength" and "first case timing" parameters (i.e., the two parameters with the largest second-order effect, see panel B). Other parameters were fixed at default values (Table 1).

![Figure S5.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/11/29/2024.11.28.24318136/F10.medium.gif)

[Figure S5.](http://medrxiv.org/content/early/2024/11/29/2024.11.28.24318136/F10)

Figure S5.
(A) Observed vs. predicted maximum incidence (*imax*) for empirical epidemic outbreaks (N = 449). The yellow line represents the identity line (*x = y*). (B) Distribution of Spearman correlation coefficients between observed and predicted *imax* from 1,000 permutations, where both the onset and municipality of the 449 epidemics were randomized. The actual observed correlation coefficient is shown as a vertical yellow line.

### Supplementary tables

View this table:
[Table S1.](http://medrxiv.org/content/early/2024/11/29/2024.11.28.24318136/T2)

Table S1. Summary statistics describing the 250 parameter combinations with the lowest root mean squared errors from the parameter exploration with the Gaussian Process. These combinations represent the best-fitting sets of parameters for matching observed and predicted dengue maximum incidences across municipalities. IQR = Interquartile Range.

## Acknowledgements

We thank all members of the Messer and Murdock lab for helpful discussions. Special thanks to Beliz Erdogmus for her contributions during the early phases of the project; Isabel Kim, Mitchell Lokey, and Meera Chotai for technical support; Amir Siraj for support with the municipality-specific environmental data; and Oliver Brady for providing supplemental shape files. This project has received funding from the European Union’s Horizon2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No. 101025586. PWM was supported by the National Institutes of Health under award R35GM152242.

## Glossary

GP
: Gaussian Process
IBM
: Individual-based model
RMSE
: Root mean square error
LHS
: Latin hypercube sample
*imax*
: Maximum incidence

* Received November 28, 2024.
* Revision received November 28, 2024.
* Accepted November 29, 2024.

This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/)

## References

1. 1.Grimm, V. et al. A standard protocol for describing individual-based and agent-based models. Ecol. Modell. 198, 115–126 (2006).

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ecolmodel.2006.04.023&link_type=DOI)

[Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000240823100008&link_type=ISI)

2. 2.Bershteyn, A. et al. Implementation and applications of EMOD, an individual-based multi-disease modeling platform. Pathog. Dis. 76, (2018).

3. 3.de Lima, T. F. M. et al. DengueME: A Tool for the Modeling and Simulation of Dengue Spatiotemporal Dynamics. Int. J. Environ. Res. Public Health 13, (2016).

4. 4.Hladish, T. J. et al. Designing effective control of dengue with combined interventions. Proc. Natl. Acad. Sci. U. S. A. 117, 3319–3325 (2020).

[Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxMDoiMTE3LzYvMzMxOSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDI0LzExLzI5LzIwMjQuMTEuMjguMjQzMTgxMzYuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9)

5. 5.Perkins, T. A. et al. An agent-based model of dengue virus transmission shows how uncertainty about breakthrough infections influences vaccination impact projections. PLoS Comput. Biol. 15, e1006710 (2019).

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pcbi.1006710&link_type=DOI)

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=30893294&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

6. 6.Smith, N. R. et al. Agent-based models of malaria transmission: a systematic review. Malar. J. 17, 299 (2018).

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s12936-018-2442-y&link_type=DOI)

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=30119664&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

7. 7.Xu, P., et al. e3SIM: epidemiological-ecological-evolutionary simulation framework for genomic epidemiology. bioRxiv (2024) doi:10.1101/2024.06.29.601123.

[Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiYmlvcnhpdiI7czo1OiJyZXNpZCI7czoxOToiMjAyNC4wNi4yOS42MDExMjN2MSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDI0LzExLzI5LzIwMjQuMTEuMjguMjQzMTgxMzYuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9)

8. 8.Perkins, T. A. et al. Theory and data for simulating fine-scale human movement in an urban environment. J. R. Soc. Interface 11, 20140642 (2014).

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1098/rsif.2014.0642&link_type=DOI)

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=25142528&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

9. 9.Sobol′, I. M. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math. Comput. Simul. 55, 271–280 (2001).

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0378-4754(00)00270-6&link_type=DOI)

[Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000167385700029&link_type=ISI)

10. 10.Bellman, R. Dynamic Programming. (Princeton University Press, Princeton, New Jersey, 1957).

11. 11.Sacks, J., Welch, W. J., Mitchell, T. J. & Wynn, H. P. Design and Analysis of Computer Experiments. Stat. Sci. 4, 409–423 (1989).

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1214/ss/1177012413&link_type=DOI)

12. 12.Paleyes, A., Mahsereci, M. & Lawrence, N. D. Emukit: A Python toolkit for decision making under uncertainty. in Proc. of the Python in Science Conf. (2023).

13. 13.Krige, D. G. A Statistical Approaches to Some Basic Mine Valuation Problems on the Witwatersrand. Journal of the Chemical 52, 119–139 (1951).

14. 14.Matheron, G. Principles of geostatistics. Econ. Geol. 58, 1246–1266 (1963).

[Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiZWNvbmdlbyI7czo1OiJyZXNpZCI7czo5OiI1OC84LzEyNDYiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyNC8xMS8yOS8yMDI0LjExLjI4LjI0MzE4MTM2LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==)

15. 15.Bower, R. G., Goldstein, M. & Vernon, I. Galaxy Formation: a Bayesian Uncertainty Analysis. Bayesian Anal. 5, 619–670 (2010).

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1214/10-BA524&link_type=DOI)

[Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000285652000001&link_type=ISI)

16. 16.Champer, S. E. et al. Modeling CRISPR gene drives for suppression of invasive rodents using a supervised machine learning framework. PLoS Comput. Biol. 17, e1009660 (2021).

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=34965253&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

17. 17.Mubangizi, M., Andrade-Pacheco, R., Smith, M., Quinn, J. & Lawrence, N. D. Malaria surveillance with multiple data sources using Gaussian process models. in Proceedings of the 1st International Conference on the Use of Mobile ICT in Africa 2014 (2014).

18. 18.Nguyen, Q. Bayesian Optimization in Action. (Manning Publications, Shelter Island, 2023).

19. 19.Gardner, J. R., Pleiss, G., Weinberger, K. Q., Bindel, D. & Wilson, A. G. Gpytorch: Blackbox matrix-matrix gaussian process inference with gpu acceleration. in Proceedings of the 32nd International Conference on Neural Information Processing Systems vol. 31 7587–7597 (Curran Associates Inc., Reed Hook, 2018).

20. 20.Gething, P. W. et al. Improving imperfect data from health management information systems in Africa using space-time geostatistics. PLoS Med. 3, e271 (2006).

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pmed.0030271&link_type=DOI)

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=16719557&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

21. 21.Bhatt, S. et al. The effect of malaria control on Plasmodium falciparum in Africa between 2000 and 2015. Nature 526, 207–211 (2015).

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature15535&link_type=DOI)

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=26375008&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

22. 22.Johnson, L. R. et al. Phenomenological forecasting of disease incidence using heteroskedastic Gaussian processes: A dengue case study. Ann. Appl. Stat. 12, 27–66 (2018).

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=38623158&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

23. 23.Albinati, J., Meira, W. & Pappa, G. L. An Accurate Gaussian Process-Based Early Warning System for Dengue Fever. in 2016 5th Brazilian Conference on Intelligent Systems (BRACIS) 43–48 (IEEE, 2016).

24. 24.Sawe, S. J. et al. Gaussian process emulation to improve efficiency of computationally intensive multidisease models: a practical tutorial with adaptable R code. BMC Med. Res. Methodol. 24, 26 (2024).

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=38281017&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

25. 25.Andrianakis, I. et al. Bayesian history matching of complex infectious disease models using emulation: a tutorial and a case study on HIV in Uganda. PLoS Comput. Biol. 11, e1003968 (2015).

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=25569850&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

26. 26.Reiker, T. et al. Emulator-based Bayesian optimization for efficient multi-objective calibration of an individual-based model of malaria. Nat. Commun. 12, 7212 (2021).

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=34893600&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

27. 27.Masserey, T. et al. The influence of biological, epidemiological, and treatment factors on the establishment and spread of drug-resistant Plasmodium falciparum. eLife 11, (2022).

28. 28.Golumbeanu, M. et al. Leveraging mathematical models of disease dynamics and machine learning to improve development of novel malaria interventions. Infect. Dis. Poverty 11, 61 (2022).

29. 29.Burgert, L., Reiker, T., Golumbeanu, M., Möhrle, J. J. & Penny, M. A. Model-informed target product profiles of long-acting-injectables for use as seasonal malaria prevention. PLOS Glob Public Health 2, e0000211 (2022).

30. 30.Johansson, M. A. et al. An open challenge to advance probabilistic forecasting for dengue epidemics. Proc. Natl. Acad. Sci. U. S. A. 116, 24268–24274 (2019).

[Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxMjoiMTE2LzQ4LzI0MjY4IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjQvMTEvMjkvMjAyNC4xMS4yOC4yNDMxODEzNi5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=)

31. 31.Perrin, A., Glaizot, O. & Christe, P. Worldwide impacts of landscape anthropization on mosquito abundance and diversity: A meta-analysis. Glob. Chang. Biol. 28, 6857–6871 (2022).

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=36107000&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

32. 32.Paz-Bailey, G., Adams, L. E., Deen, J., Anderson, K. B. & Katzelnick, L. C. Dengue. Lancet 403, 667–682 (2024).

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=38280388&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

33. 33.Gubler, D. J. Dengue and dengue hemorrhagic fever. Clin. Microbiol. Rev. 11, 480–496 (1998).

[Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiY21yIjtzOjU6InJlc2lkIjtzOjg6IjExLzMvNDgwIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjQvMTEvMjkvMjAyNC4xMS4yOC4yNDMxODEzNi5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=)

34. 34.Clarke, J. et al. A global dataset of publicly available dengue case count data. Sci. Data 11, 296 (2024).

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=38485954&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

35. 35.Siraj, A. S. et al. Spatiotemporal incidence of Zika and associated environmental drivers for the 2015-2016 epidemic in Colombia. Sci. Data 5, 180073 (2018).

36. 36.Stoddard, S. T. et al. House-to-house human movement drives dengue virus transmission. Proc. Natl. Acad. Sci. U. S. A. 110, 994–999 (2013).

[Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czo5OiIxMTAvMy85OTQiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyNC8xMS8yOS8yMDI0LjExLjI4LjI0MzE4MTM2LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==)

37. 37.Reiner, R. C., Jr., Stoddard, S. T. & Scott, T. W. Socially structured human movement shapes dengue transmission despite the diffusive effect of mosquito dispersal. Epidemics 6, 30–36 (2014).

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.epidem.2013.12.003&link_type=DOI)

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=24593919&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

38. 38.Moore, T. C. & Brown, H. E. Estimating Aedes aegypti (Diptera: Culicidae) Flight Distance: Meta-Data Analysis. J. Med. Entomol. 59, 1164–1170 (2022).

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/jme/tjac070&link_type=DOI)

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=35640992&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

39. 39.Zahid, M. H., et al. The biting rate of Aedes aegypti and its variability: A systematic review (1970-2022). PLoS Negl. Trop. Dis. 17, e0010831 (2023).

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=37552669&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

40. 40.Shragai, T. et al. Distance to public transit predicts spatial distribution of dengue virus incidence in Medellín, Colombia. Sci. Rep. 12, 8333 (2022).

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=35585133&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

41. 41.Wesolowski, A. et al. Impact of human mobility on the emergence of dengue epidemics in Pakistan. Proc. Natl. Acad. Sci. U. S. A. 112, 11887–11892 (2015).

[Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxMjoiMTEyLzM4LzExODg3IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjQvMTEvMjkvMjAyNC4xMS4yOC4yNDMxODEzNi5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=)

42. 42.Rasmussen, C. E. & Williams, C. K. I. Gaussian Processes for Machine Learning. (MIT Press, Cambridge, USA, 2005).

43. 43.Gutierrez-Barbosa, H., Medina-Moreno, S., Zapata, J. C. & Chua, J. V. Dengue infections in Colombia: Epidemiological trends of a hyperendemic country. Trop. Med. Infect. Dis. 5, 156 (2020).

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=33022908&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

44. 44.Bhatt, S. et al. The global distribution and burden of dengue. Nature 496, 504–507 (2013).

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature12060&link_type=DOI)

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=23563266&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

[Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000317984400041&link_type=ISI)

45. 45.Kraemer, M. U. et al. The global distribution of the arbovirus vectors Aedes aegypti and Ae. albopictus. Elife 4, e08347 (2015).

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.7554/eLife.08347&link_type=DOI)

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=26126267&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

46. 46.Nelson, A. Estimated travel time to the nearest city of 50,000 or more people in year 2000. European Commission - Joint Research Centre - Forest Resources and Climate Unit. [https://forobs.jrc.ec.europa.eu/gam](https://forobs.jrc.ec.europa.eu/gam).

47. 47.Thai, K. T. D. & Anders, K. L. The role of climate variability and change in the transmission dynamics and geographic distribution of dengue. Exp. Biol. Med. 236, 944–954 (2011).

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1258/ebm.2011.010402&link_type=DOI)

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=21737578&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

48. 48.Khan, M. B. et al. Dengue overview: An updated systemic review. J. Infect. Public Health 16, 1625–1642 (2023).

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=37595484&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

49. 49.Magori, K., et al. Skeeter Buster: a stochastic, spatially explicit modeling tool for studying Aedes aegypti population replacement and population suppression strategies. PLoS Negl. Trop. Dis. 3, e508 (2009).

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pntd.0000508&link_type=DOI)

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=19721700&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

50. 50.González, M. C., Hidalgo, C. A. & Barabási, A.-L. Understanding individual human mobility patterns. Nature 453, 779–782 (2008).

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature06958&link_type=DOI)

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=18528393&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

[Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000256415300043&link_type=ISI)

51. 51.Song, C., Koren, T., Wang, P. & Barabási, A.-L. Modelling the scaling properties of human mobility. Nature Phys. 6, 818–823 (2010).

52. 52.Tatem, A. J. et al. Air travel and vector-borne disease movement. Parasitology 139, 1816–1830 (2012).

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1017/S0031182012000352&link_type=DOI)

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=22444826&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

[Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000311904900002&link_type=ISI)

53. 53.Bonilla, E. V., Chai, K. M. & Williams, C. K. I. Multi-task Gaussian Process Prediction. in Proceedings of the 20th International Conference on Neural Information Processing Systems 153–160 (Vancouver, British Columbia, Canada, Red Hook, 2007).

54. 54.Balandat, M. et al. BOTORCH: a framework for efficient monte-carlo Bayesian optimization. in Proceedings of the 34th International Conference on Neural Information Processing Systems 21524–21538 (Curran Associates Inc., Red Hook, NY, USA, 2020).

55. 55.Schrama, M. et al. Human practices promote presence and abundance of disease-transmitting mosquito species. Sci. Rep. 10, 13543 (2020).

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=32782318&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

56. 56.Thongsripong, P., Hyman, J. M., Kapan, D. D. & Bennett, S. N. Human-mosquito contact: A missing link in our understanding of mosquito-borne disease transmission dynamics. Ann. Entomol. Soc. Am. 114, 397–414 (2021).

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/aesa/saab011&link_type=DOI)

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=34249219&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

57. 57.Wei Xiang, B. W., et al. Dengue virus infection modifies mosquito blood-feeding behavior to increase transmission to the host. Proc. Natl. Acad. Sci. U. S. A. 119, e2117589119 (2022).

[Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxNzoiMTE5LzMvZTIxMTc1ODkxMTkiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyNC8xMS8yOS8yMDI0LjExLjI4LjI0MzE4MTM2LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==)

58. 58.Stoddard, S. T., et al. The role of human movement in the transmission of vector-borne pathogens. PLoS Negl. Trop. Dis. 3, e481 (2009).

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pntd.0000481&link_type=DOI)

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=19621090&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

59. 59.Grimm, V. et al. The ODD protocol for describing agent-based and other simulation models: A second update to improve clarity, replication, and structural realism. J. Artif. Soc. Soc. Simul. 23, 7 (2020).

60. 60.Asish, P. R., Dasgupta, S., Rachel, G., Bagepally, B. S. & Girish Kumar, C. P. Global prevalence of asymptomatic dengue infections - a systematic review and meta-analysis. Int. J. Infect. Dis. 134, 292–298 (2023).

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ijid.2023.07.010&link_type=DOI)

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=37463631&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

61. 61.Paszke, A. et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. in Proceedings of the 33rd International Conference on Neural Information Processing Systems 8024–8035 (Curran Associates Inc., Red Hook, NY, USA, 2019).

62. 62.Saltelli, A. et al. Global Sensitivity Analysis: The Primer. (John Wiley & Sons, 2008).

63. 63.Herman, J. & Usher, W. SALib: An open-source Python library for Sensitivity Analysis. The Journal of Open Source Software 2, (2017).

64. 64.Cavany, S. M., España, G., Vazquez-Prokopec, G. M., Scott, T. W. & Perkins, T. A. Pandemic-associated mobility restrictions could cause increases in dengue virus transmission. PLoS Negl. Trop. Dis. 15, e0009603 (2021).

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=34370734&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F11%2F29%2F2024.11.28.24318136.atom)

65. 65.Mejía-Jurado, E., Echeverry-Cárdenas, E. & Aguirre-Obando, O. A. Potential current and future distribution for Aedes aegypti and Aedes albopictus in Colombia: important disease vectors. Biol. Invasions 26, 2119–2137 (2024).

66. 66.Nordhaus, W. D. Geography and macroeconomics: new data and new findings. Proc. Natl. Acad. Sci. U. S. A. 103, 3510–3517 (2006).

[Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxMToiMTAzLzEwLzM1MTAiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyNC8xMS8yOS8yMDI0LjExLjI4LjI0MzE4MTM2LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==)

67. 67.United Nations Office For The Coordination of Humanitarian Affairs. Colombia - Subnational Administrative Boundaries. [https://data.humdata.org/dataset/cod-ab-col](https://data.humdata.org/dataset/cod-ab-col) (2020).

68. 68.Helwig, N. E. Multiple and Generalized Nonparametric Regression. (SAGE Publications, Inc., London, 2020).

69. 69.R Core Team. R: A Language and Environment for Statistical Computing. Preprint at [https://www.R-project.org/](https://www.R-project.org/) (2022).

[1]: /embed/graphic-2.gif
[2]: /embed/graphic-8.gif
[3]: /embed/graphic-9.gif
[4]: /embed/graphic-10.gif