Identifying Importation and Asymptomatic Spreaders of Multi-drug Resistant Organisms in Hospital Settings
=========================================================================================================

* Jiaming Cui
* Jack Heavey
* Eili Klein
* Gregory R. Madden
* Anil Vullikanti
* B. Aditya Prakash

## Abstract

Healthcare-associated infections (HAIs) due to multi-drug resistant organisms (MDROs) are a significant burden to the healthcare system. Patients are sometimes already infected at the time of admission to the hospital (referred to as “importation”), and additional patients might get infected in the hospital through transmission (“nosocomial infection”). Because many of these importation and nosocomial infection cases may present no symptoms (i.e., “asymptomatic”), rapidly identifying them is difficult: testing is limited and incurs significant delays. Although much recent work has examined the utility of both mathematical models of transmission and machine learning for identifying patients at risk of MDRO infections, these methods have limited performance and suffer from different drawbacks: transmission modeling-based methods do not make full use of the rich data contained in electronic health records (EHR), while machine learning-based methods typically lack information about mechanistic processes. In this work, we propose NeurABM, a new framework which integrates both neural networks and agent-based models (ABMs) to combine the advantages of both modeling-based and machine learning-based methods. NeurABM simultaneously learns a neural network model for patient-level prediction of importation and the ABM that is used for identifying infections. Our results demonstrate that NeurABM identifies importation and nosocomial infection cases more accurately than existing methods.

## Introduction

Healthcare-associated infections (HAIs), especially those caused by multi-drug resistant organisms (MDROs), pose a significant threat to patient safety and burden the healthcare system with increased costs due to longer hospital stays and more expensive therapies [28, 26, 25, 30, 10, 15]. Approximately 3% of hospitalized patients in the United States acquire an HAI during their stay, resulting in more than 35,000 deaths annually [3, 14, 7, 24]. Often, patients have already been infected but present no symptoms at admission (i.e., importation cases). For instance, the European Centre for Disease Prevention and Control (ECDC) estimates that importation cases contribute to 13% of HAI cases in Germany and 18.9% in Spain [29]. These importation cases can spread HAI-causing pathogens and lead to nosocomial infection cases, which can also be asymptomatic but further spread pathogens to additional healthy patients [27].

Despite the critical concerns associated with importation and nosocomial infection cases, identifying them rapidly and accurately remains a challenging problem. Current methods to identify them include surveillance tests [22, 16], machine learning-based methods [23, 13], and transmission modeling-based methods [21, 11, 1, 19, 18, 31, 8]. However, each has significant drawbacks. Surveillance tests such as culture or PCR tests are commonly used in hospitals to identify importation and nosocomial infection cases. However, they are costly, require time to process, and are not 100% accurate [6]. Additionally, they can typically be used for only a subset of MDROs and are therefore not applicable to all kinds of HAIs. Machine learning and statistical techniques use patients’ electronic health record (EHR) data to predict the probability of importation and nosocomial infection cases [23, 13]. However, the performance of machine learning methods has not proven high enough, for several reasons: class imbalance (since HAI cases form a very small fraction of the entire patient population), bias in the data (since testing is generally not done in a systematic manner), and the fact that machine learning methods do not incorporate epidemiological knowledge in their frameworks. Finally, modeling-based methods rely on detailed mechanistic models (e.g., compartmental mixing models [18, 31] and agent-based models (ABMs) [1, 21, 11]) to capture the transmission dynamics of HAIs within a healthcare facility. They are calibrated to infections in the hospital, and projections from the calibrated models are used for prediction. Although ABMs have used information about contact networks between patients and providers within healthcare facilities to model the infection status of an individual patient, they still cannot directly incorporate each patient’s risk factors from the EHR data, such as medications, lab results, vital signs, and device use history. As we show in the Results section, this inability to leverage patients’ EHR data leads to suboptimal performance.

In this work, we propose a new framework, NeurABM, to identify HAI importation and nosocomial infection cases by *coupling* a neural network and an ABM and training them simultaneously. We use methicillin-resistant *Staphylococcus aureus* (MRSA) as an example HAI in later sections. Figure 1 shows an overview of our framework. As shown in the figure, the neural network estimates the importation probability for each patient using EHR data, while the ABM incorporates MRSA dynamics and is used to estimate the MRSA infection probability. After training, NeurABM runs as a discrete-time process; at each time step *t*, the following two steps are performed: (1) the neural network estimates the importation probability (i.e., identifies importation cases) for each new patient who enters the hospital, and (2) the ABM (which keeps track of the disease states of all patients in the hospital until time *t* − 1) runs the next step of the disease simulation to estimate all disease states (i.e., identify nosocomial infection cases, including those that are asymptomatic) at time *t*. The parameters of NeurABM consist of two parts: those of the neural network and those of the ABM (e.g., general parameters such as transmission and recovery rates, and patient-specific parameters such as importation probabilities in this paper); these parameters are learned by minimizing a loss function that measures the error between the ABM projections and ground-truth incidence data from the EHR. Since the dynamics of MRSA transmission depend on the importation model (and conversely, through this kind of training process), this approach couples the neural network and the ABM and trains them end-to-end, which mitigates the issues in using either of them individually. NeurABM significantly extends the work of [4], which was the first method to consider such a joint deep learning and ABM approach, by introducing approximation techniques to scale the disease model and incorporating rich patient-level EHR data.

![Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/07/15/2024.07.14.24310393/F1.medium.gif)

Figure 1: 
Our NeurABM framework involves 4 steps: (1) The neural network takes the patients’ risk factor data collected from EHR as input, and outputs both the agent-based model (ABM) parameters, denoted by $\boldsymbol{\theta}_M$ (which are applicable to every patient), and patient-specific parameters, denoted by $\boldsymbol{\theta}_p$ (importation probabilities in this paper, darker red means higher probability), for each patient $p$. (2) These parameters and the adjacency matrices for contact networks of each day collected from EHR are then fed into the ABM simulator for simulation for $T$ days, and the output is the probability $\hat{y}_{p,t}$ that each patient $p$ is in the *Carriage* state on each day $t$ (darker red means higher probability). (3) We then compare $\hat{y}_{p,t}$ with the ground-truth observation of patients in the *Carriage* state (via lab testing), denoted by $y_{p,t}$, and compute the loss $\mathcal{L}(\hat{y}_{p,t}, y_{p,t})$. (4) We backpropagate this loss to the neural network to tune the neural network parameters $\phi$.

We demonstrate the performance of NeurABM using EHR data for patients at the University of Virginia (UVA) hospital intensive care units (ICUs). Our results show that NeurABM identifies not only importation cases but also nosocomial infection cases better than other machine learning or modeling-based baselines. Note that NeurABM is a general framework that integrates both neural networks and mechanistic models in an end-to-end way; it can be easily extended to other ABMs or EHR data and used to study other clinical problems.

## Results

Here, we show the performance of NeurABM in identifying MRSA importation and nosocomial infection cases in the ICUs of the UVA hospital in 2019. We used EHR data from the UVA hospital to construct patient contact networks (used by the ABM) and collect patient risk factors (used by the neural network). We use the SIS-ABM model [12, 5] as the ABM for disease transmission in NeurABM. Ground-truth MRSA infections are identified from lab test results for each patient in the EHR. For each week *k*, we used the contact networks, patient risk factors, and lab test results until week *k* − 1 to train NeurABM and identify importation cases up to week *k* − 1. We then ran the SIS-ABM model for 7 more days to infer the infection states of patients for week *k*, which correspond to nosocomial infections (see the Methods section). Note that only data prior to week *k* are used in this process. This is the same setting as the one considered in [21] for detection of asymptomatic cases (though their ABM is different); further, they do not consider the importation problem. We also compared NeurABM with baselines from three categories: machine learning (feedforward neural network [20], decision tree [9], naive Bayes [2]), mechanistic modeling (the SIS-ABM model [12, 5], the SILI-ABM model [21]), and clinical heuristics (length of stay [21]). Because the SILI-ABM model [21] is not designed to identify importation cases, we compare with it only for identifying nosocomial infection cases. We also ran experiments where test results are only available until week *k* − 2 (see Supporting Information), and the results show that NeurABM still performs better than the baselines.

### Identifying importation cases

In Figure 2a, we show the precision-recall curves for NeurABM and other methods. In clinical practice, very low precision is not useful, since it means that many tests and treatments are applied to patients who are not MRSA cases. We therefore aim for high recall at a precision that is not too low. Following previous work [17], we consider precision smaller than 0.25 as clinically inapplicable and focus on three important precision levels: 0.25, 0.5, and 0.75 (dashed grey lines).

![Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/07/15/2024.07.14.24310393/F2.medium.gif)

Figure 2: 
Performance in identifying importation cases. (a) The precision-recall curves (PRC). The x-axis represents precision, and the y-axis represents recall. The red curve represents NeurABM, and the curves in other colors represent the baselines. A larger area under the precision-recall curve (AUPRC) indicates better performance. AUPRC values are listed in the legend, and NeurABM has the highest AUPRC value. (b) The negative predictive value (NPV) at different thresholds. The x-axis is the classification threshold, and the y-axis is the NPV. Circles, squares, and triangles mark the thresholds and NPV values at which precision is 0.25, 0.5, and 0.75, respectively. A higher NPV indicates fewer missed importation cases and therefore better performance, and NeurABM has the highest NPV values. (c) The receiver operating characteristic (ROC) curves in identifying MRSA importation cases. The x-axis is the false positive rate, and the y-axis is the true positive rate. A larger area under the ROC curve (AUC-ROC) indicates better performance. AUC-ROC values are listed in the legend, and NeurABM has the highest AUC-ROC value. (d) The recall, F1 score, AUPRC, false positive rate, NPV, and AUC-ROC at different precisions (0.25, 0.5, 0.75). The best AUPRC and AUC-ROC are in bold.

NeurABM achieves the highest recall at each of the three precision levels (0.25, 0.5, and 0.75), indicating that our framework is effective. In addition, the area under the precision-recall curve (AUPRC) for NeurABM is the largest (0.60) among all methods. In Figure 2b, we show how the negative predictive value (NPV) changes with the classification threshold (when the estimated probability is higher than the threshold, we classify the patient as an MRSA importation case; otherwise, we classify the patient as negative). The NPV is the number of true negative cases divided by the number of predicted negative cases. Intuitively, a higher NPV means that a patient we predict as negative is less likely to be a missed (false negative) case. NeurABM’s NPV is always higher than 0.95 and higher than that of the baselines, indicating that NeurABM identifies the importation cases well, with fewer missed patients. We also show the receiver operating characteristic (ROC) curve in Figure 2c. Here, the area under the curve (AUC-ROC) for our framework is again the largest (0.86) among all methods. In the table in Figure 2d, we also list the recall, F1 score, and false positive rate corresponding to different precision values. NeurABM always achieves the highest recall and F1 score at a given precision, indicating the effectiveness of our framework in identifying MRSA importation cases.
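The threshold-based metrics discussed above can be made concrete with a short sketch. The function names and the toy probabilities and labels below are illustrative assumptions, not data or code from this study; the definitions themselves (precision, recall, NPV, false positive rate, F1) are the standard ones.

```python
def confusion_counts(probs, labels, threshold):
    """Count TP, FP, TN, FN when a probability above the threshold is called positive."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted_positive = p > threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def safe_div(num, den):
    """Return num / den, or None when the ratio is undefined (zero denominator)."""
    return num / den if den else None


def metrics_at_threshold(probs, labels, threshold):
    """Standard classification metrics at a single threshold."""
    tp, fp, tn, fn = confusion_counts(probs, labels, threshold)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    npv = safe_div(tn, tn + fn)   # true negatives / predicted negatives
    fpr = safe_div(fp, fp + tn)
    if precision is None or recall is None or precision + recall == 0:
        f1 = None
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "npv": npv, "fpr": fpr, "f1": f1}


# Toy values, made up for illustration only.
probs = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 0]
m = metrics_at_threshold(probs, labels, threshold=0.5)
print(m)
```

Sweeping the threshold and recording these values is one way to trace out curves like those in Figure 2; the markers at precision 0.25, 0.5, and 0.75 correspond to the thresholds at which `precision` reaches those levels.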

### Identifying nosocomial infection cases

Figure 3a shows the precision-recall curves, with precision on the x-axis and recall on the y-axis. The red curve represents the results for NeurABM. The area under the precision-recall curve for our framework is the largest (0.58) among all methods. The dashed grey lines correspond to precisions of 0.25, 0.5, and 0.75. Again, NeurABM achieves the highest recall at each of these precision levels, indicating that our framework is effective. In Figure 3b, we show how the negative predictive value (NPV) changes with the classification threshold. The NPV of NeurABM is always higher than 0.9 and higher than that of the baselines, indicating that NeurABM identifies nosocomial infection cases well, with fewer missed patients. We also show the ROC curve in Figure 3c. Here, the area under the ROC curve for our framework is the largest (0.86) among all methods. In the table in Figure 3d, NeurABM always achieves the highest recall and F1 score at a given precision, demonstrating the effectiveness of our framework in identifying nosocomial MRSA infection cases.

![Figure 3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/07/15/2024.07.14.24310393/F3.medium.gif)

Figure 3: 
Performance in identifying nosocomial infection cases. (a) The precision-recall curves. The red curve represents NeurABM, and the curves in other colors represent the baselines. Higher AUPRC is better, and NeurABM has the highest AUPRC value. (b) The negative predictive value at different thresholds. Circles, squares, and triangles mark the thresholds and NPV values at which precision is 0.25, 0.5, and 0.75, respectively. Higher NPV is better, and NeurABM has the highest NPV values. (c) The receiver operating characteristic curves in identifying MRSA nosocomial infection cases. Higher AUC-ROC is better, and NeurABM has the highest AUC-ROC value. (d) The recall, F1 score, AUPRC, false positive rate, NPV, and AUC-ROC at different precisions. The best AUPRC and AUC-ROC are in bold.

### Identification of high-risk MRSA cases

Identifying high-risk MRSA cases can help in implementing better infection control within the hospital. We consider the strategy of testing patients in order of the infection probability estimated by NeurABM and by the baselines, and determine what percentage of MRSA cases can be identified. This is shown in Figure 4, where we rank all patients from high to low according to the infection probability estimated by each method, and test patients in this order. We observe that NeurABM always identifies more nosocomial MRSA infection cases (y-axis) for the same test budget (x-axis), which suggests that our framework is effective and practical for identifying MRSA cases in clinical settings.

![Figure 4:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/07/15/2024.07.14.24310393/F4.medium.gif)

Figure 4: 
Percentage of identified MRSA cases by screening “high-risk” patients. For each patient in the UVA ICUs, we use each method to estimate their MRSA infection probability and rank them according to this probability from high to low. We then screen different percentages of patients (x-axis) and see how many actual MRSA cases can be identified (y-axis). As seen in the figure, NeurABM can always identify more MRSA cases than other baselines.
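The ranked-screening evaluation described in this subsection can be sketched as follows. This is a minimal illustration under assumptions: the function name `screening_curve` and the toy inputs are hypothetical, and ties in estimated probability are broken by input order, which the text does not specify.

```python
def screening_curve(probs, labels):
    """Rank patients by estimated probability (high to low) and return
    (fraction_of_patients_tested, fraction_of_true_cases_found) pairs
    for every test budget from 0 to all patients."""
    total_cases = sum(labels)
    if total_cases == 0:
        raise ValueError("at least one true case is needed to compute the curve")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    curve = [(0.0, 0.0)]
    found = 0
    for k, idx in enumerate(order, start=1):
        found += labels[idx]  # testing this patient reveals a case if labels[idx] == 1
        curve.append((k / len(probs), found / total_cases))
    return curve


# Toy values, made up for illustration only.
probs = [0.9, 0.1, 0.7, 0.3]
labels = [1, 0, 0, 1]
curve = screening_curve(probs, labels)
for tested, identified in curve:
    print(f"tested {tested:.0%} of patients -> found {identified:.0%} of cases")
```

A method whose curve lies above another's identifies more cases for the same test budget, which is the comparison made in Figure 4.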

### Case study: Explanation of neural network

We further investigate which patient risk factors NeurABM associates with a high risk of being an importation case. In Figure 5, each dot represents a patient, and the color of the dot represents the risk factor value (red means higher, and blue means lower). Dots with a higher impact value (to the right of the control line) mean that NeurABM tends to assign the patient a higher probability of being an importation case (and vice versa). We can see that patients who had contact with more MRSA patients in the 7 or 14 days prior to ICU admission are assigned the highest risk. In addition, patients with a device usage history, patients who were admitted from or were last discharged to other healthcare facilities, and emergency patients are more likely to be considered importation cases.

![Figure 5:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/07/15/2024.07.14.24310393/F5.medium.gif)

Figure 5: 
The patient risk factors that the trained neural network associates with a high risk of being an importation case. The color of the dots represents risk factor values, with red indicating higher values. A higher impact means that NeurABM is more likely to consider the patient an importation case.

## Discussion

NeurABM is a novel framework for supporting a diverse class of HAI surveillance and control questions. This framework allows the use of diverse data sources from EHRs by deep learning methods and ABMs. While joint deep learning and ABMs have been considered before, as in [4] (which our work builds on), our approach is distinct in that the deep learning method for prediction of importations and the ABM run in lock-step, allowing the ABM to directly incorporate the predictions of importation. Additionally, the detailed contact network used in the SIS-ABM model in NeurABM also plays an important role, and justifies the utility of detailed contact representations for surveillance and control of HAIs.

Our experiments show that NeurABM identifies not only MRSA importation cases but also nosocomial infection cases in UVA ICUs, with performance that can be considered clinically useful; in contrast, prior methods using EHR data have not achieved this level of performance. Moreover, the negative predictive value (NPV) for our method is always higher than 0.95, indicating that NeurABM can identify these cases well, with fewer missed patients. Our case study also reveals the risk factors that are highly related to MRSA importations, which allows clinicians to better respond to potential MRSA importation cases in their hospital.

However, our framework is not without limitations. One limitation is that the SIS-ABM model may oversimplify MRSA carriage by defining just two states: *susceptible* and *carriage*. There are different forms of MRSA carriage, including various clinical infection types such as skin abscess or bloodstream infection, which may confer different levels of risk for transmission. Similarly, the SIS-ABM model does not have an explicit “colonization” state. However, our framework is quite general and can be extended to use more complex ABMs that incorporate these states separately. We also did not take false negative or false positive MRSA tests into consideration, although the sensitivity and specificity of the nares MRSA tests are considered quite good. Another potential limitation is that MRSA surveillance has several biases, and NeurABM is not trained to specifically mitigate these biases.

Nevertheless, NeurABM opens up a new direction of research that uses both rich patient risk factors from EHR data and agent-based models designed with epidemiological knowledge, and it can be used for many other questions about HAI spread in hospitals. The specific architecture of our method, which runs the deep learning method and the ABM at each time step, allows us to add predictors at different stages within this framework to address other kinds of questions. For instance, the framework can be adapted to forecast nosocomial infections and severe outcomes by adding a predictor for these outcomes after the ABM. The ABM can also be enriched with representations of interventions implemented in the hospital. For patient-specific parameters, we focus on importation probabilities in this work. However, many other parameters, such as the recovery rate, can also be patient-specific, and our NeurABM framework can be easily extended to them. The performance of these variations depends on data availability and is a promising topic for future research. The framework can also be adapted to other healthcare-associated infections, such as *C. diff*. The fact that NeurABM is amenable to such adaptations is an indication of its generality.

## Methods

### Dataset

We extract three different types of patient data based on the electronic health records (EHR) from the University of Virginia hospital: patient demographic information and risk factors (e.g., comorbidities, medical history), lab testing, and contact network data.

### Patient risk factor data

This dataset consists of risk factors for all patients in ICUs. From the EHR dataset, we collected 19 different risk factors for each patient, all of which are available before ICU admission. From July 1, 2019, to December 31, 2019, there were 2470 patients in UVA ICUs, and 157 of them were MRSA importation cases (all patients received an MRSA test within (*t* − 3, *t* + 3) days, where *t* denotes the day of admission into one of the ICUs, and patients who tested positive for MRSA within this range are considered importation cases). A list and description of each risk factor are provided in the Supporting Information.

### Lab testing data

This dataset consists of infection data for each patient. Two types of tests were used to diagnose MRSA: culture tests and polymerase chain reaction (PCR) tests. However, since a negative culture test cannot rule out MRSA infection, we used only positive culture tests together with both positive and negative PCR tests. For a given patient $p$ on a given day $t$, $y_{p,t} = 1$ if the patient tested positive on day $t$ or if their most recent previous test was positive. Likewise, $y_{p,t} = 0$ if the patient tested negative on day $t$ or if their most recent previous test was negative.
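The labeling rule above amounts to carrying the most recent usable test result forward in time. The sketch below illustrates it for one patient; the record format `(day, kind, result)`, the function names, and the choice to mark days before the first usable test as `None` (unobserved) are assumptions made for illustration, not details given in the text. It also assumes at most one usable test per day.

```python
def usable_tests(records):
    """Keep PCR results of either sign and positive cultures; drop negative cultures."""
    return [(day, result) for day, kind, result in records
            if kind == "pcr" or (kind == "culture" and result == 1)]


def daily_labels(records, num_days):
    """Return y_t for days 0..num_days-1 for one patient.

    y_t is the result of the test on day t if there is one, otherwise the
    most recent earlier result; None means no usable test has been seen yet
    (an assumption of this sketch)."""
    by_day = dict(usable_tests(records))
    labels = []
    latest = None
    for day in range(num_days):
        if day in by_day:
            latest = by_day[day]
        labels.append(latest)
    return labels


# Toy records, made up for illustration: (day, test kind, result).
records = [(1, "culture", 0), (2, "pcr", 0), (5, "culture", 1)]
print(daily_labels(records, 7))
```

In this toy case the negative culture on day 1 is ignored, the negative PCR on day 2 sets the label to 0, and the positive culture on day 5 switches it to 1.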

### Contact network data

This dataset consists of a series of contact networks $\boldsymbol{A}_t$, one for each day $t$, comprising three types of entities: patients, healthcare workers (HCWs), and locations. From the EHR dataset, we can collect the movement information of patients and HCWs (e.g., the ward in which a patient stayed, and when doctors and nurses visited a specific ward). Because this movement information also includes start and end times, we can infer whether patients and HCWs were co-located (i.e., overlapped in time) at any specific location. Specifically, if two patients or HCWs $v_1$ and $v_2$ were co-located at location $l$ on day $t$, we create edges between $v_1$ and $v_2$, between $v_1$ and $l$, and between $v_2$ and $l$ on day $t$ in $\boldsymbol{A}_t$. However, because of the nature of this data, individuals such as support staff or patient guests are not tracked and thus are not included in the network. Additionally, HCW-HCW co-locations are not tracked in rooms where care is not administered, such as break rooms.
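The co-location rule can be illustrated with a small sketch that builds the edge set for one day. The visit format `(entity, location, start_hour, end_hour)`, the half-open interval convention, and all names are assumptions for illustration; the actual EHR schema is not described in the text.

```python
from itertools import combinations


def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open time intervals [start, end) share any time."""
    return a_start < b_end and b_start < a_end


def build_daily_edges(visits):
    """Build undirected edges for one day from (entity, location, start, end) visits.

    Following the rule in the text: when two people overlap in time at the
    same location, add person-person and both person-location edges."""
    edges = set()
    for (v1, loc1, s1, e1), (v2, loc2, s2, e2) in combinations(visits, 2):
        if v1 != v2 and loc1 == loc2 and overlaps(s1, e1, s2, e2):
            edges.add(frozenset((v1, v2)))
            edges.add(frozenset((v1, loc1)))
            edges.add(frozenset((v2, loc1)))
    return edges


# Toy visits for a single day, made up for illustration.
visits = [
    ("patient_A", "room_1", 0, 24),
    ("nurse_X", "room_1", 9, 10),
    ("patient_B", "room_2", 0, 24),
    ("nurse_X", "room_2", 11, 12),
    ("patient_C", "room_3", 0, 24),
]
edges = build_daily_edges(visits)
for edge in sorted(tuple(sorted(e)) for e in edges):
    print(edge)
```

Here `nurse_X` links `patient_A` and `patient_B` indirectly through two room visits, while `patient_C`, who shares no time with anyone, contributes no edges under this literal reading of the rule.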

### Problem setup

We use the trained NeurABM model to identify importation and nosocomial infection cases. For importation cases, the patient-specific parameter $\boldsymbol{\theta}_p$ in this work is the importation probability of each patient. Specifically, for each week $k$, we used the contact networks, patient risk factors, and lab testing results until week $k - 1$ to train NeurABM, and our task is to identify the importation cases among these patients until week $k - 1$. Note that the NeurABM framework does not have access to the ground-truth importation case data; we use these data only for evaluation. For nosocomial infection cases, we follow the setup of previous work [21]: for each week $k$, we used the contact networks, patient risk factors, and lab testing results until week $k - 1$ to train NeurABM, and then ran the SIS-ABM model for 7 more days to infer the infection states of all patients for week $k$. We followed a real-world step-forward scenario that made weekly predictions. For example, if we were at the end of week 40 (beginning of October), we would train on the data from week 28 (beginning of July) to week 39 (end of September) to estimate both the importation cases between week 28 and week 39 and the nosocomial infection cases in week 40 (i.e., no information after week 40 is used). Then, at the end of week 41, we would train on the data from week 29 to week 40 and identify the nosocomial infection cases for week 41. We repeated this procedure until we were at the end of week 52.
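The step-forward schedule in the example can be written down explicitly. The week numbers given in the text (weeks 28 to 39 for week 40, then weeks 29 to 40 for week 41) suggest a sliding 12-week training window; the sketch below assumes that window length, and the function name is hypothetical.

```python
def weekly_splits(first_week, last_target_week, window):
    """Return (training_weeks, target_week) pairs for a step-forward evaluation.

    Each target week is predicted using only the `window` weeks before it."""
    splits = []
    for target in range(first_week + window, last_target_week + 1):
        training_weeks = list(range(target - window, target))
        splits.append((training_weeks, target))
    return splits


# Weeks 28..52 with an assumed 12-week sliding window.
splits = weekly_splits(first_week=28, last_target_week=52, window=12)
for training_weeks, target in splits[:2]:
    print(f"train on weeks {training_weeks[0]}-{training_weeks[-1]}, predict week {target}")
```

Under this assumption there are 13 weekly prediction rounds, with target weeks 40 through 52, and no training window ever includes its own target week.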

### Transmission model

In this work, we use the SIS-ABM model to capture the MRSA spread dynamics in UVA ICUs [12]. SIS-ABM is a pathogen load-based model that keeps track of the pathogen load on all people and locations using a load vector $\boldsymbol{l}_t$. Each patient $i$ is in either the *Susceptible* ($S$) or the *Carriage* ($C$) state. The probability of transitioning from $S$ to $C$ is proportional to the amount of pathogen on the patient, $\boldsymbol{l}_t(i)$, and is formulated as a linear dose-response function $\beta\,\boldsymbol{l}_t(i)$ ($\beta$ is the disease infectivity parameter). Once in the carriage state, the patient sheds additional pathogen load at each step, which can later be transferred to the patient’s neighbors (including both people and locations). This shedding process continues until the patient recovers, which happens with recovery probability $\delta$.

For the pathogen load transfer, the SIS-ABM model uses the daily contact networks $\boldsymbol{A}_t$ described above to capture the exchange of pathogens among patients, HCWs, and locations. Specifically, we construct a transfer matrix $\boldsymbol{R}_t$ for each day $t$, where $(\boldsymbol{R}_t)_{ij} = \tau_{ij,t}\,(\boldsymbol{A}_t)_{ij}$. Here, $\boldsymbol{A}_t$ is the adjacency matrix of the contact network on day $t$, and $\tau_{ij,t}$ is the transfer ratio parameter (the ratio of pathogen being transferred from $j$ to $i$ on day $t$, or remaining in place if $i = j$). Using $\boldsymbol{R}_t$ and the load vector $\boldsymbol{l}_t$, the SIS-ABM model updates the pathogen loads every day as a linear operation. We also restrict the column sums of $\boldsymbol{R}_t$ to be less than or equal to 1, which implies that the total amount of pathogen cannot increase after transfer (i.e., $|\boldsymbol{R}_t \boldsymbol{l}_t| \le |\boldsymbol{l}_t|$). Note that susceptible patients may still carry a small pathogen load and spread it to others, and that HCWs and locations are always in the *susceptible* state, which means that they can spread MRSA pathogen loads but cannot be infected. We provide more details in the Supporting Information.
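A small numerical sketch may help make the transfer step and the column-sum constraint concrete. Everything below is illustrative: the three-node network, the transfer ratios, and the function names are made up, and the sketch assumes the adjacency matrix has ones on its diagonal so that the "remaining" fraction for $i = j$ is kept.

```python
def transfer_matrix(adjacency, tau):
    """Elementwise product: R[i][j] = tau[i][j] * A[i][j]."""
    n = len(adjacency)
    return [[tau[i][j] * adjacency[i][j] for j in range(n)] for i in range(n)]


def column_sums_at_most_one(R, tol=1e-12):
    """Check the constraint that no column of R sums to more than 1."""
    n = len(R)
    return all(sum(R[i][j] for i in range(n)) <= 1.0 + tol for j in range(n))


def transfer_loads(R, loads):
    """One day of transfer as a linear operation: new[i] = sum_j R[i][j] * loads[j]."""
    n = len(loads)
    return [sum(R[i][j] * loads[j] for j in range(n)) for i in range(n)]


# Nodes: 0 = patient, 1 = HCW, 2 = room. Edges 0-1 and 1-2, plus self-loops (assumed).
A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
# tau[i][j]: fraction of j's load moving to i (or staying, when i == j). Toy values.
tau = [[0.8, 0.1, 0.5],
       [0.1, 0.7, 0.2],
       [0.5, 0.1, 0.8]]
R = transfer_matrix(A, tau)
loads = [1.0, 0.0, 0.0]
new_loads = transfer_loads(R, loads)
print(new_loads, sum(new_loads))
```

Because the first column of `R` sums to 0.9, one tenth of the patient's load is lost in the transfer, so the total load decreases rather than increases, consistent with the constraint. Entries of `tau` at non-edges (such as `tau[0][2]`) have no effect because the adjacency matrix masks them out.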

### NeurABM framework

As shown in Figure 1, our NeurABM is composed of two components: a neural network and an agent-based model (ABM) simulator. The neural network is used to estimate both patient-specific parameters and ABM parameters.

For the neural network component, we take the risk factors $\boldsymbol{f}$ (where $f_p$ is for patient $p$) as input and estimate both the patient-specific parameters $\boldsymbol{\theta}_p$ (a vector in which each element $\theta_p$ corresponds to patient $p$ and depends only on that patient’s own risk factors $f_p$; in this work, it is the importation probability of each patient) and the ABM parameters $\boldsymbol{\theta}_M$ (i.e., the general parameters that apply to every patient in the SIS-ABM model, such as the disease infectivity parameter $\beta$ or the recovery probability $\delta$). The neural network is parameterized by $\phi$, and we write $\boldsymbol{\theta}_p, \boldsymbol{\theta}_M = \mathrm{NN}(\boldsymbol{f}; \phi)$.

For the ABM simulator, we implement the simulation process of the SIS-ABM model using matrix operations in a differentiable way. The ABM simulator takes the adjacency matrices of the contact networks $\boldsymbol{A}_t$ and the parameters learned by the neural network ($\boldsymbol{\theta}_p$, $\boldsymbol{\theta}_M$) as input, and simulates MRSA spread over $T$ days to estimate patient states on each day, $\hat{\boldsymbol{y}}$ (where $\hat{y}_{p,t}$ is for patient $p$ on day $t$). Specifically, the simulation process of the SIS-ABM model can be decomposed into three substeps: (1) pathogen load transmission, where the load transfers via contact edges, (2) updating the state of each patient based on their pathogen load and the recovery probability, and (3) advancing the timestep from day $t$ to day $t + 1$. This process can be repeated for an arbitrary number of steps to simulate the MRSA spread over $T$ days. We write $\hat{\boldsymbol{y}} = \mathrm{ABM}(\boldsymbol{A}; \boldsymbol{\theta}_p, \boldsymbol{\theta}_M)$.
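The three substeps can be sketched as a simulation over carriage probabilities rather than sampled states, which is one plausible way to keep every operation smooth in the parameters. This is not the authors' implementation (those details are in the Supporting Information): the expected-value update rule, the shedding amount `shed`, the order of shedding and transfer, and the use of the importation probability as the initial carriage probability are all assumptions of this sketch.

```python
def simulate_carriage(R_by_day, is_patient, importation_prob, beta, delta, shed):
    """Return history[t][i]: carriage probability of node i after day t.

    R_by_day: one transfer matrix per simulated day.
    is_patient: True for patients; HCWs and locations never enter carriage.
    importation_prob: initial carriage probability per node (assumed role)."""
    n = len(is_patient)
    carriage = [importation_prob[i] if is_patient[i] else 0.0 for i in range(n)]
    loads = [0.0] * n
    history = []
    for R in R_by_day:
        # (1) Carriers shed pathogen (assumed), then loads move along contact edges.
        loads = [loads[i] + shed * carriage[i] for i in range(n)]
        loads = [sum(R[i][j] * loads[j] for j in range(n)) for i in range(n)]
        # (2) Update states: stay in carriage with prob (1 - delta),
        #     or become a carrier with prob min(1, beta * load).
        updated = []
        for i in range(n):
            if not is_patient[i]:
                updated.append(0.0)
                continue
            p_acquire = min(1.0, beta * loads[i])
            updated.append(carriage[i] * (1.0 - delta) + (1.0 - carriage[i]) * p_acquire)
        carriage = updated
        # (3) Advance to the next day and record the estimate.
        history.append(list(carriage))
    return history


# Toy setup: two patients (nodes 0, 1) sharing one room (node 2). Values are made up.
R = [[0.6, 0.0, 0.2],
     [0.0, 0.6, 0.2],
     [0.2, 0.2, 0.6]]
history = simulate_carriage([R] * 3, [True, True, False],
                            importation_prob=[0.9, 0.0, 0.0],
                            beta=0.5, delta=0.1, shed=1.0)
for day, probs in enumerate(history, start=1):
    print(day, [round(p, 4) for p in probs])
```

In the toy run, the second patient has zero carriage probability after day 1 but a small positive probability from day 2 onward, because load shed by the first patient reaches them only through the shared room. Written in an automatic differentiation framework, the same arithmetic would allow gradients to flow from the output back to `importation_prob`, `beta`, and `delta`.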

With both the neural network and the ABM simulator, we can then integrate both components together to train simultaneously. Specifically, one training epoch comprises the following four steps.

#### Step 1

We feed the risk factor data $\boldsymbol{f}$ into the neural network as input to estimate the patient-specific parameters $\boldsymbol{\theta}_p$. In this work, $\boldsymbol{\theta}_p$ contains the probability of being an importation case for each patient $p$. It is a vector of size $N$, where $N$ is the number of patients in the contact network. Meanwhile, the neural network also outputs the general ABM parameters $\boldsymbol{\theta}_M$ that are applied to all patients (e.g., $\beta$, $\alpha$, …).

#### Step 2

We then feed $\boldsymbol{\theta}_p$, $\boldsymbol{\theta}_M$, and the contact networks $\boldsymbol{A}_t$ into the ABM simulator and simulate for $T$ steps. The output is $\hat{\boldsymbol{y}}$, of size $N \times T$, in which $\hat{y}_{p,t}$ represents the probability that patient $p$ is in the *carriage* state on day $t$.

#### Step 3

We compare the estimated carriage probability $\hat{\boldsymbol{y}}$ with the corresponding ground-truth observations $\boldsymbol{y}$ (i.e., known carriage patients based on lab testing). We use the weighted binary cross-entropy (BCE) loss $\mathcal{L}(\hat{\boldsymbol{y}}, \boldsymbol{y}) = -\sum_p \sum_t \left[ w_{pos}\, y_{p,t} \log(\hat{y}_{p,t}) + w_{neg}\,(1 - y_{p,t}) \log(1 - \hat{y}_{p,t}) \right]$ as the loss function. Here $w_{pos}$ and $w_{neg}$ are the weights for positive and negative observations. We set $w_{pos} : w_{neg} \propto \sum_p \sum_t \mathbb{1}[y_{p,t} = 0] : \sum_p \sum_t \mathbb{1}[y_{p,t} = 1]$, where $\mathbb{1}[\cdot]$ is the indicator function, which is 1 if the condition is true and 0 otherwise.
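A minimal sketch of a weighted binary cross-entropy, written in the usual form that is minimized, shows how the inverse-frequency weighting works. The text fixes only the ratio of the two weights; normalizing them to sum to 1, clipping probabilities with `eps`, and flattening the patient-day observations into plain lists are choices made here for illustration.

```python
import math


def class_weights(labels):
    """Weights with w_pos : w_neg = (#negatives) : (#positives), normalized to sum to 1."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    total = n_pos + n_neg
    return n_neg / total, n_pos / total


def weighted_bce(preds, labels, w_pos, w_neg, eps=1e-12):
    """Weighted binary cross-entropy summed over all observations (lower is better)."""
    loss = 0.0
    for y_hat, y in zip(preds, labels):
        y_hat = min(max(y_hat, eps), 1.0 - eps)  # avoid log(0)
        loss -= w_pos * y * math.log(y_hat) + w_neg * (1 - y) * math.log(1.0 - y_hat)
    return loss


# Toy observations, made up for illustration: one positive among four.
labels = [1, 0, 0, 0]
w_pos, w_neg = class_weights(labels)
uninformed = weighted_bce([0.5, 0.5, 0.5, 0.5], labels, w_pos, w_neg)
informed = weighted_bce([0.9, 0.1, 0.1, 0.1], labels, w_pos, w_neg)
print(w_pos, w_neg, uninformed, informed)
```

With one positive among four observations, the single positive carries as much total weight as the three negatives combined, which counteracts the class imbalance noted in the Introduction.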

#### Step 4

With the BCE loss ℒ (***ŷ, y***) and the differentiable ABM simulator, we can calculate the gradient of the loss with respect to the neural network parameters *ϕ* via backpropagation. This lets us tune the neural network so that it produces more reasonable input parameters for the ABM simulator.
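Written out, the gradient that backpropagation computes follows the chain rule through the simulator. The expression below is a sketch under two assumptions: that ***θ****p* and ***θ****M* are both differentiable functions of *ϕ*, and that ***θ****M* is treated as a single parameter vector. The inner sum runs over patients *q* because the carriage estimate of one patient can depend on the importation probabilities of other patients through the contact network.

$$\frac{\partial \mathcal{L}}{\partial \phi} = \sum_{p} \sum_{t} \frac{\partial \mathcal{L}}{\partial \hat{y}_{p,t}} \left( \sum_{q} \frac{\partial \hat{y}_{p,t}}{\partial \theta_{q}} \frac{\partial \theta_{q}}{\partial \phi} + \frac{\partial \hat{y}_{p,t}}{\partial \theta_{M}} \frac{\partial \theta_{M}}{\partial \phi} \right)$$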

The four steps are repeated until the loss ℒ (***ŷ, y***) converges. More details are provided in the Supporting Information.

### Baselines

We compare NeurABM with three families of baselines: machine learning-based methods (neural network [20], decision tree [9], naive Bayes [2]), mechanistic modeling-based methods (SIS-ABM model [12, 5], SILI-ABM model [21]), and clinical heuristic methods (length of stay [21]).

For machine learning-based methods, we train two models: one for identifying importation cases and another for identifying nosocomial infection cases. For importation cases, we train on the ground-truth importation cases from January 2019 to June 2019 and test on July to December (i.e., the same time period as for NeurABM). For identifying nosocomial infection cases in each week *k*, we train on data up to week *k* −1 and test on week *k*. For modeling-based methods, we run the models as described in their original papers [21, 12] and take the average infection probability over 100 simulations as the probability of being an importation case or a nosocomial infection case. For the clinical heuristic, length of stay, patients who stay longer in the hospital are assigned higher probabilities.
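The three evaluation protocols above can be sketched as small helper functions. All names here are hypothetical and the details are assumptions for illustration: `run_once` stands in for one stochastic run of a baseline simulator returning a 0/1 outcome per patient, and the length-of-stay score is normalised by the longest stay, which is only one of many ways to make longer stays score higher.

```python
import random


def walk_forward_splits(n_weeks):
    """For each test week k, train on weeks 1..k-1 (weeks are 1-indexed)."""
    return [(list(range(1, k)), k) for k in range(2, n_weeks + 1)]


def average_over_runs(run_once, n_runs=100, seed=0):
    """Average per-patient 0/1 outcomes over n_runs stochastic simulations."""
    rng = random.Random(seed)
    totals = None
    for _ in range(n_runs):
        outcome = run_once(rng)  # one simulated 0/1 outcome per patient
        if totals is None:
            totals = [0] * len(outcome)
        totals = [a + b for a, b in zip(totals, outcome)]
    return [t / n_runs for t in totals]


def length_of_stay_scores(stay_days):
    """Score in [0, 1] that grows with length of stay (longest stay scores 1)."""
    longest = max(stay_days)
    if longest <= 0:
        return [0.0] * len(stay_days)
    return [d / longest for d in stay_days]


splits = walk_forward_splits(3)
scores = length_of_stay_scores([2, 4, 1])
# Toy simulator: patient 0 is always infected, patient 1 with probability 0.3.
probs = average_over_runs(lambda rng: [1, int(rng.random() < 0.3)], n_runs=100)
```

Averaging over repeated runs turns the binary outcomes of a stochastic simulator into per-patient probabilities that can be ranked and compared with the other methods.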

## Supporting information

Supplementary Information [supplements/310393_file02.pdf]

## Data Availability

The outputs of our model are available on GitHub at [https://github.com/AdityaLab/NEURABM](https://github.com/AdityaLab/NEURABM). The electronic health record (EHR) data used to develop the models are not available because they are highly sensitive and we do not have permission to release them. However, we provide the code, a demo, and a synthetic dataset on GitHub.

## Acknowledgements

This work was supported in part by the NSF (Expeditions CCF-1918770, CAREER IIS-2028586, RAPID IIS-2027862, Medium IIS-1955883, Medium IIS-2106961, PIPP CCF-2200269), the Centers for Disease Control and Prevention Modeling Infectious Diseases In Healthcare program (5U01CK000589), the National Institutes of Health (K23AI163368 to G.R.M.), the National Center for Advancing Translational Science (UL1TR003015, KL2TR003016 to G.R.M.), a faculty research award from Facebook, and funds/computing resources from Georgia Tech.

*   Received July 14, 2024.
*   Revision received July 14, 2024.
*   Accepted July 15, 2024.


*   © 2024, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1.  [1].Adhikari, B., Lewis, B., Vullikanti, A., Jimenez, J. M., and Prakash, B. A. Fast and near-optimal monitoring for healthcare acquired infection outbreaks. PLoS computational biology 15, 9 (2019), e1007284.
    
    
2.  [2].Bishop, C. Pattern recognition and machine learning. Springer, 2006.
    
    
3.  [3].Centers for Disease Control and Prevention. Antibiotic Resistance Threats in the United States, 2019.
    
    
4.  [4].Chopra, A., Rodríguez, A., Subramanian, J., Quera-Bofarull, A., Krishnamurthy, B., Prakash, B. A., and Raskar, R. Differentiable agent-based epidemiology. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems (2023), pp. 1848–1857.
    
    
5.  [5].Cui, J., Cho, S., Kamruzzaman, M., Bielskas, M., Vullikanti, A., and Prakash, B. A. Using spectral characterization to identify healthcare-associated infection (hai) patients for clinical contact precaution. Scientific Reports 13, 1 (2023), 16197.
    
    
6.  [6].Dangerfield, B., Chung, A., Webb, B., and Seville, M. T. Predictive value of methicillin-resistant staphylococcus aureus (mrsa) nasal swab pcr assay for mrsa pneumonia. Antimicrobial agents and chemotherapy 58, 2 (2014), 859–864.
    

7.  [7].Fernandez-Gracia, J., Onnela, J.-P., Barnett, M. L., Eguiluz, V. M., and Christakis, N. A. Influence of a patient transfer network of us inpatient facilities on the incidence of nosocomial infections. Scientific reports 7, 1 (2017), 2930.
    
    
8.  [8].Haghpanah, F., Lin, G., Klein, E., and CDC MInD-Healthcare Program. Deconstructing the effects of stochasticity on transmission of hospital-acquired infections in icus. Royal Society Open Science 10, 9 (2023), 230277.
    
    
9.  [9].Hastie, T., Tibshirani, R., and Friedman, J. H. The elements of statistical learning: data mining, inference, and prediction, vol. 2. Springer, 2009.
    
    
10. [10].Howden, B. P., Giulieri, S. G., Wong Fok Lung, T., Baines, S. L., Sharkey, L. K., Lee, J. Y., Hachani, A., Monk, I. R., and Stinear, T. P. Staphylococcus aureus host interactions and adaptation. Nature Reviews Microbiology 21, 6 (2023), 380–395.
    

11. [11].Jang, H., Justice, S., Polgreen, P. M., Segre, A. M., Sewell, D. K., and Pemmaraju, S. V. Evaluating architectural changes to alter pathogen dynamics in a dialysis unit. In 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (2019), IEEE, pp. 961–968.
    
    
12. [12].Jang, H., Justice, S., Polgreen, P. M., Segre, A. M., Sewell, D. K., and Pemmaraju, S. V. Evaluating architectural changes to alter pathogen dynamics in a dialysis unit: For the cdc mind-healthcare group. In Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (New York, NY, USA, 2020), ASONAM '19, Association for Computing Machinery, pp. 961–968.
    
    
13. [13].Jang, H., Pai, S., Adhikari, B., and Pemmaraju, S. V. Risk-aware temporal cascade reconstruction to detect asymptomatic cases: For the cdc mind healthcare network. In 2021 IEEE International Conference on Data Mining (ICDM) (2021), IEEE, pp. 240–249.
    
    
14. [14].Kallen, A. J., Mu, Y., Bulens, S., Reingold, A., Petit, S., Gershman, K., Ray, S. M., Harrison, L. H., Lynfield, R., Dumyati, G., et al. Health care–associated invasive mrsa infections, 2005-2008. Jama 304, 6 (2010), 641–647.
    

15. [15].Lowy, F. D. Staphylococcus aureus infections. New England journal of medicine 339, 8 (1998), 520–532.
    

16. [16].Luteijn, J., Hubben, G., Pechlivanoglou, P., Bonten, M., and Postma, M. Diagnostic accuracy of culture-based and pcr-based detection tests for methicillin-resistant staphylococcus aureus: a meta-analysis. Clinical microbiology and infection 17, 2 (2011), 146–154.
    

17. [17].Martinez, D. A., Levin, S. R., Klein, E. Y., Parikh, C. R., Menez, S., Taylor, R. A., and Hinson, J. S. Early prediction of acute kidney injury in the emergency department with machine-learning methods applied to electronic health record data. Annals of emergency medicine 76, 4 (2020), 501–514.
    

18. [18].Mietchen, M. S., Short, C. T., Samore, M., Lofgren, E. T., and CDC Modeling Infectious Diseases in Healthcare Program (MInD-Healthcare). Examining the impact of icu population interaction structure on modeled colonization dynamics of staphylococcus aureus. PLoS Computational Biology 18, 7 (2022), e1010352.
    
    
19. [19].Montella, E., Alfano, R., Sacco, A., Bernardo, C., Ribera, I., Triassi, M., and Maria Ponsiglione, A. Healthcare associated infections in the neonatal intensive care unit of the “federico ii” university hospital: Statistical analysis and study of risk factors. In 2021 International Symposium on Biomedical Engineering and Computational Biology (New York, NY, USA, 2022), BECB 2021, Association for Computing Machinery.
    
    
20. [20].Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
    
    
21. [21].Pei, S., Liljeros, F., and Shaman, J. Identifying asymptomatic spreaders of antimicrobial-resistant pathogens in hospital settings. Proceedings of the National Academy of Sciences 118, 37 (2021), e2111190118.
    

22. [22].Perreault, S. K., Binks, B., McManus, D. S., and Topal, J. E. Evaluation of the negative predictive value of methicillin-resistant staphylococcus aureus nasal swab screening in patients with acute myeloid leukemia. Infection Control & Hospital Epidemiology 42, 7 (2021), 853–856.
    
    
23. [23].Raschpichler, G., Raupach-Rosin, H., Akmatov, M. K., Castell, S., Rübsamen, N., Feier, B., Szkopek, S., Bautsch, W., Mikolajczyk, R., and Karch, A. Development and external validation of a clinical prediction model for mrsa carriage at hospital admission in southeast lower saxony, germany. Scientific reports 10, 1 (2020), 17998.
    
    
24. [24].Rocha, L. E., Singh, V., Esch, M., Lenaerts, T., Liljeros, F., and Thorson, A. Dynamic contact networks of patients and mrsa spread in hospitals. Scientific reports 10, 1 (2020), 1–10.
    
    
25. [25].Roth, J. A., Hornung-Winter, C., Radicke, I., Hug, B. L., Biedert, M., Abshagen, C., Battegay, M., and Widmer, A. F. Direct costs of a contact isolation day: a prospective cost analysis at a swiss university hospital. infection control & hospital epidemiology 39, 1 (2018), 101–103.
    
    
26. [26].Sharma, A., Leal, J., Kim, J., Pearce, C., Pillai, D. R., and Hollis, A. The cost of contact precautions: A systematic analysis. Canadian Journal of Infection Control 35, 4 (2020).
    
    
27. [27].Smith, D. L., Dushoff, J., Perencevich, E. N., Harris, A. D., and Levin, S. A. Persistent colonization and the spread of antibiotic resistance in nosocomial pathogens: resistance is a regional problem. Proceedings of the National Academy of Sciences 101, 10 (2004), 3709–3714.
    

28. [28].Stone, P. W. Economic burden of healthcare-associated infections: an american perspective. Expert review of pharmacoeconomics & outcomes research 9, 5 (2009), 417–422.
    
    
29. [29].Suetens, C., Latour, K., Karki, T., Ricchizzi, E., Kinross, P., Moro, M. L., Jans, B., Hopkins, S., Hansen, S., Lyytikainen, O., et al. Prevalence of healthcare-associated infections, estimated incidence and composite antimicrobial resistance index in acute care hospitals and long-term care facilities: results from two european point prevalence surveys, 2016 to 2017. Eurosurveillance 23, 46 (2018), 1800516.
    
    
30. [30].Valiquette, L., Chakra, C. N. A., and Laupland, K. B. Financial impact of health care-associated infections: When money talks. Canadian Journal of Infectious Diseases and Medical Microbiology 25, 2 (2014), 71–74.
    
    
31. [31].van Kleef, E., Robotham, J. V., Jit, M., Deeny, S. R., and Edmunds, W. J. Modelling the transmission of healthcare associated infections: a systematic review. BMC infectious diseases 13 (2013), 1–13.
    