Abstract
While Enterobacteriaceae bacteria are commonly found in the healthy human gut, their colonisation of other body parts can evolve into serious infections and health threats. We aimed to design a graph-based machine learning model to assess the risk of inpatient colonisation by multi-drug resistant (MDR) Enterobacteriaceae. The colonisation prediction problem was defined as a binary classification task, where the goal is to predict whether a patient is colonised by MDR Enterobacteriaceae in an undesirable body part during their hospital stay. To capture topological features, interactions among patients and healthcare workers were modelled using a graph structure, where patients are represented by nodes and their interactions by edges. A graph neural network (GNN) model was then trained to learn colonisation patterns from the patient network enriched with clinical and spatiotemporal features. The GNN model predicts colonisation risk with an AUROC of 0.93 (95% CI: 0.92-0.94), 7 percentage points above a logistic regression baseline (0.86 [0.85-0.87]). Comparing different graph topologies, the configuration that considers only in-ward edges (0.93 [0.92-0.94]) outperforms the configurations that include only out-ward edges (0.86 [0.85-0.87]) and both edge types (0.90 [0.89-0.91]). For the three most prevalent MDR Enterobacteriaceae, the AUROC of the GNN – in-ward model ranges from 0.92 (0.90-0.93) for Escherichia coli to 0.95 (0.92-0.98) for Enterobacter cloacae. Topological features obtained via graph modelling improve the performance of machine learning models for Enterobacteriaceae colonisation prediction. GNNs could be used to support infection prevention and control programmes in detecting patients at risk of colonisation by MDR Enterobacteriaceae and other bacterial families.
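As a rough illustration of the graph modelling idea, the sketch below builds a small patient graph and applies one neighbourhood-averaging step, the basic operation that GNN layers build on. It is not the authors' implementation: the abstract does not specify the edge definition, the feature set, or the GNN architecture. The ward-stay records, the rule that an in-ward edge links patients whose stays in the same ward overlap, and the names `build_in_ward_edges` and `mean_aggregate` are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical ward-stay records: (patient_id, ward_id, start_day, end_day).
STAYS = [
    ("p1", "wardA", 0, 5),
    ("p2", "wardA", 3, 8),
    ("p3", "wardA", 6, 9),
    ("p4", "wardB", 1, 4),
]

# Hypothetical per-patient feature vectors (e.g. two binary clinical flags).
FEATURES = {
    "p1": [1.0, 0.0],
    "p2": [0.0, 1.0],
    "p3": [1.0, 1.0],
    "p4": [0.0, 0.0],
}


def build_in_ward_edges(stays):
    """Assumed rule: link two patients when their stays in the same ward overlap."""
    by_ward = defaultdict(list)
    for patient, ward, start, end in stays:
        by_ward[ward].append((patient, start, end))
    edges = set()
    for occupants in by_ward.values():
        for (a, a_start, a_end), (b, b_start, b_end) in combinations(occupants, 2):
            if a != b and a_start <= b_end and b_start <= a_end:
                edges.add(frozenset((a, b)))
    return edges


def mean_aggregate(features, edges):
    """One message-passing step: average a node's features with its neighbours'."""
    neighbours = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        neighbours[a].add(b)
        neighbours[b].add(a)
    updated = {}
    for node, own in features.items():
        group = [own] + [features[n] for n in sorted(neighbours[node])]
        updated[node] = [sum(col) / len(group) for col in zip(*group)]
    return updated


edges = build_in_ward_edges(STAYS)
smoothed = mean_aggregate(FEATURES, edges)
```

In a trained GNN, the averaging step would be followed by learned weights and a non-linearity, and the resulting node representations would feed the binary colonisation classifier. Here an isolated patient such as `p4` simply keeps its own features.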
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported by HES-SO and the Portuguese Polytechnics Coordinating Council (CCISP). SGP also acknowledges Fundação para a Ciência e Tecnologia for her direct funding (CEECINST/00051/2018) and for funding her research unit (UIDB/05704/2020).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study used the publicly available MIMIC-III dataset. The use of this dataset was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (BIDMC) and the Massachusetts Institute of Technology (MIT). Given the de-identified nature of the data, our institution waived the need for further ethical approval for this study.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
(racha.gouareb{at}unige.ch)
(dimitrios.proios{at}unige.ch)
(sonia.pereira{at}ipleiria.pt)
(douglas.teodoro{at}unige.ch)
Data Availability
The data that support the findings of this study are available from the MIMIC-III database, which is provided by the MIT Lab for Computational Physiology and can be accessed at https://mimic.physionet.org. MIMIC-III is a large, freely available database comprising de-identified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. Access to the MIMIC-III database requires completing a data use agreement and providing proof of completion of a course in human subjects research ethics, such as the CITI 'Data or Specimens Only Research' course.