Deep-Learning Model for Personalized Prediction of Positive MRSA Culture Results Using Patient’s Time-Series Electronic Health Records
========================================================================================================================================

* Masayuki Nigo
* Laila Rasmy
* Ziqian Xie
* Bijun Sai Kannadath
* Degui Zhi

## Abstract

Methicillin-resistant *Staphylococcus aureus* (MRSA) is a common bacterial cause of morbidity and mortality. Our deep-learning model (PyTorch_EHR) processes time-series structured electronic health record (EHR) data, including previous cultures and antimicrobial exposures, to predict whether an MRSA culture will be positive within the next two weeks. After training and evaluation on data from 8,164 MRSA and 22,563 non-MRSA patient events from Memorial Hermann Hospital System, Houston, Texas, PyTorch_EHR outperformed the traditional machine learning methods logistic regression (LR) and light gradient boosting machine (LGBM) in area under the receiver operating characteristic curve (AUC: PyTorch_EHR 91.12%, LR 85.91%, LGBM 89.11%). External validation using the MIMIC-IV dataset of 393,713 patient events from a tertiary care center in Boston, Massachusetts, confirmed PyTorch_EHR’s accuracy (AUC: PyTorch_EHR 85.50%, LR 83.24%, LGBM 82.48%). The model maintained its accuracy across most subgroup analyses based on infection type. The cumulative incidence curves based on our model successfully differentiated high-, medium-, and low-risk patients. This study demonstrates the potential of deep-learning models to predict the presence of MRSA-positive cultures to optimize MRSA antimicrobial therapy.

## Background

Methicillin-resistant *Staphylococcus aureus* (MRSA) is one of the common pathogens causing both hospital-acquired and community-associated infections.1 Because this pathogen rules out the majority of beta-lactam antibiotics as treatment options, physicians often need to add an antibiotic, such as vancomycin, to treat it empirically when it is suspected. Considering the side effect profile of vancomycin and the principles of antibiotic stewardship, it is highly desirable to avoid unnecessary antimicrobial therapy.2 Furthermore, a recent study showed that the absolute benefit of empirical therapy against MRSA is 0.1% or less.3 Therefore, accurately identifying high-risk patients is critical to avoid unnecessary side effects from empirical therapy while preserving the benefit of treatment. Although multiple clinical factors have been proposed as risk factors for MRSA infection, they have multiple limitations.4,5 Commonly, the tested population is limited to specific groups, such as patients with ventilator-associated pneumonia.6 Because the risk factors interact in complex ways, it is often difficult to discern the actual risk when multiple risk factors exist simultaneously. For example, previous exposure to cephalosporins and fluoroquinolones has been considered a risk factor.7,8 The risk seems to accumulate when multiple antibiotics were previously prescribed.9 Further, the optimal interval between the index infection and the presence of the risk factor is not well established, and an arbitrary duration is often used.10 More flexible models that can integrate multiple risk factors and their varied timing are warranted so that frontline physicians can safely decide whether empirical antibiotic therapy is necessary.

Electronic health records (EHR) have become widely available in the U.S. since the Meaningful Use program was introduced as part of the 2009 Health Information Technology for Economic and Clinical Health Act. EHR have become a rich data source for daily clinical practice and research. As the data in EHR expand, more information becomes available for physicians to process and interpret when deciding on patient management. Artificial intelligence (A.I.) has become an attractive technology for processing large volumes of real-time EHR data in health care and for achieving personalized medicine. A.I. could reveal complex relationships among numerous factors in EHRs. Additionally, A.I. has been applied to various types of data, such as genetic, image, and EHR data.11,12 However, few models currently predict drug-resistant bacteria.13–15 Even the available models use very limited input data, such as basic demographics and previous susceptibility results, or a limited number of patients.13 Furthermore, these models predict only the index culture or screening result, which may not be optimal for guiding antibiotic therapy in clinical use.16 Deep-learning-based models, such as recurrent neural network (RNN) models, have a significant advantage for time-sequenced events because their fundamental structure accepts sequential inputs. Further, RNNs with medical code embedding can take inputs directly from a real-time EHR data stream, automatically adjust to reflect subtle changes, and provide real-time outputs. PyTorch_EHR has been successfully applied to predict various clinical outcomes.14–16 Despite the potentially high expressive power of deep learning models, no deep learning models using EHR data for predicting MRSA cultures are available.

This study aims to create a deep-learning-based prediction model for positive MRSA cultures using large-scale time-series electronic health record data and to compare it with traditional machine learning models. Further, we evaluated the model’s generalizability on external EHR data from a different region of the U.S.

## Method

Two datasets were used in this study: the Memorial Hermann Hospital System (MHHS) dataset and the Medical Information Mart for Intensive Care (MIMIC)-IV dataset, for model training and external validation, respectively. Patient data were retrospectively retrieved from the microbiology database at Memorial Hermann Hospital System, Houston, Texas. MIMIC-IV ver. 2.1 is a relational de-identified EHR database containing real hospital encounters from a tertiary academic medical center in Boston, MA, USA.20

In the MHHS dataset, patients older than 18 years were included between January 2018 and April 2021. These patients had at least one bacterial culture during the study period. To avoid an imbalanced dataset, we randomly selected approximately 5,000 patients with MRSA cultures, non-MRSA positive cultures (including MSSA and other types of bacteria), and negative cultures. Demographic data, admission data, diagnostic and procedure codes, antibiotic administration, other infectious disease (ID)-related test results, and previous microbiological data, including the type of culture, name of bacteria, and all antibiotic sensitivities, were obtained from the database. Admission ward information was reduced to three categories, emergency department (ED), intensive care unit (ICU), and intermediate care unit (IMU), so that it could be generalized to the MIMIC-IV data. The microbiology table includes cultures and other infectious disease tests, such as serologies. To avoid any label leakage, we used only results reported by the index time. Laboratory orders placed by the index time were included without their results. For diagnostic and procedure codes, ICD-9 or ICD-10 codes were used. Because other data tables, such as antibiotics, did not contain standardized medication codes, free text, such as “vancomycin”, was used. Extracted data were cleaned and converted to categorical data to fit the PyTorch_EHR scheme.
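The leakage rule in the paragraph above (results count only if reported by the index time, orders count without their results) can be sketched in a few lines. This is a minimal illustration rather than the authors' preprocessing code: the `Event` record, its field names, and the `CATEGORY:name=result` token format are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Event:
    """One hypothetical EHR row; the field names are illustrative only."""
    category: str                          # e.g. "LAB", "ABX", "DX"
    name: str                              # e.g. "Blood culture", "Vancomycin"
    ordered_at: datetime
    resulted_at: Optional[datetime] = None
    result: Optional[str] = None


def to_tokens(events: List[Event], index_time: datetime) -> List[str]:
    """Build categorical tokens from information available by index_time."""
    tokens = []
    for ev in sorted(events, key=lambda e: e.ordered_at):
        if ev.ordered_at > index_time:
            continue  # ordered after the index time: invisible to the model
        token = f"{ev.category}:{ev.name.lower()}"
        # Attach the result only if it had already been reported.
        reported = ev.resulted_at is not None and ev.resulted_at <= index_time
        if ev.result is not None and reported:
            token += f"={ev.result.lower()}"
        tokens.append(token)
    return tokens
```

In this sketch, a blood culture ordered before the index time but resulted afterwards contributes the token `LAB:blood culture` with no result attached, so the eventual culture result cannot leak into the input.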

For the MIMIC-IV data, all patients older than 18 years who had bacterial cultures were retrieved in the same way. To validate the generalizability of the model, each data table was mapped to the corresponding MHHS data table, and only data that could be mapped to MHHS data were used. Because MIMIC-IV aggregates the ICD and procedure codes at the encounter level, we used only the ICD or procedure codes reported in previous encounters to avoid label leakage. The microbiology event table was used to identify the patients. A total of 25,599 *S. aureus* isolates from various cultures were found in the table. Of these, 19,605 isolates (76.6%) were tested for antimicrobial sensitivity; the remainder were not tested for various reasons, such as multiple positive cultures with *S. aureus* in a short period and positive wound cultures with multiple organisms. After removing positive *S. aureus* cultures within seven days after a positive MSSA or MRSA culture, only 519 *S. aureus* isolates lacked any recent sensitivity result to classify them as MRSA or MSSA. Those isolates were classified into the negative MRSA group. We further divided the dataset 70:10:20 to fine-tune the model pre-trained on the MHHS dataset.
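A 70:10:20 division like the one mentioned above might look like the sketch below. It assumes that the three parts are training, validation, and test sets and that the split is made at the patient level so that no patient appears in more than one part; neither detail is stated in the text, and the function name and seed are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def split_patients(patient_ids: Sequence[str], seed: int = 0) -> Dict[str, List[str]]:
    """Shuffle unique patient IDs and split them 70:10:20."""
    ids = sorted(set(patient_ids))       # de-duplicate, fix the starting order
    random.Random(seed).shuffle(ids)     # reproducible shuffle
    n_train = len(ids) * 7 // 10
    n_valid = len(ids) // 10
    return {
        "train": ids[:n_train],
        "valid": ids[n_train:n_train + n_valid],
        "test": ids[n_train + n_valid:],
    }
```

Splitting on patient identifiers rather than on individual events is a design choice that keeps repeated events from the same patient out of both the fine-tuning and the evaluation data.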

For the subgroup analysis in the MHHS dataset, we used ICD codes recorded within the two-week period to identify patients. Because MIMIC-IV provides ICD codes only at the encounter level, we identified patients by the ICD codes recorded for the encounter.

### PyTorch_EHR Prediction Model Scheme

We set a two-week window for the prediction, and the first culture within the window was used as the index culture (Figure 1). We chose a two-week window because the majority of MRSA infections or new infections are diagnosed within two weeks. This prediction window covers not only the culture obtained at the time of prediction but also cultures obtained after initiation of empirical antibiotics, which is essential for physicians deciding whether to empirically start or continue MRSA antibiotic coverage. Some patients had multiple cultures over time, including both positive MRSA and non-MRSA cultures. Those patients can be included in both the MRSA and non-MRSA groups, depending on the type of positive or negative culture the patient had during the window period.
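The labeling rule described above can be sketched as follows. This is an illustration under stated assumptions, not the study code: the `Culture` tuple layout and the function name are hypothetical, and treating both ends of the window as inclusive is an assumption.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Culture = Tuple[datetime, bool]  # (collection time, MRSA-positive?)


def label_event(
    index_time: datetime,
    cultures: List[Culture],
    window_days: int = 14,
) -> Tuple[bool, Optional[int]]:
    """Label an index culture and report the days to the first MRSA-positive culture.

    Returns (True, days) if any MRSA-positive culture was collected from the
    index time through window_days later, otherwise (False, None).
    """
    window_end = index_time + timedelta(days=window_days)
    positive_times = sorted(
        t for t, is_mrsa in cultures if is_mrsa and index_time <= t <= window_end
    )
    if not positive_times:
        return False, None
    return True, (positive_times[0] - index_time).days
```

The second return value is the kind of time-to-event quantity a survival-style model or a cumulative incidence curve would need.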

![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/06/12/2023.06.08.23291072/F1.medium.gif)

[Figure 1.](http://medrxiv.org/content/early/2023/06/12/2023.06.08.23291072/F1) Schematic Structure of Deep-Learning-Based Prediction Model for Positive MRSA Cultures

For the deep-learning platform, we used PyTorch_EHR, which predicts clinical outcomes from categorical data in electronic health records. PyTorch_EHR implements recurrent neural network (RNN) models. We chose the gated recurrent unit (GRU) architecture, which is known as an efficient sequential deep learning architecture for clinical event prediction (see S Figure 1). The source code of this model is publicly available to enable its application and further evaluation by other researchers.21 In addition to categorical data, PyTorch_EHR handles the time difference between visits for a better temporal representation of patient history to improve accuracy (see S Figure 2).22,23 In this project, we expressed the interval between visits in days to accommodate predictions for more acute issues.
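For readers unfamiliar with the GRU, its update can be written as below. This is the standard textbook formulation under one common sign convention, not a transcription of the PyTorch_EHR source. The symbol $x_t$ stands for the embedded input at visit $t$, assumed here to combine the medical-code embeddings with the time gap in days, and the final sigmoid read-out on the last hidden state $h_T$ is likewise an illustrative assumption.

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh\bigl(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\bigr) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \\
\hat{y} &= \sigma(w^\top h_T + b)
\end{aligned}
```

Here $z_t$ is the update gate, $r_t$ the reset gate, $\odot$ the element-wise product, and $\hat{y}$ the predicted probability of a positive MRSA culture.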

![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/06/12/2023.06.08.23291072/F2/graphic-2.medium.gif)

![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/06/12/2023.06.08.23291072/F2/graphic-3.medium.gif)

Figure 2. Cumulative Incidence Curve Over Two Weeks in MHHS and MIMIC-IV Datasets

For binary classification tasks, we compared our model against traditional machine learning algorithms: logistic regression (LR)24 and light gradient boosting machine (LGBM).25 For survival prediction, we used the DeepSurv26 architecture, replacing the multilayer perceptron layers with GRU layers for better modeling of sequential information, similar to the way we modeled COVID-19 outcome prediction.18 PyTorch version 1.7.1 and scikit-learn version 0.24.2 were used. After hyperparameter tuning, we trained and tested each model ten times to obtain the average model performance and confidence intervals.
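For context, DeepSurv-style models are usually trained by minimizing the negative Cox partial log-likelihood. The form below is the commonly published one, given as a sketch with any regularization term omitted. It is an assumption that the GRU-based variant described above uses the same objective, and the symbols are introduced here for illustration.

```latex
\ell(\theta) = -\frac{1}{N_{E}} \sum_{i:\,E_i = 1}
\Bigl( h_\theta(x_i) - \log \sum_{j \in \mathcal{R}(T_i)} \exp\bigl(h_\theta(x_j)\bigr) \Bigr)
```

Here $h_\theta(x_i)$ is the network's risk score for patient event $i$, $E_i$ indicates whether the event (a positive MRSA culture) was observed, $T_i$ is the event or censoring time, $\mathcal{R}(T_i) = \{ j : T_j \ge T_i \}$ is the risk set, and $N_E$ is the number of observed events.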

This study was approved by the Institutional Review Boards of the University of Texas Health Science Center at Houston and Memorial Hermann Hospital System.

### Model interpretation

To interpret the MRSA predictions, we used the integrated gradients technique27 to expose the factors that contribute to the personalized model predictions. For recurrent neural network-based models, this yields a more personalized explanation that shows the contribution score of each clinical event on each patient day in the patient trajectory. This differs from the standard population-level feature contributions that can be inferred from logistic regression coefficients or light gradient boosting machine feature importance scores. To evaluate the explainability of our RNN-based model, we reviewed the calculated contribution scores for each clinical event in the input of 10 patients. To facilitate the review, we visualized the contribution scores per patient in an interactive Tableau dashboard, where clinicians can navigate clinical events across categories and across visits within the patient history.
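The idea behind integrated gradients can be shown on a toy model. The sketch below is not the study's implementation: it substitutes a small logistic model for the GRU, encodes clinical events as 0/1 features, and uses an all-zero baseline, all of which are assumptions made for illustration. The attribution for feature $i$ is $(x_i - x'_i)$ times the gradient averaged along the straight path from the baseline $x'$ to the input $x$.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def integrated_gradients(
    x: Sequence[float],
    baseline: Sequence[float],
    weights: Sequence[float],
    bias: float,
    steps: int = 200,
) -> List[float]:
    """Approximate integrated gradients for F(x) = sigmoid(w . x + b).

    The path integral is approximated with a midpoint Riemann sum.
    """
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        z = sum(w * p for w, p in zip(weights, point)) + bias
        slope = sigmoid(z) * (1.0 - sigmoid(z))  # dF/dz at this point
        for i, w in enumerate(weights):
            avg_grad[i] += slope * w / steps     # dF/dx_i = slope * w_i
    return [(xi - b0) * g for xi, b0, g in zip(x, baseline, avg_grad)]
```

A useful sanity check is the completeness property: the attributions should sum to approximately the difference between the model output at the input and at the baseline, and a feature equal to its baseline value receives zero attribution.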

## Results

A total of 30,727 and 156,113 patients were identified in the MHHS and MIMIC-IV databases, respectively. As described under Methods, the 30,727 patients in the MHHS database were randomly selected. The aggregated patient characteristics are described in Table 1. Because some patients were classified into both the MRSA and non-MRSA groups, the total across both groups was 30,727. When a patient had two or more events in the same group, the patient's features were counted once, using the most recent demographic features. Overall, the MRSA group had a higher proportion of ICU admissions (4.3% vs. 2.0% in MHHS and 31.7% vs. 16.7% in MIMIC-IV) and ED patients (66.4% vs. 36.5% in MHHS and 51.3% vs. 35.0% in MIMIC-IV). Because MIMIC-IV was originally developed from an ICU database, it included a higher proportion of ICU patients. Intermediate care unit (IMU) status was not included in the MIMIC-IV data. The most common age group was 55–65 years in all groups. Gender was equally distributed in all groups. As expected given the origin of the data (Houston for MHHS and Boston for MIMIC-IV), the MHHS database had more Hispanic patients than MIMIC-IV (10.5% vs. 3–4%, respectively). Among races, Caucasian was the most common in all groups. In terms of antibiotic administration, vancomycin was the most commonly used antibiotic in both databases, followed by cefepime in MHHS and ceftriaxone in MIMIC-IV. Blood and urine cultures were the most common cultures taken during the study periods.

[Table 1.](http://medrxiv.org/content/early/2023/06/12/2023.06.08.23291072/T1) Characteristics of Patients with and without Positive MRSA Cultures

Table 2 summarizes the bacteria and diagnostic codes identified within the event periods. Of note, *S. aureus* and enterococci were the common bacteria in the MRSA groups, whereas *E. coli* was the most common in the non-MRSA groups. Bacteremia and skin and soft tissue infection were more commonly seen in the MRSA groups (8.0% in MHHS and 5.0% in MIMIC-IV). Urinary tract infection (UTI) was the most common diagnosis in the non-MRSA groups (56.0% in MHHS and 13.8% in MIMIC-IV).

[Table 2.](http://medrxiv.org/content/early/2023/06/12/2023.06.08.23291072/T2) Name of Bacteria Identified from Cultures and Type of Infections Based on ICD Codes

Table 3 shows the prediction accuracy of the models. In the MHHS database, the deep-learning model, PyTorch_EHR, exhibited the highest area under the receiver operating characteristic curve (AUROC), 91.12% (91.023–91.214) (ROC curve: S Figure 3), compared with the other machine learning models (LR 85.91% [85.911–85.912] and LGBM 89.11% [89.085–89.145]). The same pattern was seen in the MIMIC-IV database (PyTorch_EHR 85.50%, LR 83.24%, and LGBM 82.48%). We also evaluated the AUROC in each patient group with a specific diagnosis during the event. Although the AUROC decreased by 5–10%, accuracy remained acceptable for each infection type in the MHHS database.

[Table 3.](http://medrxiv.org/content/early/2023/06/12/2023.06.08.23291072/T3) Outcome of Models in Overall and Subgroup Analyses
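The two quantities reported above, an AUROC per run and an average with a confidence interval over repeated runs, can be computed as in the sketch below. It is illustrative only: the text does not state how the confidence intervals were derived, so the normal-approximation interval is an assumption, and the function names are hypothetical. The AUROC uses the pairwise (Mann–Whitney) definition, which is quadratic in the number of samples and intended for clarity rather than speed.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores higher than a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_ci(values: Sequence[float], level: float = 0.95) -> Tuple[float, float, float]:
    """Mean of repeated runs with a normal-approximation confidence interval."""
    m = mean(values)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width
```

With only ten runs, a t-distribution interval would be somewhat wider than this normal approximation.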

Figure 2 shows the cumulative incidence curve of MRSA-positive cultures over two weeks from the index culture. In both databases, our model clearly differentiated patients at high and low risk of MRSA-positive cultures. The cumulative incidence of MRSA-positive cultures in the MRSA group in the MHHS database was 70.4%, whereas the incidence in MIMIC-IV was approximately 11.9%. The low incidence in the high-risk group in MIMIC-IV was likely due to the low overall incidence of MRSA-positive cultures in the MIMIC-IV database.
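A cumulative incidence curve stratified by predicted risk, as described above, can be sketched as follows. The cut-offs separating low, medium, and high risk are hypothetical because the thresholds used in the study are not given here, and the sketch assumes complete two-week follow-up for every patient event, with no censoring.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical cut-offs, for illustration only.
LOW_CUTOFF = 0.2
HIGH_CUTOFF = 0.5


def risk_group(score: float) -> str:
    if score >= HIGH_CUTOFF:
        return "high"
    if score >= LOW_CUTOFF:
        return "medium"
    return "low"


def cumulative_incidence(
    records: Sequence[Tuple[float, Optional[int]]],
    horizon_days: int = 14,
) -> Dict[str, List[float]]:
    """Fraction of each risk group with a positive culture by each day.

    Each record is (risk score, day of first MRSA-positive culture or None).
    Groups with no members are omitted from the result.
    """
    by_group: Dict[str, List[Optional[int]]] = {}
    for score, day in records:
        by_group.setdefault(risk_group(score), []).append(day)
    return {
        group: [
            sum(1 for d in days if d is not None and d <= t) / len(days)
            for t in range(horizon_days + 1)
        ]
        for group, days in by_group.items()
    }
```

Each returned list has one value per day from day 0 (the index culture) through the horizon, so the gap between its first and last values shows how many positives would be missed by predicting the index culture alone.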

Finally, we provide a visualization of the feature importance for example patients (S Figure 5). The degree of contribution of each feature is visualized as a bar graph. For example, patient #1 is a female patient aged 45–54 years with multiple underlying comorbidities listed on admission two days (−2 days) before the index culture (a blood culture on the index date). Our model assigned a risk score of 0.541 (predicted as a positive patient). After the patient was admitted to the hospital, vancomycin and meropenem were initiated, and a blood culture was ordered. Subsequently, cultures identified MRSA over the two weeks. We also visualized other example patients in S Figure 5. Of note, the same features may have different degrees of importance depending on when they are input to the model. In addition, the features that contribute highly for this patient are not necessarily important for other patients.

## Discussion

In this study, our deep-learning-based MRSA predictive model outperformed other machine learning models in both the real-world MHHS and MIMIC-IV datasets. The model successfully “learned” patient-specific features to provide personalized risks of positive MRSA cultures over two weeks from the index time. The model maintained better predictions even after transfer from the MHHS dataset to the MIMIC-IV dataset and tolerated the significantly imbalanced outcome in the MIMIC-IV dataset. Compared with other existing models, our model successfully predicted positive MRSA cultures not only on the index day but also over two weeks from the index day. This prediction window is more aligned with daily clinical practice, since physicians decide on empirical antibiotic therapy for MRSA, such as intravenous vancomycin, not only for the culture on the index day but also for any subsequent cultures that may be related to the episode of infection after initiation of therapy. Our deep learning model takes into account the time sequence of events in the patient history, which we believe is more consistent with the physician’s assessment in clinical practice. We also tested the model in various types of infection posing various MRSA risks, such as sepsis, bacteremia, and pneumonia. Although there were some decreases in the AUC, superiority and high performance were maintained, which supports the use of this single model across multiple types of infection.

Personalized medicine is of great interest in medical fields. Many studies in this area focus on genetic-based predictions rather than on clinical data from electronic health records (EHR).28 EHR data have become a rich source of real-world data and provide invaluable information. Even without genetic data, we believe EHR data can be a great source for deep-learning models to achieve personalized medicine in multiple clinical settings. Furthermore, compared with traditional machine learning models, deep learning can easily integrate time-sequence data as model inputs, which provides significant advantages for outcome predictions that require sequential event inputs. Although PyTorch_EHR uses only categorical data from EHR, the model has provided high performance with the advantages of relatively simple preprocessing steps and flexible variable selection for input. This allows us to preserve model transferability and generalizability among different data sources.

Since MRSA emerged, multiple predictive models and risk factors for MRSA infections have been proposed. Although these models achieved varying accuracy, they often focus on a certain type of infection, such as pneumonia, to simplify the risk factors and models. Rhodes et al. used a machine learning model to predict community-acquired MRSA pneumonia.29 Although the time frame and patient population differed from our study, their model achieved an AUC of 77.5%, which was lower than ours. Additionally, some risk factor-based models rely heavily on certain tests, such as the MRSA PCR test from the nares,30 which hampers the model’s generalizability because of the tests’ availability and limited applicability to other types of infections. Further, some of the results may not be available at the time of starting antibiotics, which limits the usability of such models in hospitals. In contrast, our model carries a significant advantage because it takes widely available data from EHR and can predict the outcomes even when certain tests are missing. Our model can be used not only for treatment decisions but also, although the utility of contact precautions is still controversial, for infection prevention, to isolate high-risk patients in advance, even before culture results are available. Our model used a two-week time window to provide more meaningful predictions in clinical settings. Some predictive models only predict the index culture rather than overall risk.16 For application in the clinical setting, predicting over a two-week window can be more impactful for clinicians when they choose antimicrobial therapy at the time of initiation. The cumulative incidence curves based on our model predictions clearly differentiated high- and low-risk patients. The majority of patients had positive MRSA cultures on the index day, but approximately 15% of high-risk patients had positive cultures after the index day, which could be missed if we only predicted the positivity of the index culture.

One of the challenges of deep-learning models is their explainability. As shown in S Figure 5, we visualized the individual factors contributing to the model predictions. Because the model uses the sequence of events over time without dichotomizing the time frame with an arbitrary cutoff (e.g., positive MRSA culture within 90 days), the contribution weight can differ depending on the patient and the timing of events. Furthermore, the events that contributed highly were not necessarily directly associated with the prediction of MRSA; such inputs could be surrogates for other underlying events. Caution is required when interpreting the feature importance, as these outputs may not correspond to the traditional risk factors we use in clinical settings.

There are multiple limitations in this study. First, due to the nature of retrospective studies, potential biases are unavoidable. The findings in this study should be confirmed in prospective studies. In addition, although the MHHS data are from Houston, Texas, and MIMIC-IV is from Boston, Massachusetts, the model should be validated in different patient populations and in high-risk populations, such as immunocompromised patients. Second, this model predicts positive MRSA cultures rather than infections. Because some patients may still have MRSA infections without positive cultures, the model should be used cautiously when there are significant concerns about MRSA at the time of initiating antibiotics. Third, although we included multiple variables in the model, several important variables known as MRSA risk factors, such as residence in a long-term care facility, were not included. Furthermore, vital signs and other basic laboratory results were not included in this model. These can be considered in future studies. Finally, although we showed the generalizability of the model in this study, the transferability of the model needs to be addressed before the deep-learning model can be used widely.

## Conclusion

In this study, our deep-learning-based predictive model successfully predicted positive MRSA cultures over two weeks from the index culture. Our study showed its superiority over traditional machine learning models in both the MHHS and MIMIC-IV datasets, with high performance even in significantly imbalanced datasets and subgroup analyses. The model can be widely applied to various types of infections. Further studies in high-risk populations and prospective studies are warranted to confirm the findings.

## Supporting information

Supplemental File: supplements/291072_file03.docx

## Data Availability

All data produced in the present study are available upon reasonable request to the authors.

*   Received June 8, 2023.
*   Revision received June 8, 2023.
*   Accepted June 12, 2023.


*   © 2023, Posted by Cold Spring Harbor Laboratory

The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission.

## References

1.  1.Liu, C. et al. Clinical practice guidelines by the infectious diseases society of america for the treatment of methicillin-resistant Staphylococcus aureus infections in adults and children. Clin Infect Dis 52, e18–55 (2011).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/cid/ciq146&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21208910&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F06%2F12%2F2023.06.08.23291072.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000286216400001&link_type=ISI) 

2.  2.Rybak, M. et al. Therapeutic monitoring of vancomycin in adult patients: a consensus review of the American Society of Health-System Pharmacists, the Infectious Diseases Society of America, and the Society of Infectious Diseases Pharmacists. Am J Health Syst Pharm 66, 82–98 (2009).
    
    [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiRlVMTCI7czoxMToiam91cm5hbENvZGUiO3M6NDoiYWpocCI7czo1OiJyZXNpZCI7czo3OiI2Ni8xLzgyIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjMvMDYvMTIvMjAyMy4wNi4wOC4yMzI5MTA3Mi5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

3.  3.Carey, G. B. et al. Estimated mortality with early empirical antibiotic coverage of methicillin-resistant Staphylococcus aureus in hospitalized patients with bacterial infections: a systematic review and meta-analysis. Journal of Antimicrobial Chemotherapy 78, 1150–1159 (2023).
    
    

4.  4.Hidron, A. I. et al. Risk factors for colonization with methicillin-resistant Staphylococcus aureus (MRSA) in patients admitted to an urban hospital: emergence of community-associated MRSA nasal carriage. Clin Infect Dis 41, 159–166 (2005).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1086/430910&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15983910&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F06%2F12%2F2023.06.08.23291072.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000229890800003&link_type=ISI) 

5.  5.Szumowski, J. D. et al. Methicillin-resistant Staphylococcus aureus colonization, behavioral risk factors, and skin and soft-tissue infection at an ambulatory clinic serving a large population of HIV-infected men who have sex with men. Clin Infect Dis 49, 118–121 (2009).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1086/599608&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19480576&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F06%2F12%2F2023.06.08.23291072.atom) 

6.  6.Shorr, A. F. et al. A risk score for identifying methicillin-resistant Staphylococcus aureus in patients presenting to the hospital with pneumonia. BMC Infect Dis 13, 268 (2013).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/1471-2334-13-268&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23742753&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F06%2F12%2F2023.06.08.23291072.atom) 

7.  7.MacDougall, C., Powell, J. P., Johnson, C. K., Edmond, M. B. & Polk, R. E. Hospital and community fluoroquinolone use and resistance in Staphylococcus aureus and Escherichia coli in 17 US hospitals. Clin Infect Dis 41, 435–440 (2005).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1086/432056&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=16028149&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F06%2F12%2F2023.06.08.23291072.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000230611000003&link_type=ISI) 

8.  8.Asensio, A., Guerrero, A., Quereda, C., Lizán, M. & Martinez-Ferrer, M. Colonization and infection with methicillin-resistant Staphylococcus aureus: associated factors and eradication. Infect Control Hosp Epidemiol 17, 20–28 (1996).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1086/647184&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=8789683&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F06%2F12%2F2023.06.08.23291072.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1996TR24300006&link_type=ISI) 

9.  9.Schneider-Lindner, V., Delaney, J. A., Dial, S., Dascal, A. & Suissa, S. Antimicrobial drugs and community-acquired methicillin-resistant Staphylococcus aureus, United Kingdom. Emerg Infect Dis 13, 994–1000 (2007).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3201/eid1307.061561&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18214170&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F06%2F12%2F2023.06.08.23291072.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000247758200005&link_type=ISI) 

10. 10.Huang, S. S. & Platt, R. Risk of Methicillin-Resistant Staphylococcus aureus Infection after Previous Infection or Colonization. Clinical Infectious Diseases 36, 281–285 (2003).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1086/345955&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12539068&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F06%2F12%2F2023.06.08.23291072.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000180546700006&link_type=ISI) 

11. 11.Anahtar, M. N., Yang, J. H. & Kanjilal, S. Applications of Machine Learning to the Problem of Antimicrobial Resistance: an Emerging Model for Translational Research. J Clin Microbiol 59, e0126020 (2021).
    
    

12. 12.Kim, J. I. et al. Machine Learning for Antimicrobial Resistance Prediction: Current Practice, Limitations, and Clinical Perspective. Clin Microbiol Rev 35, e00179–21 (2022).
    
    

13. 13.Feretzakis, G. et al. Using Machine Learning Algorithms to Predict Antimicrobial Resistance and Assist Empirical Treatment. Stud Health Technol Inform 272, 75–78 (2020).
    
    

14. 14.Hsu, C.-C., Lin, Y. E., Chen, Y.-S., Liu, Y.-C. & Muder, R. R. Validation Study of Artificial Neural Network Models for Prediction of Methicillin-Resistant Staphylococcus aureus Carriage. Infect. Control Hosp. Epidemiol. 29, 607–614 (2008).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1086/588588&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18549315&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F06%2F12%2F2023.06.08.23291072.atom) 

15. Lewin-Epstein, O., Baruch, S., Hadany, L., Stein, G. Y. & Obolski, U. Predicting Antibiotic Resistance in Hospitalized Patients by Applying Machine Learning to Electronic Medical Records. Clinical Infectious Diseases 72, e848–e855 (2021).
    
    

16. Hirano, Y. et al. Machine Learning Approach to Predict Positive Screening of Methicillin-Resistant Staphylococcus aureus During Mechanical Ventilation Using Synthetic Dataset From MIMIC-IV Database. Front Med (Lausanne) 8, 694520 (2021).
    
    

17. Nigo, M. et al. PK-RNN-V E: A deep learning model approach to vancomycin therapeutic drug monitoring using electronic health record data. J Biomed Inform 133, 104166 (2022).
    
    

18. Rasmy, L. et al. Recurrent neural network models (CovRNN) for predicting outcomes of patients with COVID-19 on admission to hospital: model development and validation using electronic health record data. Lancet Digit Health S2589-7500(22)00049-8 (2022). doi:10.1016/S2589-7500(22)00049-8.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S2589-7500(22)00049-8&link_type=DOI) 

19. Rasmy, L. et al. Representation of EHR data for predictive modeling: a comparison between UMLS and other terminologies. J Am Med Inform Assoc 27, 1593–1599 (2020).
    
    

20. Goldberger, A. L. et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101, E215–220 (2000).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1161/01.CIR.101.23.e215&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=10851218&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F06%2F12%2F2023.06.08.23291072.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000087571900001&link_type=ISI) 

21. ZhiGroup. Predictive Modeling on Electronic Health Records(EHR) using Pytorch. (2023).
    
    

22. Choi, E. et al. RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism. in Advances in Neural Information Processing Systems vol. 29 (Curran Associates, Inc., 2016).
    
    

23. Wu, S. et al. Modeling asynchronous event sequences with RNNs. J Biomed Inform 83, 167–177 (2018).
    
    

24. sklearn.linear_model.LogisticRegression. scikit-learn [https://scikit-learn.org/stable/modules/generated/sklearn.linear\_model.LogisticRegression.html](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).
    
    

25. Welcome to LightGBM’s documentation! — LightGBM 3.3.2 documentation. [https://lightgbm.readthedocs.io/en/v3.3.2/](https://lightgbm.readthedocs.io/en/v3.3.2/).
    
    

26. Katzman, J. L. et al. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol 18, 24 (2018).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s12874-018-0482-1&link_type=DOI) 
    

27. Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. in Proceedings of the 34th International Conference on Machine Learning - Volume 70 3319–3328 (JMLR.org, 2017).
    
    

28. Abul-Husn, N. S. & Kenny, E. E. Personalized Medicine and the Power of Electronic Health Records. Cell 177, 58–69 (2019).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.cell.2019.02.039&link_type=DOI) 
    

29. Rhodes, N. J. et al. Machine Learning To Stratify Methicillin-Resistant Staphylococcus aureus Risk among Hospitalized Patients with Community-Acquired Pneumonia. Antimicrobial Agents and Chemotherapy 67, e01023–22 (2022).
    
    

30. Baby, N. et al. Nasal Methicillin-Resistant Staphylococcus aureus (MRSA) PCR Testing Reduces the Duration of MRSA-Targeted Therapy in Patients with Suspected MRSA Pneumonia. Antimicrob Agents Chemother 61, e02432–16 (2017).