PT - JOURNAL ARTICLE
AU - Gupta, Samir
AU - Belouali, Anas
AU - Shah, Neil J
AU - Atkins, Michael B
AU - Madhavan, Subha
TI - Automated Identification of Patients with Immune-related Adverse Events from Clinical Notes using Word embedding and Machine Learning
AID - 10.1101/2020.05.19.20106583
DP - 2020 Jan 01
TA - medRxiv
PG - 2020.05.19.20106583
4099 - http://medrxiv.org/content/early/2020/05/26/2020.05.19.20106583.short
4100 - http://medrxiv.org/content/early/2020/05/26/2020.05.19.20106583.full
AB - Immune Checkpoint Inhibitors (ICIs) have substantially improved survival in patients with advanced malignancies. However, ICIs are associated with a unique spectrum of side effects termed Immune-Related Adverse Events (irAEs). To ensure treatment safety, research efforts are needed to comprehensively detect and understand irAEs from real world data (RWD). The goal of this work is to evaluate a Machine Learning-based phenotyping approach that can identify patients with irAEs from a large volume of retrospective clinical notes representing RWD. Evaluation shows promising results with an average F1-score=0.75 and AUC-ROC=0.78. While the extraction of any available irAEs in charts achieves high accuracy, individual irAEs extraction has room for further improvement.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement:
Grant #1: NCI Cancer Center Support Grant for the Georgetown Lombardi Comprehensive Cancer Center (P30 CA51008)
Initials of the authors who received each award: SM, MBA, NJS
Grant numbers awarded to each author: P30 CA51008
The full name of each funder: National Cancer Institute, NIH
URL of each funder website: https://grants.nih.gov/grants/guide/pa-files/par-20-043.html
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Grant #2: NHGRI Clinical Genome Resource - Expert Curation and EHR Integration (U41 HG009650)
Initials of the authors who received each award: SM, SG, AB
Grant numbers awarded to each author: U41 HG009650
The full name of each funder: National Human Genome Research Institute, NIH
URL of each funder website: https://www.genome.gov/Funded-Programs-Projects/ClinGen-Clinical-Genome-Resource
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
The parent study was conducted under IRB study #2017-0559, approved by the Georgetown University review board. The current project used a de-identified dataset from the registry and did not require IRB approval.