RT Journal Article
SR Electronic
T1 Pretrained Patient Trajectories for Adverse Drug Event Prediction Using Common Data Model-based Electronic Health Records
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2024.09.30.24314595
DO 10.1101/2024.09.30.24314595
A1 Kim, Junmo
A1 Kim, Joo Seong
A1 Lee, Ji-Hyang
A1 Kim, Min-Gyu
A1 Kim, Taehyun
A1 Cho, Chaeeun
A1 Park, Rae Woong
A1 Kim, Kwangsoo
YR 2025
UL http://medrxiv.org/content/early/2025/03/14/2024.09.30.24314595.abstract
AB Background. Pretraining language models on electronic health record (EHR) data, treating patient trajectories as natural language sentences, has enhanced performance across various medical tasks. However, EHR pretraining models have not previously been applied to adverse drug event (ADE) prediction. We constructed and externally validated an EHR pretraining model for several ADE prediction tasks and qualitatively analyzed the important features of each ADE cohort. Methods. A retrospective study was conducted on Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM)-based EHR data from two separate tertiary hospitals. The data included patient information in various domains such as diagnosis, prescription, measurement, and procedure. For pretraining, codes were randomly masked, and the model was trained to infer the masked tokens from the preceding and following history. In this process, we adopted domain embedding (DE) to provide information about the domain of the masked token, preventing the model from predicting codes from irrelevant domains. For qualitative analysis, we identified important features using the attention matrix from each finetuned model. Results. The internal and external datasets included 510,879 and 419,505 adult inpatients, respectively, from two separate tertiary hospitals. The EHR pretraining model with DE outperformed all other baselines in all cohorts.
In the feature importance analysis, the results were consistent with previously reported background clinical knowledge. In addition to cohort-level interpretation, patient-level interpretation was available. Conclusions. The EHR pretraining model with DE is well suited to various ADE prediction tasks. The results of the qualitative analysis were consistent with background clinical knowledge.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This study did not receive any funding.
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The Institutional Review Board (IRB) of Seoul National University Hospital (IRB approval No. 2406-060-1543) approved the study with a waiver of informed consent, considering that our study used retrospective and observational EHR data.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes
All data produced in the present study are available upon reasonable request to the authors.