RT Journal Article
SR Electronic
T1 Matching Patients to Clinical Trials using LLaMA 2 Embeddings and Siamese Neural Network
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2024.06.28.24309677
DO 10.1101/2024.06.28.24309677
A1 Chowdhury, Shaika
A1 Rajaganapathy, Sivaraman
A1 Yu, Yue
A1 Tao, Cui
A1 Vassilaki, Maria
A1 Zong, Nansu
YR 2024
UL http://medrxiv.org/content/early/2024/06/30/2024.06.28.24309677.1.abstract
AB Patient recruitment is a key desideratum for the success of a clinical trial that entails identifying eligible patients that match the selection criteria for the trial. However, the complexity of criteria information and heterogeneity of patient data render manual analysis a burdensome and time-consuming task. In an attempt to automate patient recruitment, this work proposes a Siamese Neural Network-based model, namely Siamese-PTM. Siamese-PTM employs the pretrained LLaMA 2 model to derive contextual representations of the EHR and criteria inputs and jointly encodes them using two weight-sharing identical subnetworks. We evaluate Siamese-PTM on structured and unstructured EHR to analyze their predictive informativeness as standalone and collective feature sets. We explore a variety of deep models for Siamese-PTM’s encoders and compare their performance against the Single-encoder counterparts. We develop a baseline rule-based classifier, compared to which Siamese-PTM improved performance by 40%.
Furthermore, visualization of Siamese-PTM’s learned embedding space reinforces its predictive robustness. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This study is supported by the National Institutes of Health (NIH) NIGMS (R00GM135488) and NIH R01AG084236. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The use of data for this study was approved by the Mayo Clinic Institutional Review Board. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes. Protected Health Information (PHI) restrictions apply to the availability of the clinical data here, which were used under IRB approval for use only in the current study. As a result, this dataset is not publicly available.
Qualified researchers affiliated with the Mayo Clinic may apply for access to these data through the Mayo Clinic Institutional Review Board.