RT Journal Article SR Electronic T1 Evaluating machine learning approaches for multi-label classification of unstructured electronic health records with a generative large language model JF medRxiv FD Cold Spring Harbor Laboratory Press SP 2024.06.24.24309441 DO 10.1101/2024.06.24.24309441 A1 Vithanage, Dinithi A1 Deng, Chao A1 Wang, Lei A1 Yin, Mengyang A1 Alkhalaf, Mohammad A1 Zhang, Zhenyua A1 Zhu, Yunshu A1 Soewargo, Alan Christy A1 Yu, Ping YR 2024 UL http://medrxiv.org/content/early/2024/06/27/2024.06.24.24309441.abstract AB Multi-label classification of unstructured electronic health records (EHR) poses challenges due to the inherent semantic complexity in textual data. Advances in natural language processing (NLP) using large language models (LLMs) show promise in addressing these issues. Identifying the most effective machine learning method for EHR classification in real-world clinical settings is crucial. Therefore, this experimental research aims to test the effect of zero-shot and few-shot learning prompting strategies, with and without Parameter Efficient Fine-tuning (PEFT) LLMs, on the multi-label classification of the EHR data set. The labels tested are across four clinical classification tasks: agitation in dementia, depression in dementia, frailty index, and malnutrition risk factors. We utilise unstructured EHR data from residential aged care facilities (RACFs), employing the Llama 2-Chat 13B-parameter model as our generative AI-based large language model (LLM). Performance evaluation includes accuracy, precision, recall, and F1 score supported by non-parametric statistical analyses. Results indicate the same level of performance with the same prompting template, either zero-shot or few-shot learning across the four clinical tasks. Few-shot learning outperforms zero-shot learning without PEFT. The study emphasises the significantly enhanced effectiveness of fine-tuning in conjunction with zero-shot and few-shot learning. The performance of zero-shot learning reached the same level as few-shot learning after PEFT. The analysis underscores that LLMs with PEFT for specific clinical tasks maintain their performance across diverse clinical tasks. These findings offer crucial insights into LLMs for researchers, practitioners, and stakeholders utilising LLMs in clinical document analysis.Competing Interest StatementThe authors have declared no competing interest.Funding StatementThis study did not receive any fundingAuthor DeclarationsI confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.YesThe details of the IRB/oversight body that provided approval or exemption for the research described are given below:The Human Research Ethics Committee of the University of Wollongong approved the study (Ethics Number 2019/159).I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.YesI understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).YesI have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.YesData is not available.