Generative AI and Large Language Models in Reducing Medication Related Harm and Adverse Drug Events – A Scoping Review
========================================================================================================================

* Jasmine Chiat Ling Ong
* Chen Michael
* Ning Ng
* Kabilan Elangovan
* Nichole Yue Ting Tan
* Liyuan Jin
* Qihuang Xie
* Daniel Shu Wei Ting
* Rosa Rodriguez-Monguio
* David W. Bates
* Nan Liu

## Abstract

**Background** Medication-related harm has a significant impact on global healthcare costs and patient outcomes, accounting for deaths in 4.3 per 1000 patients. Generative artificial intelligence (GenAI) has emerged as a promising tool in mitigating risks of medication-related harm. In particular, large language models (LLMs) and well-developed generative adversarial networks (GANs) have shown promise for healthcare-related tasks. This review aims to explore the scope and effectiveness of generative AI in reducing medication-related harm, and to identify existing developments and challenges in research.

**Methods** We searched PubMed, Web of Science, Embase, and Scopus for peer-reviewed articles published from January 2012 to February 2024. We included studies focusing on the development or application of generative AI in mitigating risk for medication-related harm across the entire medication use process. We excluded studies using traditional AI methods only, those unrelated to healthcare settings, or those concerning non-prescribed medication uses such as supplements. Extracted variables included study characteristics, AI model specifics and performance, application settings, and any patient outcome evaluated.

**Findings** A total of 2203 articles were identified, and 14 met the criteria for inclusion in the final review. We found that generative AI and large language models were used in three key applications: drug-drug interaction identification and prediction, clinical decision support, and pharmacovigilance.
While the performance and utility of these models varied, they generally showed promise in areas like early identification and classification of adverse drug events and support in decision-making for medication management. However, no studies tested these models prospectively, suggesting a need for further investigation into the integration and real-world application of generative AI tools to improve patient safety and healthcare outcomes effectively.

**Interpretation** Generative AI shows promise in mitigating medication-related harms, but there are gaps in research rigor and ethical considerations. Future research should focus on the creation of high-quality, task-specific benchmarking datasets for medication safety and on real-world implementation outcomes.

## Introduction

Medication-related harm poses a significant health and economic burden globally. The global prevalence of medication-related harm was 12%, of which 15% was severe and fatal, causing a mortality rate of up to 4.3 per 1000 patients.1 By comparison, cardiovascular deaths caused by ischemic heart disease accounted for 1.09 deaths per 1000 patients.2 The economic burden of medication-related harm is estimated at $30.1 billion and 79 billion euros in the United States and Europe, respectively.3 Medication-related harm, also termed adverse drug events (ADEs), includes preventable or non-preventable harm caused by interventions related to medication use.4 Preventable medication errors can occur at any step, from the physician prescribing medications to the patient receiving the medication. ADEs are therefore under active surveillance during healthcare delivery, through health systems or pharmacovigilance activities.
Advances in artificial intelligence (AI), digitization of health records, and accessibility to electronic patient records have been shown to reduce the occurrence, duration, and severity of ADEs.5–7 An AI-powered system has been reported to reduce inpatient prescribing errors by up to 20%.8 However, traditional predictive models are still limited by the lack of in-depth clinical reasoning, poor interoperability in electronic health record systems (EHRs), difficulty in detecting rare events or interactions, and a paucity of models that leverage unstructured data. With existing systems, overlooked ADEs can lead to significant healthcare complications, while trivial or clinically insignificant effects are overemphasized, adding to administrative burden. Generative AI (GenAI) and large language models (LLMs) hold promise for addressing this imbalance and may enable novel approaches previously unfeasible with conventional methods. For instance, preliminary studies have explored the potential of ChatGPT for recognizing adverse drug reactions9, pharmacovigilance signal detection10, and automated medication chart review.11,12 This scoping review summarizes the breadth and depth of existing literature on how generative AI has been used to reduce ADEs and highlights areas for future investigation.

## Methods

### Search Strategy and Selection Criteria

This review was conducted according to PRISMA guidelines.16 We searched PubMed, Web of Science, Embase, and Scopus to identify studies published between 1 January 2012 and 18 February 2024 related to the application of generative AI in reducing medication-related harm. Details of the search terms are provided in [Supplement III].
Studies were included if they were published in English, described the development or application of generative AI in mitigating potential medication-related harms in the care delivery process (Figure 1), and were peer-reviewed original research, reviews and viewpoints, structured reviews of the literature reported in accordance with PRISMA guidelines, conference abstracts, or case reports. We excluded studies that used solely predictive modeling approaches or investigated ADEs related to dietary supplements or use of medication not prescribed for the individual.

![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/09/14/2024.09.13.24313606/F1.medium.gif)

[Figure 1.](http://medrxiv.org/content/early/2024/09/14/2024.09.13.24313606/F1) Archetypical care delivery process and potential points of error.

Based on these criteria, abstracts were screened for eligibility by two independent reviewers using a standardized tool. If no exclusion criteria were apparent in the abstract, it was included for manuscript review. Full-text manuscripts were reviewed by two independent reviewers. Studies that did not meet the selection criteria were excluded at this stage. In cases of discrepancy between reviewers, eligibility was determined by a third reviewer.

### Data Analysis

We used a standardized form to extract pertinent information, including study characteristics, model details, application setting, outcome measures, findings, and reported challenges and limitations [Supplement IV]. We did not perform a critical appraisal of study quality, as our primary objective was to characterise the scope of research in this field and to identify research trends and gaps. The wide range of study designs and outcomes also precluded the application of a uniform quality assessment criterion.

### Role of funding source

There was no funding source for this study.
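The dual-review screening rule described above can be sketched in a few lines. This is a minimal illustration only: the function name `screening_decision` and its parameters are hypothetical, and the sketch assumes each reviewer casts a simple include/exclude vote, which the Methods do not specify in detail.

```python
from typing import Optional


def screening_decision(reviewer_a: bool, reviewer_b: bool,
                       adjudicator: Optional[bool] = None) -> bool:
    """Return True if a record is retained for the next stage.

    reviewer_a, reviewer_b: independent votes (True = include).
    adjudicator: third reviewer's vote, consulted only on disagreement.
    """
    if reviewer_a == reviewer_b:
        # Concordant votes settle the decision without adjudication.
        return reviewer_a
    if adjudicator is None:
        # Hypothetical safeguard: a discrepancy cannot be resolved silently.
        raise ValueError("Reviewers disagree; a third reviewer's vote is required")
    return adjudicator


# Two reviewers disagree and the third reviewer excludes the record.
print(screening_decision(True, False, adjudicator=False))
```

Making the adjudicator argument optional keeps the common case (agreement) simple while forcing an explicit third vote whenever the two reviewers differ.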
## Results

The search yielded 2203 articles from all databases, with 1734 remaining after removing duplicates. After applying inclusion and exclusion criteria, 14 articles were eligible for this review.

![Figure 1a:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/09/14/2024.09.13.24313606/F2.medium.gif)

[Figure 1a:](http://medrxiv.org/content/early/2024/09/14/2024.09.13.24313606/F2) PRISMA flow diagram17

### Study Characteristics

Studies evaluated the performance of GenAI models for various applications, as summarized in Table 2. Four studies focused on the identification, classification, or prediction of drug-drug interactions (DDI).18–21 Three studies assessed the performance and utility of GenAI as decision support tools in benzodiazepine deprescribing22, dosing calculation for crushed tablets23, and provision of drug information24. The majority of studies focused on the application of GenAI in adverse event monitoring for specific drug classes and in enhancing pharmacovigilance processes. Study designs were predominantly observational and cross-sectional. None of the studies tested models prospectively in their respective settings of application. The proposed applications of GenAI were broadly distributed across clinical (community, inpatient care) and public health settings.

[Table 1.](http://medrxiv.org/content/early/2024/09/14/2024.09.13.24313606/T1)
Key concepts and definitions

[Table 2:](http://medrxiv.org/content/early/2024/09/14/2024.09.13.24313606/T2) Summary of studies included in review

[Table 3:](http://medrxiv.org/content/early/2024/09/14/2024.09.13.24313606/T3) Types of generative models and data characteristics

### Dataset Characteristics

Pharmacovigilance tasks often used public health databases, including the FDA Adverse Event Reporting System (FAERS) Public Dashboard25, the Health Canada ADR reporting dashboard26, and the China Food and Drug Administration27. Various datasets were used for named entity recognition tasks in ADE detection, such as user-generated content from web platforms (Medicitalia28, WebMD29) and open or restricted access datasets (CoNLL200330, BioCreative V CDR31, n2c232). Models built for DDI prediction used closed-source or in-house datasets such as DrugBank24,33. Testing datasets were often drawn from previously published studies, such as drug-drug interaction and deprescribing case scenarios22,34, with one study using an in-house retrospective cohort of medication prescriptions.19

### Model Types

Proprietary LLMs featured frequently in the reviewed studies, including various versions of ChatGPT and Google Bard. Studies used simple prompts or iterative prompting to generate responses from pretrained LLMs. None of these studies reported the use of additional techniques to enhance model performance, such as retrieval augmented generation or fine-tuning.

Custom-developed models in the reviewed studies adopted variants of generative adversarial networks (GAN) and variational autoencoders (VAE). The GTCACS35 approach used three steps to better identify discussion topics from social media texts, with a GAN performing dimensionality reduction, keyword clustering and summarization. DeepSAVE36, a deep learning framework, used an enriched VAE for dimensionality reduction through parsimonious modelling of events captured on social media platforms.
In one study, BART (Bidirectional and Auto-Regressive Transformers) fine-tuned with a small amount of ADR-specific named entities (few-shot learning) was adapted to allow automated identification of diverse ADEs using small volumes of annotated data.37 GAN was adapted in DGANDDI21 into a graph attention network that encodes drug attributes. DGANDDI was capable of binary and multi-class prediction tasks for drug-drug interactions using an enhanced and augmented multi-dimensional dataset generated by GAN. In a similar fashion, GAN was used to generate artificial features to augment data distribution in an imbalanced spontaneous reporting dataset.27

### Model Performance

For tasks that assist clinical decision making, reference standards included information from knowledge databases (e.g. Lexicomp®) and expert opinion (healthcare providers or pharmacologists). The most commonly reported metrics included accuracy, sensitivity, specificity, F1 score, precision, recall, AUC and AUPRC. Bespoke metrics included qualitative assessment of model responses by human experts, graded on Likert scales for quality, completeness or satisfaction. One study used ChatGPT-4 for qualitative evaluation of model performance, performed in parallel with human expert evaluation.38

#### Drug-Drug Interaction Classification and Prediction18–21

In prediction of potential drug interaction pairs, GAN-based models achieved high accuracy. GANs are generative models that learn from the distribution of data or images to create large, realistic synthetic data.39 DGANDDI outperformed baseline methods in both binary and multi-class DDI prediction tasks. In binary prediction, it achieved an accuracy of 96.10%, AUPR of 99.27%, and AUROC of 99.26%. In the multi-class prediction task, DGANDDI attained an accuracy of 95.89%, AUPR of 97.29%, and AUROC of 99.97%.

Proprietary LLMs were used to classify DDIs.
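Several of the metrics listed under Model Performance derive from the same 2×2 comparison of model output against a reference standard. The sketch below shows that derivation under stated assumptions: the function name `classification_metrics` and the counts are hypothetical and are not taken from any included study.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive common metrics from a 2x2 confusion matrix.

    tp/fn: reference-positive cases the model flagged / missed.
    fp/tn: reference-negative cases the model flagged / correctly cleared.
    """
    if min(tp + fn, tn + fp, tp + fp) == 0:
        raise ValueError("Each denominator needs at least one case")
    sensitivity = tp / (tp + fn)   # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)     # positive predictive value
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts: 50 interacting and 50 non-interacting drug pairs.
m = classification_metrics(tp=45, fp=20, fn=5, tn=30)
print({name: round(value, 3) for name, value in m.items()})
```

With these illustrative counts, sensitivity is 0.90 while specificity is only 0.60, so an accuracy of 0.75 on its own would hide the fact that the hypothetical model over-flags non-interacting pairs.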
One study compared the performance of different LLMs, including Microsoft Bing AI, ChatGPT-3.5, ChatGPT-4 and Google Bard.18 When Micromedex was used as the reference standard, the accuracy of LLMs ranged from 0.469 to 0.788, with Microsoft Bing AI demonstrating the best performance. Sensitivity was comparable across all LLMs, but specificity was significantly lower for ChatGPT-3.5 and ChatGPT-4. In another study, Google Bard, used to screen prescriptions for drug-drug interactions, demonstrated a low degree of agreement with predictions from Lexicomp.19 There was nil to slight agreement on interaction risk rating (κ = 0.01), severity rating (κ = 0.02), and reliability rating (κ = -0.02). Conversely, ChatGPT (version not reported) was found to be highly accurate, identifying drug-drug interactions in 39 out of 40 DDI pairs tested. When prompted to explain its answer, ChatGPT produced responses that were highly readable.

#### Decision Support22–24

In decision support applications, a study leveraging GPT-4 for benzodiazepine deprescribing reported a high degree of overall agreement between the LLM and human experts in identifying cases eligible for deprescribing.22 Agreement on four different deprescribing criteria varied, ranging from 74.7% to 91.3% (lack of indication: κ = .352, P < .001; prolonged use: κ = .088, P = .280; safety concerns: κ = .123, P = .006; incorrect dosage: κ = .264, P = .001). Qualitative analysis of GPT-4 responses found that up to 22% were ambiguous, generic and contained inconsistencies. Another study introduced a web-based calculator developed to guide dosing calculation, particularly in paediatric care where such errors are prevalent.23 The authors used ChatGPT (version not reported) and Visual Studio to write the underlying HTML code for dose division calculations and webpage interface creation.
The webpage’s reliability and feasibility were then assessed using retrospective data and validated questionnaires, scoring 88.38 on the System Usability Scale. Accuracy and reproducibility of the calculator were not evaluated. In responding to pharmacovigilance-related enquiries, ChatGPT-4 responses were compared against responses by pharmacovigilance specialists. The median score (IQR) of ChatGPT’s responses on a 10-point Likert scale was 4.8 (3–7.3), with a specific focus on drug causality scoring lower at 3.7 (3–6.3), and information on medication and proper use scoring slightly better at 5 (3.2–8.3). The authors concluded that the chatbot’s responses were generally not acceptable, especially in terms of precision and clinical relevance.

#### Pharmacovigilance27,35–37,40

For signal detection and ADR classification, studies used generative AI for training data augmentation and dimension reduction. These models (e.g. GTCACS) outperformed non-GenAI methods on internal validity measures. The DeepSAVE model, which uses a VAE approach, was tested on a dataset comprising 104 million user search queries and 800 events. DeepSAVE outperformed existing methods (e.g. disproportionality analysis, DA atop Event Mention Classifier) with the highest F-measure across all validation datasets. A GAN-based classification model developed to automatically evaluate risk categories of drugs during post-marketing surveillance demonstrated the highest accuracy, 97.9%, when compared against existing models. In one study, the authors demonstrated that few-shot learning with LightNER and BART significantly improved ADR recognition performance on low-resource datasets. For instance, the LightNER model, fine-tuned using the n2c2 dataset, achieved an F1-score of 61.42%, indicating the model’s effective transfer of task knowledge from rich-resource to low-resource settings. GPT-3 was used in another study to generate a comprehensive lexicon of drug abuse synonyms from social media sources.
Coupled with automated API queries and simple automated filters (e.g. Google filters), the proposed method yielded precision of 0.859 and 0.770, and recall of 0.431 and 0.395, for alprazolam and fentanyl respectively. In one study, an LLM was harnessed to improve the efficiency of the pharmacovigilance process: the authors used iterative prompting of GPT-4 to review and summarize food effects on drugs from drug review documents. Final draft summaries generated by GPT-4 were rated by FDA professionals, with 85% rated as factually consistent with reference summaries. This showcases GPT-4’s potential to aid in faster and more reliable drug assessment processes.

## Discussion

As healthcare systems increasingly prioritize patient safety, the integration of AI has the potential to enhance the detection and prevention of ADEs and, by extension, reduce the substantial economic cost. Our scoping review revealed three key applications of GenAI in the literature to date: identification and prediction of drug-drug interactions, provision of decision support in medication management, and automation of pharmacovigilance activities.

### Effectiveness of Generative AI in Enhancing Safety

Drug-drug interactions account for nearly 3% of all hospital admissions and up to 5% of all inpatient medication errors.41 Harmful DDIs are often only reported from post-marketing surveillance activities, rather than at the clinical trial stage.42 Our review included studies that predict potential DDIs pre-clinically. Models trained on GAN-augmented data outperform those trained on traditionally augmented data while using a fraction of the original training dataset.43,44 GAN can be a useful tool in enhancing prediction accuracy where data is limited. On the other hand, LLMs demonstrated variable performance in screening DDIs from prescriptions.
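Part of that variability is expressed through κ statistics, which correct raw agreement for the agreement expected by chance. This is why the deprescribing study could report criterion-level agreement of 74.7% to 91.3% alongside κ values as low as .088. The sketch below computes Cohen's kappa for two raters; the function name `cohens_kappa` and the 100 rated drug pairs are hypothetical and are not data from any included study.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        raise ValueError("Kappa is undefined when both raters use a single label")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings: the reference flags 90 of 100 pairs as interacting,
# and the tool flags 95, so both raters use one label almost exclusively.
reference = ["interaction"] * 90 + ["none"] * 10
tool = ["interaction"] * 86 + ["none"] * 4 + ["interaction"] * 9 + ["none"] * 1

raw_agreement = sum(r == t for r, t in zip(reference, tool)) / len(reference)
kappa = cohens_kappa(reference, tool)
print(f"raw agreement = {raw_agreement:.2f}, kappa = {kappa:.3f}")
```

In this toy example the raters agree on 87% of pairs, yet kappa is only about 0.07, because two raters who both label nearly every pair as an interaction would agree about 86% of the time by chance alone.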
Studies frequently used simple prompting strategies to elicit responses from LLMs, with no additional techniques used to provide contextual knowledge or reduce incorrect responses (or “hallucinations”). Methods such as retrieval augmented generation (RAG) or fine-tuning may allow LLMs to tailor responses to specified tasks through provision of contextual knowledge (e.g. a drug-drug interaction database).45,46 The advantages of such techniques have been shown in other clinical tasks, including differential diagnosis47, evidence-based decision support46,48 and patient chart review49. These techniques, however, may rely on well-curated, clinically adjudicated drug-drug interaction datasets that are not often freely available.

As decision support tools, studies adopting LLMs are mainly exploratory in nature. We found a wide range of tasks and purposes (e.g. prescription review, dosage calculation, and answering medication enquiries). These broad applications are enabled by the generalist properties of large language models.50 LLMs demonstrate the capacity to perform tasks with little to no task-specific training, also known as “zero-shot” or “few-shot” learning. In the context of reducing medication harm, LLMs may simulate clinical reasoning and inferential skills across diverse medical disciplines, drug classes and user settings without the need for explicit training. For instance, an LLM trained to screen prescriptions for inappropriate benzodiazepine use may be adapted easily to screen for inappropriate drug use in elderly patients. In addition, LLMs are well poised as medical chatbots, given text generation capabilities that demonstrate a high degree of fluency, empathy, and personalization, even outperforming clinicians.51 These explorations, however, highlight existing challenges to clinical adoption of LLMs. Studies to date remain in the research phase, with no exploration of auto-piloting or co-piloting as modes of clinical integration.
Limitations in the accuracy, reliability and consistency of responses from general-purpose LLMs such as ChatGPT preclude their autonomous use in clinical settings.

A promising area of LLM application in enhancing efficiency and impact is ADE monitoring and pharmacovigilance, where GenAI tools may enhance the timeliness and accuracy of ADR detection for specific medication classes. Our review has shown that GenAI enhanced the accuracy of signal detection and of disease and drug entity recognition over conventional natural language processing tools. LLMs are able to handle a wide breadth of data sources (i.e. electronic health records, online databases, and social media platforms), facilitating the detection of rare events and offering a generalist capability that is essential for continuous learning and adaptation.52 Automation of specific pharmacovigilance tasks that are traditionally resource-intensive is a potential avenue for productivity gain with the use of GenAI models.53

### Clinical Implications

In our review, we are unable to provide conclusive evidence that GenAI will reduce medication-related harms when applied in clinical settings. Along the continuum of the medication use process, GenAI models were adopted only for highly selected domains and tasks, while a comprehensive evaluation of medication-related tasks across multiple domains and different GenAI models has yet to be conducted. In other words, generative AI models in current research are still applied as narrow AI focused on specific tasks and have yet to achieve their “generative” potential for medication safety. Non-generative AI models were evaluated in another review, where 78 articles described the application of AI in reducing ADEs.54 A variety of AI techniques were described, including neural networks and tree-based algorithms, for predicting potential ADEs and enhancing early detection.
Utilizing diverse data sources like genetic information and electronic health records, these AI models aimed to inform clinical decisions on safe prescribing and medication management.

Instead of applying generative AI models in tasks that mandate deterministic outputs, we propose that LLMs can be adopted in ways that reduce cognitive workload for healthcare professionals. Healthcare professionals work with high volumes of multi-modal patient data and are required to pay attention to details, synthesize information and make clinical decisions in real time. High cognitive load poses a risk for burnout and medical errors.55 For example, LLMs can be used to analyse and reduce alert burden in electronic medical records, for medication incident analysis, and for summarization in a fashion similar to discharge note generation.12,56 In a study published after we completed our literature search, LLMs were used in a co-pilot system to extract key named entities from prescriptions submitted online and assemble them into coherent instructions.57 This system was shown to reduce near-miss events and improve the efficiency of pharmacy operations in a large-scale online pharmacy. Finally, LLMs can be leveraged as a tool in patient education and engagement, thereby enhancing patient access to critical medication-related information.58

A critical evaluation of studies included in our review revealed a lack of adherence to reporting guidelines for AI studies. We did not perform a quality review of the studies in view of the scoping nature of this review and the diverse hypotheses of the studies included. Checklists and reporting guidelines such as MI-CLAIM for transparent model reporting59, the TRIPOD+AI checklist for comprehensive reporting of predictive models60 and the DECIDE-AI checklist for early-stage clinical evaluation of AI-based decision support tools61 should be adopted in future studies.
However, there is still a lack of validated reporting tools for LLM-based AI models, though initial efforts have been made to create LLM-specific frameworks.62 Evaluation or discussion of model fairness, bias and other ethical considerations such as data privacy was also found to be lacking in the included studies.

### Limitations

Our study has several limitations. The heterogeneity across studies regarding application, GenAI tools used and setting prevented a formal assessment of predictive validity for different AI models. The diversity of training and testing datasets used precludes generalizability of findings across different demographic groups. Patient outcomes were not reported in all studies, limiting any conclusions about the role of GenAI and its impact on patient outcomes. We limited our review to peer-reviewed articles only. We acknowledge that the field of generative AI and LLMs is rapidly evolving and that a large number of studies may still be in the preprint stage or archived.

### Blueprint for Future Studies

Future research should focus on developing and benchmarking generative AI models against established healthcare standards, to further validate their performance and cost-effectiveness and to ensure their safe integration into clinical practice. There is a need to develop expert-curated, high-quality training datasets with diverse representation from different geographical, ethnic and social groups. Such datasets, when shared, can facilitate and accelerate training of GenAI models adapted for different applications in the medication use process. For example, an expert-annotated dataset of incident reports can be used to fine-tune an LLM-based model to predict risk for medication incidents.63 Other areas of high interest include the use of LLMs for real-time monitoring of drug safety and the exploration of GANs for the synthetic generation of training data, which can help overcome the limitations posed by rare ADE occurrences.
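To make the idea of synthetic augmentation for rare events concrete, the sketch below generates new minority-class rows by interpolating between existing ones. It is deliberately not a GAN: it is a simplified SMOTE-style stand-in that pairs rows at random rather than by nearest neighbour, and the function name `interpolate_minority`, the feature values and the seed are all hypothetical.

```python
import random


def interpolate_minority(samples: list, n_new: int, seed: int = 0) -> list:
    """Create n_new synthetic rows between random pairs of minority-class rows.

    Simplification: true SMOTE interpolates towards one of the k nearest
    neighbours; here any two distinct rows may be paired.
    """
    if len(samples) < 2:
        raise ValueError("At least two minority-class rows are required")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        row_a, row_b = rng.sample(samples, 2)
        t = rng.random()  # position along the segment from row_a to row_b
        synthetic.append([a + t * (b - a) for a, b in zip(row_a, row_b)])
    return synthetic


# Hypothetical feature rows for a rare adverse event (two features each).
rare_event_rows = [[0.2, 1.0], [0.4, 1.5], [0.3, 0.8]]
new_rows = interpolate_minority(rare_event_rows, n_new=5)
print(len(new_rows), "synthetic rows generated")
```

Interpolation can only produce rows lying between observed cases, whereas a GAN learns the data distribution itself, which is the capability the reviewed studies set out to exploit.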
## Conclusion

GenAI and LLMs demonstrate potential in enhancing medication safety and reducing medication-related harm. Published studies reveal potential areas for successful future implementation. However, the areas addressed so far cover only some of the key medication safety issues today. Moreover, research rigor and comprehensive ethical evaluation are lacking in the studies to date. Future studies should address the lack of high-quality datasets specific to medication safety tasks. Continuous updating of this review is warranted given the burgeoning nature of this field.

## Contributors

NL, JCLO and CM conceptualized and designed the review. JCLO, CM, NN, KB, NYTT, LJ and QX performed article screening and data extraction. RR, DWB, DSWT contributed to the first draft of the report with input from NL. All authors had full access to all the data in the study and had final responsibility for the decision to submit for publication.

## Declaration of Interest

We declare no competing interests.

## Data Availability

All data produced in the present study are available upon reasonable request to the authors.

[Supplement I](http://medrxiv.org/content/early/2024/09/14/2024.09.13.24313606/T4) Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) Checklist

[Supplement II](http://medrxiv.org/content/early/2024/09/14/2024.09.13.24313606/T5) Population, Concept and Contexts (PCC) for the scoping review

[Supplement III](http://medrxiv.org/content/early/2024/09/14/2024.09.13.24313606/T6) Search Strategy

[Supplement IV](http://medrxiv.org/content/early/2024/09/14/2024.09.13.24313606/T7) Data abstraction variables

* Received September 13, 2024.
* Revision received September 13, 2024.
* Accepted September 14, 2024.
* © 2024, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1. WHO Integrated Health Services MWH. Global burden of preventable medication-related harm in health care: a systematic review. Geneva: World Health Organization. 2023. Licence: CC BY-NC-SA 3.0 IGO.
2. M V, GA M, JV T, V F, GA R. The Global Burden of Cardiovascular Diseases and Risk: A Compass for Future Health. Journal of the American College of Cardiology. 2022;80(25). doi:10.1016/j.jacc.2022.11.005
3. H LL, PJ P. Twenty-First Century Global ADR Management: A Need for Clarification, Redesign, and Coordinated Action. Therapeutic innovation & regulatory science. 2023;57(1). doi:10.1007/s43441-022-00443-8
4. DW B, DJ C, N L, et al. Incidence of adverse drug events and potential adverse drug events. Implications for prevention. ADE Prevention Study Group. JAMA. 1995;274(1).
5. DW B, D L, A S, et al. The potential of artificial intelligence to improve patient safety: a scoping review. NPJ digital medicine. 2021;4(1). doi:10.1038/s41746-021-00423-6
6. C G, M B, B M, P R, F M, P T. Accessing Artificial Intelligence for Clinical Decision-Making. Frontiers in digital health. 2021;3. doi:10.3389/fdgth.2021.645232
7. A C, O A.
Role of Artificial Intelligence in Patient Safety Outcomes: Systematic Literature Review. JMIR medical informatics. 2020;8(7). doi:10.2196/18599
8. JC S, Q C, JC D, DM R, KB J, RA M. Evaluation of a Novel System to Enhance Clinicians’ Recognition of Preadmission Adverse Drug Reactions. Applied clinical informatics. 2018;9(2). doi:10.1055/s-0038-1646963
9. X H, D E, X L, Y Y, J Q, Z L. Evaluating the performance of ChatGPT in clinical pharmacy: A comparative study of ChatGPT and clinical pharmacists. British journal of clinical pharmacology. 2024;90(1). doi:10.1111/bcp.15896
10. X W, X X, Z L, W T. Bidirectional Encoder Representations from Transformers-like large language models in patient safety and pharmacovigilance: A comprehensive assessment of causal inference implications. Experimental biology and medicine (Maywood, NJ). 2023;248(21). doi:10.1177/15353702231215895
11. D R, P P, R K, H K, C V, Y W. Effectiveness of ChatGPT in clinical pharmacy and the role of artificial intelligence in medication therapy management. Journal of the American Pharmacists Association : JAPhA. 2024;64(2). doi:10.1016/j.japh.2023.11.023
12. Ong JCL, Jin L, Elangovan K, et al. Development and Testing of a Novel Large Language Model-Based Clinical Decision Support Systems for Medication Safety in 12 Clinical Specialties. 2024.
13. Medication Without Harm - Global Patient Safety Challenge on Medication Safety.
Geneva: World Health Organization LCB-N-SI. [https://iris.who.int/bitstream/handle/10665/255263/WHO-HIS-SDS-2017.6-eng.pdf?sequence=1](https://iris.who.int/bitstream/handle/10665/255263/WHO-HIS-SDS-2017.6-eng.pdf?sequence=1)
14. DW B, DL B, MB VV, J S, L L. Relationship between medication errors and adverse drug events. Journal of general internal medicine. 1995;10(4). doi:10.1007/BF02600255
15. L H, AC vG. Pharmacovigilance: methods, recent developments and future perspectives. European journal of clinical pharmacology. 2008;64(8). doi:10.1007/s00228-008-0475-9
16. MJ P, JE M, PM B, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ (Clinical research ed). 2021;372. doi:10.1136/bmj.n71
17. Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews.
2021-03-29 2021; doi:10.1136/bmj.n71
18. FY A-A, M Z, L G, R A-F, AN B. Evaluating the Sensitivity, Specificity, and Accuracy of ChatGPT-3.5, ChatGPT-4, Bing AI, and Bard Against Conventional Drug-Drug Interactions Clinical Tools. Drug, healthcare and patient safety. 09/20/2023 2023;15 doi:10.2147/DHPS.S425858
19. DM S, SS S, HB A, AM S, MA M. Screening the Drug-Drug Interactions Between Antimicrobials and Other Prescribed Medications Using Google Bard and Lexicomp® Online™ Database. Cureus. 09/09/2023 2023;15(9) doi:10.7759/cureus.44961
20. A J, N P, S S, S M, JK B, H M. The Capability of ChatGPT in Predicting and Explaining Common Drug-Drug Interactions. Cureus. 03/17/2023 2023;15(3) doi:10.7759/cureus.36272
21. H Y, K L, J S. DGANDDI: Double Generative Adversarial Networks for Drug-Drug Interaction Prediction. IEEE/ACM transactions on computational biology and bioinformatics. 2023 May-Jun 2023;20(3) doi:10.1109/TCBB.2022.3219883
22. I B, D B, M D, et al. Clinical decision-making in benzodiazepine deprescribing by healthcare providers vs. AI-assisted approach. British journal of clinical pharmacology.
2024 Mar 2024;90(3) doi:10.1111/bcp.15963
23. J S, M R, TM PK, V P, SH C. Dose 4 You: Dose Division Calculator-A Tool to Reduce Calculation Errors. Hospital pharmacy. 2024 Apr 2024;59(2) doi:10.1177/00185787231207757
24. F M, W S, C dC, et al. Will artificial intelligence chatbots replace clinical pharmacologists? An exploratory study in clinical practice. European journal of clinical pharmacology. 2023 Oct 2023;79(10) doi:10.1007/s00228-023-03547-8
25. @US_FDA. FDA Adverse Event Reporting System (FAERS) Public Dashboard | FDA. 2024; Canada H. Adverse Reaction Database - Canada.ca. 2009-09-24 2009;
26. J W, G F, Z L, P H, Y Z, W H. Evaluating Drug Risk Using GAN and SMOTE Based on CFDA’s Spontaneous Reporting Data. Journal of healthcare engineering. 08/27/2021 2021;2021 doi:10.1155/2021/6033860
27. Consulti medici e specialisti online: più informati, più sani! @medicitalia. [https://www.medicitalia.it](https://www.medicitalia.it)
28. WebMD. Drugs & Medications; Topramax. [https://reviewswebmdcom/drugs/drugreview-14494-topamax-oral](https://reviewswebmdcom/drugs/drugreview-14494-topamax-oral). 2024;
29. Erik F. Tjong Kim Sang FDM. Introduction to the CoNLL-2003 shared task | Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4. Proceedings of the seventh conference on Natural language learning at HLT-NAACL. 2003;4 May:142–147. doi:10.3115/1119176.1119195
30. J L, Y S, RJ J, et al.
BioCreative V CDR task corpus: a resource for chemical disease relation extraction. Database : the journal of biological databases and curation. 05/09/2016 2016;2016 doi:10.1093/database/baw068
31. S H, Y W, F S, O U. The 2019 National Natural language processing (NLP) Clinical Challenges (n2c2)/Open Health NLP (OHNLP) shared task on clinical concept normalization for clinical records. Journal of the American Medical Informatics Association : JAMIA. 10/01/2020 2020;27(10) doi:10.1093/jamia/ocaa106
32. Online D. DrugBank Online | Database for Drug and Drug Target Info. 2024;
33. AS B, MA S, A A, EA G, J B, D F. Prevalence and Determinants of Multimorbidity, Polypharmacy, and Potentially Inappropriate Medication Use in the Older Outpatients: Findings from EuroAgeism H2020 ESR7 Project in Ethiopia. Pharmaceuticals (Basel, Switzerland). 08/25/2021 2021;14(9) doi:10.3390/ph14090844
34. Grani Giorgio, Lenzi Andrea, Velardi Paola. Supporting Personalized Health Care With Social Media Analytics: An Application to Hypothyroidism. research-article. 2021-10-15 2021;doi:4
35. F. Ahmad AA, B. Kitchens, D. Adjeroh and D. Zeng. Deep Learning for Adverse Event Detection From Web Search. IEEE Transactions on Knowledge and Data Engineering. 2022;34(6):2681–2695.
36. Chang C-H, National Sun Yat-sen University DoIM, Kaohsiung, Taiwan, Chang F-Y, Hwang S-Y, Yang CC, Drexel University CoCaI, Philadelphia, USA. Prompting for Few-shot Adverse Drug Reaction Recognition from Online Reviews. IEEE Computer Society; 2023:168-175.
37. Y S, P R, J W, et al. Leveraging GPT-4 for food effect summarization to enhance product-specific guidance development via iterative prompting.
Journal of biomedical informatics. 2023 Dec 2023;148 doi:10.1016/j.jbi.2023.104533
38. A A, A A. Generative adversarial networks and synthetic patient data: current challenges and future perspectives. Future healthcare journal. 2022 Jul 2022;9(2) doi:10.7861/fhj.2022-0013
39. KA C, RB A. Using GPT-3 to Build a Lexicon of Drugs of Abuse Synonyms for Social Media Pharmacovigilance. Biomolecules. 02/18/2023 2023;13(2) doi:10.3390/biom13020387
40. LL L, DW B, DJ C, et al. Systems analysis of adverse drug events. ADE Prevention Study Group. JAMA. 07/05/1995 1995;274(1)
41. Lu Y, Shen D, Pietsch M, et al. A novel algorithm for analyzing drug-drug interactions from MEDLINE literature. OriginalPaper. Scientific Reports. 2015-11-27 2015;5(1):1-10. doi:10.1038/srep17357
42. AA V, FF G, YJ K, et al. Deploying deep learning models on unseen medical imaging using adversarial domain adaptation. PloS one. 10/14/2022 2022;17(10) doi:10.1371/journal.pone.0273262
43. AJ SK, RS C, JG C, et al. Evaluation of Generative Adversarial Networks for High-Resolution Synthetic Image Generation of Circumpapillary Optical Coherence Tomography Images for Glaucoma. JAMA ophthalmology.
10/01/2022 2022;140(10) doi:10.1001/jamaophthalmol.2022.3375
44. C Z, R S, A C, et al. Almanac - Retrieval-Augmented Language Models for Clinical Medicine. NEJM AI. 2024 Feb 2024;1(2) doi:10.1056/aioa2300068
45. Ke Y, Jin L, Elangovan K, et al. Development and Testing of Retrieval Augmented Generation in Large Language Models -- A Case Study Report. 2024/01/29 2024;
46. S R, A R, J N, et al. A retrieval-augmented chatbot based on GPT-4 provides appropriate differential diagnosis in gastrointestinal radiology: a proof of concept study. European radiology experimental. 05/17/2024 2024;8(1) doi:10.1186/s41747-024-00457-x
47. S K, M G, M A, A A, LS C, DL S. Optimization of hepatological clinical guidelines interpretation by large language models: a retrieval augmented generation-based framework. NPJ digital medicine. 04/23/2024 2024;7(1) doi:10.1038/s41746-024-01091-y
48. Vaid Aea. Using fine-tuned large language models to parse clinical notes in musculoskeletal pain disorders. The Lancet Digital Health. 2024;5(12):e855–e858.
49. Tu T, Azizi S, Driess D, et al. Towards Generalist Biomedical AI. research-article. 2024-02-22 2024; doi:10.1056/AIoa2300138
50. IA B, YV Z, D G, et al. Comparison of Ophthalmologist and Large Language Model Chatbot Responses to Online Patient Eye Care Questions. JAMA network open.
08/01/2023 2023;6(8) doi:10.1001/jamanetworkopen.2023.30320
51. M M, O B, ZSH A, et al. Foundation models for generalist medical artificial intelligence. Nature. 2023 Apr 2023;616(7956) doi:10.1038/s41586-023-05881-4
52. M S, J P, P Y, et al. The Use of Artificial Intelligence in Pharmacovigilance: A Systematic Review of the Literature. Pharmaceutical medicine. 2022 Oct 2022;36(5) doi:10.1007/s40290-022-00441-z
53. A S, W S, MG A, et al. Key use cases for artificial intelligence to reduce the frequency of adverse drug events: a scoping review. The Lancet Digital health. 2022 Feb 2022;4(2) doi:10.1016/S2589-7500(21)00229-6
54. DE E, SN G, S N, et al. Evaluating and reducing cognitive load should be a priority for machine learning in healthcare. Nature medicine. 2022 Jul 2022;28(7) doi:10.1038/s41591-022-01833-z
55. D VV, C VU, L B, et al. Adapted large language models can outperform medical experts in clinical text summarization. Nature medicine. 2024 Apr 2024;30(4) doi:10.1038/s41591-024-02855-5
56. C P, J L, R V, V G, E W, M B. Large language models for preventing medication direction errors in online pharmacies. Nature medicine.
04/25/2024 2024; doi:10.1038/s41591-024-02933-8
57. Yang R, Tan TF, Lu W, Thirunavukarasu AJ, Ting DSW, Liu N. Large language models in health care: development, applications, and challenges. Health Care Sci 2023;2:255–263.
58. B N, G Q, BK B-J, et al. Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist. Nature medicine. 2020 Sep 2020;26(9) doi:10.1038/s41591-020-1041-y
59. GS C, KGM M, P D, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ (Clinical research ed). 04/16/2024 2024;385 doi:10.1136/bmj-2023-078378
60. B V, M N, B C, et al. Reporting guideline for the early-stage clinical evaluation of decision support systems driven by artificial intelligence: DECIDE-AI. Nature medicine. 2022 May 2022;28(5) doi:10.1038/s41591-022-01772-9
61. Ning Y, Teixayavong S, Shang Y, et al. Generative Artificial Intelligence in Healthcare: Ethical Considerations and Assessment Checklist. 2023/11/02 2023;
62. ZSY W, N W, J L, S U. A large dataset of annotated incident reports on medication errors. Scientific data.
02/29/2024 2024;11(1) doi:10.1038/s41597-024-03036-2
63. Rights of the Individual. 2023;
64. Organization TWH. Regulation and Prequalification: What is Pharmacovigilance? Accessed 20 Feb 2024,