Development of a Liver Disease-Specific Large Language Model Chat Interface using Retrieval Augmented Generation

Jin Ge; Steve Sun; Joseph Owens; Victor Galvez; Oksana Gologorskaya; Jennifer C. Lai; Mark J. Pletcher; Ki Lai

doi:10.1101/2023.11.10.23298364

Abstract

Background Large language models (LLMs) have significant capabilities in clinical information processing tasks. Commercially available LLMs, however, are not optimized for clinical uses and are prone to generating incorrect or hallucinatory information. Retrieval-augmented generation (RAG) is an enterprise architecture that allows embedding of customized data into LLMs. This approach “specializes” the LLMs and is thought to reduce hallucinations.

Methods We developed “LiVersa,” a liver disease-specific LLM, by using our institution’s protected health information (PHI)-complaint text embedding and LLM platform, “Versa.” We conducted RAG on 30 publicly available American Association for the Study of Liver Diseases (AASLD) guidelines and guidance documents to be incorporated into LiVersa. We evaluated LiVersa’s performance by comparing its responses versus those of trainees from a previously published knowledge assessment study regarding hepatitis B (HBV) treatment and hepatocellular carcinoma (HCC) surveillance.

Results LiVersa answered all 10 questions correctly when forced to provide a “yes” or “no” answer. Full detailed responses with justifications and rationales, however, were not completely correct for three of the questions.

Discussions In this study, we demonstrated the ability to build disease-specific and PHI-compliant LLMs using RAG. While our LLM, LiVersa, demonstrated more specificity in answering questions related to clinical hepatology – there were some knowledge deficiencies due to limitations set by the number and types of documents used for RAG. The LiVersa prototype, however, is a proof of concept for utilizing RAG to customize LLMs for clinical uses and a potential strategy to realize personalized medicine in the future.

Introduction

Large language models (LLMs), such as OpenAI’s Generative Pre-trained Transformer (GPT) family of models, have demonstrated significant capabilities in clinical information processing tasks, such as data extraction,¹ summarizing literature,² content generation,³ and predictive modeling.⁴ Commercially available general-purpose LLMs, such as OpenAI’s ChatGPT, however, are trained on publicly available data and not optimized for clinical uses.⁵ This means that the outputs of publicly available LLMs when prompted with clinical questions may include incorrect, incomplete, or hallucinatory information.^6,7 Despite these limitations, LLMs are thought to have significant potential in biomedical and clinical applications. This is because the practice of modern medicine is highly complex endeavor with ever-increasing amounts of knowledge generated yearly.⁸ For instance, it was estimated that two papers were uploaded to PubMed every minute in 2016,⁹ a figure that has undoubtedly increased in the seven years since. National practice societies have also responded to the ever-expanding body of clinical knowledge by developing comprehensive practice guidelines. For instance, the American Association of the Study of Liver Diseases (AASLD) has issued guidelines and guidance documents for 26 liver diseases and conditions, and two quality measures concerning cirrhosis and hepatocellular carcinoma (HCC).¹⁰ Methods to incorporate medical literature, such as clinical practice guidelines and guidance documents, into LLM responses, therefore, represent growing area of interest to help general-purpose LLMs become “specialized” for clinically-focused applications.

Currently, there are three general approaches and techniques to allow LLM “specialization:” 1. Fine-tune the original LLM model, which is computationally expensive;^11,12 2. Prompting within the LLM, which only accommodates small amounts of data and requires iterative user input;^13–15 and 3. Retrieval-augmented generation (RAG).^16–18 RAG is an enterprise architecture that augments LLM abilities by adding an information retrieval system that provides external data, which then supplement and constrain LLM output. In practice, this means that a dataset, such as a compendium of clinical practice guidelines, is vectorized and encoded using embedding models and then incorporated into the LLM by layering it on top of the LLM information retrieval and output processes.¹⁶ The theoretical advantages of RAG are two-fold: 1. Handling large numbers of documents to provide “ground knowledge” for the LLM, and 2. Decreasing hallucinations by limiting the potential “solution-space” for LLM outputs. This RAG approach has been proposed and utilized in other clinical specialties, such as general medicine and urology.^17,19,20 Existing RAG implementations, however, have largely been on publicly available and not protected health information [PHI]-compliant LLMs – thereby limiting permitted use in clinical practice

In this study, we utilized RAG to create a prototype liver disease-specific LLM, called “LiVersa,” within the University of California, San Francisco’s (UCSF) PHI-compliant implementation of Microsoft OpenAI GPT family of LLM models, “Versa.”

Methods

This study was not considered human subjects research because no human data or specimens were used. Consequently, approval from the UCSF Institutional Review Board was not sought. To construct our liver disease-specific LLM (LiVersa), we used all available provider-facing clinical practice guidelines, guidance documents, and quality measure documents published by AASLD on its website.¹⁰ We excluded introduction and executive summary documents as the content in these were often duplicated in the corresponding full-length versions (e.g., for Wilson’s Disease). We included both original guidelines/guidance documents and updates if both were publicly available for download on AASLD website (e.g., for Chronic Hepatitis B [HBV]). Patient-facing and/or outdated guidelines/guidance documents were not included in this study. In total, 30 documents were retrieved to be incorporated into the RAG process (Table 1).

View this table:

Table 1 AASLD Guidelines, Guidance Documents, and Quality Measure Documents Incorporated in LiVersa

We utilized application programming interfaces (APIs) provided by the Microsoft Azure OpenAI Cognitive Search suite of tools to incorporate the AASLD guidelines and documents for RAG (Figure 1).¹⁶ In the pre-processing phase, the 30 AASLD guidelines and documents in PDF format were transformed into embeddings using Microsoft Azure OpenAI’s ADA Text Embedding Version 2 model (text-embedding-ada-002). Text embeddings are numerical representations of text where words for phrases are represented as multi-dimensional vectors.^21,22 These embedding vectors are then stored in a database with the Microsoft Azure Cognitive Search services. During chat interactions with LiVersa, users’ prompts are converted into embeddings in real time using the same text-embedding-ada-002 model. A search is then performed on the previously processed vector database of AASLD guidelines and documents to find matches for the prompt embeddings. The search results from this search process are then passed to the gpt-35-turbo or gpt-4-32k LLMs to generate a completion, which is the output from the LLM in the LiVersa chat interface.

Figure 1 Diagram for Retrieval Augmented Generated as Executed Through Azure Cognitive Search

We pre-set three sample questions into “Sample Questions” section of the LiVersa interface:

What are the indications for liver transplantation?
Who should be screened for chronic hepatitis B?
What is the recommended therapy for a patient with BCLC 0-A HCC without portal hypertension?

To compare LiVersa’s performance on free-form clinical hepatology questions, we compared its responses to medical trainees’ responses in a previously published case-vignette based knowledge assessment questions on HBV treatment and hepatocellular carcinoma (HCC) surveillance.²³ We chose this set of questions because they have also been evaluated by the publicly available ChatGPT.²⁴ Prior to input into LiVersa, case-vignettes were edited for clarity and appropriateness with the goal of retaining the essence of the clinical scenario. We asked LiVersa to give both a full response and a forced “yes” or “no” response by appending the prompt with “Answer yes or no” to each inputted case-vignette (Figure 2).

Figure 2 LiVersa Chat Interface

Results

The user interface for LiVersa is shown in Figure 2. The interface is the same as to the one used for “Versa Chat” (the chat interface for the general-purpose PHI-compliant LLM implementation at UCSF), except with the data source being set to the “UCSF Intranet: AASLD Clinical Practice Guidelines (Prototype).” LiVersa’s responses to the three sample questions are featured in Table 2. LiVersa’s performance on case-vignette based knowledge assessment questions on HBV treatment and HCC surveillance are featured in Table 3, Columns B and C. The correct answers along with percentages of trainees who answered correctly per prior studies are featured in Table 3, Column D. LiVersa answered all 10 questions correctly when forced to provide a “yes” or “no” answer. Rationales and justifications within the full answers, however, were not completely correct for three clinical scenarios in HCC surveillance. These three clinical scenarios were: “A 25-year-old Haitian man with chronic hepatitis B, on treatment with entecavir,” “A 40-year-old Cuban woman who was recently diagnosed with hepatitis B after developing jaundice in the setting of a surgical procedure. There is no evidence of cirrhosis,” and “A 40-year-old woman from Thailand with cirrhosis and chronic inactive hepatitis B.”

View this table:

Table 2 LiVersa Responses to “Sample Questions” Embedded in the Chat Interface

View this table:

Table 3 LiVersa Responses to HBV Treatment and HCC Surveillance Questions

Discussion

Two of the most significant barriers to using general-purpose LLMs like ChatGPT in clinical practice are the tendency for these LLMs to “hallucinate” or generate confident sounding but false responses, and their lack of compatibility with PHI.^6,7 The first issue stems from the nature of the transformer architecture,²⁵ which optimizes the LLM’s objective of predicting the most probable next word (token) from its pre-trained data without consideration for accuracy. Augmenting the knowledge base of general-purpose LLMs, such as through RAG, becomes one of the key strategies to shaping and constraining LLM outputs to prevent false information from being propagated and disseminated. Our work is one of the first demonstrations and proof-of-concept of using RAG to create a liver disease-specific and, more importantly, PHI-compliant LLM chat interface. Our incorporation of hepatology specific knowledge, such as the AASLD practice guidelines, resulted in the LiVersa chat interface generating answers that were likely more specific than those generated by general-purpose ChatGPT.²⁴

While LiVersa answered all 10 questions regarding HBV treatment and HCC surveillance in our test set correctly, the rationales given for three cases were not completely correct. These incorrect responses reflect limitations of utilizing RAG. For instance, LiVersa’s stated rationale for a “Yes” recommendation for the case-vignette of “A 25-year-old Haitian man with chronic hepatitis B, on treatment with entecavir” was based on the presumption that the patient was from an HBV endemic country (Table 3). The actual rationale given in prior literature is based on the 2018 version of the AASLD HCC practice guidance document, which recommended early surveillance for African HBV patients and/or North American HBV patients of African-American descent.

In this example, we highlight two key issues. First, the 2018 version of the AASLD HCC practice guidance document was not uploaded into LiVersa’s RAG dataset, the 2023 version was.^26,27 LiVersa gave the wrong justification for HCC surveillance in this case simply because the 2018 version was not available to it and “restrictions” placed by the RAG dataset forced it to generate a hallucinatory answer. This limitation could easily be overcome by increasing the number and variety of literature incorporated into LiVersa’s RAG dataset. Additional documents that could be considered for inclusion in the future include clinical practice guidelines/guidance documents from other societies, such as those from the American Gastroenterological Association, American College of Gastroenterology, European Association for the Study of the Liver, and Asian Pacific Association for the Study of the Liver; and from compendium sources, such as the Cochrane Review and UpToDate.

The second key issue is contextual bias. In this instance, the recommendation for HCC surveillance is based on the assumption that the man from Haiti in the clinical scenario is of African descent. This is a problem that affects both literature data incorporated into the RAG dataset and in the pre-training data that underpins the LLM itself. The more “correct” response should have been to ask for additional context to clarify the self-identified national-origin or racial/ethnic background of the patient to allow the LLM to give a more comprehensive recommendation/output. This is an illustration of the problems concerning algorithm bias that generative artificial intelligence (GAI) research community is actively grappling with.^28–30 Finally, in addition to the two key considerations noted above, there is a minor limitation with regards to our grading of LiVersa’s responses. We had edited the case-vignette knowledge assessment questions for clarity and appropriateness. While we tried to retain the intent of the clinical cases, the wording of these cases may not be the same as those evaluated by trainees and ChatGPT in prior studies.^23,24 This may have affected the comparisons in Table 3.

Despite these limitations, we demonstrated that RAG could be a powerful method to create a “specialized” LLM. Given that our LiVersa prototype was developed and deployed in a PHI-compliant environment (UCSF’s Versa), it could theoretically be used clinically to evaluate actual patient scenarios. By extension, there is a potential to incorporate both literature and patient data from the electronic health record through extraction by the Fast Healthcare Interoperability Resources (FHIR) API in the RAG process.³¹ This process would allow for the creation of patient-specific clinical LLMs that could be a true realization of GAI-enabled personalized medicine.

Data Availability

The clinical guidelines/guidance documents used in the methods of this manuscript are publicly available.

Financial/Grant Support

The authors of this study were supported in part by the KL2TR001870 (National Center for Advancing Translational Sciences, Ge), P30DK026743 (UCSF Liver Center Grant, Ge and Lai), UL1TR001872 (National Center for Advancing Translational Sciences, Pletcher), and R01AG059183/K24AG080021 (National Institute on Aging, Lai). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or any other funding agencies. The funding agencies played no role in the analysis of the data or the preparation of this manuscript.

Disclosures

The authors of this manuscript have the following potential conflicts of interest to disclose:

- Dr. Jin Ge receives research support from Merck and Co; and consults for Astellas Pharmaceuticals/Iota Biosciences.
- Dr. Jennifer C. Lai receives research support from Lipocene and Vir Biotechnologies; receives an education grant from Nestle Nutrition Sciences; serves on an advisory board for Novo Nordisk; and consults for Genfit, Third Rock Ventures, and Boehringer Ingelheim.

Writing Assistance

None.

Author Contributions

Authorship was determined using ICMJE recommendations.

Ge: Concept and design; data extraction; analysis and interpretation of data; drafting of manuscript; critical revision of the manuscript for important intellectual content; study supervision

Sun: Data extraction; analysis and interpretation of data; critical revision of the manuscript for important intellectual content

Owens: Concept and design; analysis and interpretation of data; critical revision of the manuscript for important intellectual content

Galvez: Analysis and interpretation of data; critical revision of the manuscript for important intellectual content

Gologorskaya: Analysis and interpretation of data; critical revision of the manuscript for important intellectual content

J. Lai: Concept and design; critical revision of the manuscript for important intellectual content; study supervision

Pletcher: Concept and design; critical revision of the manuscript for important intellectual content; study supervision

K. Lai: Concept and design; critical revision of the manuscript for important intellectual content; study supervision

Acknowledgements

- The authors thank the UCSF AI Tiger Team, Academic Research Services, Research Information Technology, and the Chancellor’s Task Force for Generative AI for their software development, analytical, and technical support related to the use of Versa API gateway (the UCSF secure implementation of large language models and generative AI via API gateway), Versa chat (the chat user interface), and related data assets.

Abbreviations

AASLD: American Association for the Study of Liver Diseases
APIs: application programming interfaces
FHIR: Fast Healthcare Interoperability Resources
HBV: hepatitis B
HCC: hepatocellular carcinoma
GAI: generative artificial intelligence
GPT: generative pre-trained transformer
LLMs: large language models
PHI: protected health information
RAG: retrieval-augmented generation
UCSF: University of California, San Francisco

References

1.↵
Ge J, Li M, Delk MB, Lai JC. A comparison of large language model versus manual chart review for extraction of data elements from the electronic health record. medRxiv. September 1, 2023.
2.↵
Rahman M, Terano HJR, Rahman N, Salamzadeh A, Rahaman S. Chatgpt and academic research: A review and recommendations based on practical examples. J Educ, Mngt, and Dev Studies. 2023;3(1):1–12.
OpenUrl
3.↵
Nayak A, Alkaitis MS, Nayak K, Nikolov M, Weinfurt KP, Schulman K. Comparison of history of present illness summaries generated by a chatbot and senior internal medicine residents. JAMA Intern Med. 2023;183(9):1026–1027.
OpenUrl
4.↵
Han C, Kim DW, Kim S, et al. Evaluation Of GPT-4 for 10-Year Cardiovascular Risk Prediction: Insights from the UK Biobank and KoGES Data. 2023.
5.↵
ChatGPT: Optimizing Language Models for Dialogue. Accessed December 17, 2022. https://openai.com/blog/chatgpt/
6.↵
Ge J, Lai JC. Artificial intelligence-based text generators in hepatology: ChatGPT is just the beginning. Hepatol Commun. 2023;7(4).
7.↵
Ji Z, Lee N, Frieske R, et al. Survey of hallucination in natural language generation. ACM Comput Surv. November 17, 2022.
8.↵
Densen P. Challenges and opportunities facing medical education. Trans Am Clin Climatol Assoc. 2011;122:48–58.
OpenUrl PubMed
9.↵
Landhuis E. Scientific literature: Information overload. Nature. 2016;535(7612):457-458.
OpenUrl CrossRef
10.↵
Practice Guidelines | AASLD. Accessed November 8, 2023. https://www.aasld.org/practice-guidelines
11.↵
GPT-3.5 Turbo fine-tuning and API updates. Accessed November 8, 2023. https://openai.com/blog/gpt-3-5-turbo-fine-tuning-and-api-updates
12.↵
Jiang LY, Liu XC, Nejatian NP, et al. Health system-scale language models are all-purpose prediction engines. Nature. 2023;619(7969):357–362.
OpenUrl
13.↵
Kojima T, Gu SS, Reid M, Matsuo Y, Iwasawa Y. Large Language Models are Zero-Shot Reasoners. arXiv. 2022.
14.
Brown TB, Mann B, Ryder N, et al. Language models are few-shot learners. arXiv. 2020.
15.↵
Parnami A, Lee M. Learning from Few Examples: A Summary of Approaches to Few-Shot Learning. arXiv. 2022.
16.↵
RAG and generative AI - Azure Cognitive Search | Microsoft Learn. Accessed November 8, 2023. https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview
17.↵
Wang Y, Ma X, Chen W. Augmenting Black-box LLMs with Medical Textbooks for Clinical Question Answering. arXiv. 2023.
18.↵
Lozano A, Fleming SL, Chiang C-C, Shah N. Clinfo.ai: An Open-Source Retrieval-Augmented Large Language Model System for Answering Medical Questions using Scientific Literature. arXiv. 2023.
19.↵
Khene Z-E, Bigot P, Mathieu R, Rouprêt M, Bensalah K, French Committee of Urologic Oncology. Development of a personalized chat model based on the european association of urology oncology guidelines: harnessing the power of generative artificial intelligence in clinical practice. Eur Urol Oncol. July 18, 2023.
20.↵
Ferber D, Kather JN. Large Language Models in Uro-oncology. Eur Urol Oncol. October 13, 2023.
21.↵
Embeddings - OpenAI API. Accessed October 27, 2023. https://platform.openai.com/docs/guides/embeddings
22.↵
New and improved embedding model. Accessed October 27, 2023. https://openai.com/blog/new-and-improved-embedding-model
23.↵
Mahfouz M, Nguyen H, Tu J, et al. Knowledge and perceptions of hepatitis B and hepatocellular carcinoma screening guidelines among trainees: A tale of three centers. Dig Dis Sci. 2020;65(9):2551–2561.
OpenUrl
24.↵
Yeo YH, Samaan JS, Ng WH, et al. Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma. Clin Mol Hepatol. 2023;29(3):721–732.
OpenUrl
25.↵
Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. arXiv. 2017.
26.↵
Marrero JA, Kulik LM, Sirlin CB, et al. Diagnosis, staging, and management of hepatocellular carcinoma: 2018 practice guidance by the american association for the study of liver diseases. Hepatology. 2018;68(2):723–750.
OpenUrl CrossRef PubMed
27.↵
Singal AG, Llovet JM, Yarchoan M, et al. AASLD Practice Guidance on prevention, diagnosis, and treatment of hepatocellular carcinoma. Hepatology. May 22, 2023.
28.↵
Fang X, Che S, Mao M, Zhang H, Zhao M, Zhao X. [2309.09825] Bias of AI-Generated Content: An Examination of News Produced by Large Language Models. arXiv. September 18, 2023.
29.
Zack T, Lehman E, Suzgun M, et al. Coding Inequity: Assessing GPT-4’s Potential for Perpetuating Racial and Gender Biases in Healthcare. medRxiv. July 16, 2023.
30.↵
DeCamp M, Lindvall C. Latent bias and the implementation of artificial intelligence in medicine. J Am Med Inform Assoc. 2020;27(12):2020–2023.
OpenUrl CrossRef PubMed
31.↵
Braunstein ML. Pre-FHIR Interoperability and Clinical Decision Support Standards. In: Health Informatics on FHIR: How Hl7’s New API Is Transforming Healthcare. Springer International Publishing; 2018:151–177.
32.
Lee WM, Larson AM, Stravitz RT. AASLD position paper: the management of acute liver failure: update 2011. Hepatology. 2011;55(3):965–967.
OpenUrl
33.
Crabb DW, Im GY, Szabo G, Mellinger JL, Lucey MR. Diagnosis and Treatment of Alcohol-Associated Liver Diseases: 2019 Practice Guidance From the American Association for the Study of Liver Diseases. Hepatology. 2020;71(1):306–333.
OpenUrl
34.
Biggins SW, Angeli P, Garcia-Tsao G, et al. Diagnosis, evaluation, and management of ascites, spontaneous bacterial peritonitis and hepatorenal syndrome: 2021 practice guidance by the american association for the study of liver diseases. Hepatology. 2021;74(2):1014–1048.
OpenUrl CrossRef PubMed
35.
Mack CL, Adams D, Assis DN, et al. Diagnosis and management of autoimmune hepatitis in adults and children: 2019 practice guidance and guidelines from the american association for the study of liver diseases. Hepatology. 2020;72(2):671–722.
OpenUrl CrossRef PubMed
36.
Fontana RJ, Liou I, Reuben A, et al. AASLD practice guidance on drug, herbal, and dietary supplement-induced liver injury. Hepatology. 2023;77(3):1036–1065.
OpenUrl
37.
Bacon BR, Adams PC, Kowdley KV, Powell LW, Tavill AS, American Association for the Study of Liver Diseases. Diagnosis and management of hemochromatosis: 2011 practice guideline by the American Association for the Study of Liver Diseases. Hepatology. 2011;54(1):328–343.
OpenUrl CrossRef PubMed
38.
Vilstrup H, Amodio P, Bajaj J, et al. Hepatic encephalopathy in chronic liver disease: 2014 Practice Guideline by the American Association for the Study of Liver Diseases and the European Association for the Study of the Liver. Hepatology. 2014;60(2):715–735.
OpenUrl CrossRef PubMed Web of Science
39.
Terrault NA, Bzowej NH, Chang K-M, et al. AASLD guidelines for treatment of chronic hepatitis B. Hepatology. 2016;63(1):261–283.
OpenUrl CrossRef PubMed
40.
Terrault NA, Lok ASF, McMahon BJ, et al. Update on prevention, diagnosis, and treatment of chronic hepatitis B: AASLD 2018 hepatitis B guidance. Hepatology. 2018;67(4):1560–1599.
OpenUrl CrossRef PubMed
41.
Bhattacharya D, Aronsohn A, Price J, Lo Re V, AASLD-IDSA HCV Guidance Panel. Hepatitis C Guidance 2023 Update: AASLD-IDSA Recommendations for Testing, Managing, and Treating Hepatitis C Virus Infection. Clin Infect Dis. May 25, 2023.
42.
Rockey DC, Caldwell SH, Goodman ZD, Nelson RC, Smith AD, American Association for the Study of Liver Diseases. Liver biopsy. Hepatology. 2009;49(3):1017–1044.
OpenUrl CrossRef PubMed Web of Science
43.
Martin P, DiMartini A, Feng S, Brown R, Fallon M. Evaluation for liver transplantation in adults: 2013 practice guideline by the American Association for the Study of Liver Diseases and the American Society of Transplantation. Hepatology. 2014;59(3):1144–1165.
OpenUrl CrossRef PubMed
44.
Squires RH, Ng V, Romero R, et al. Evaluation of the pediatric patient for liver transplantation: 2014 practice guideline by the American Association for the Study of Liver Diseases, American Society of Transplantation and the North American Society for Pediatric Gastroenterology, Hepatology and Nutrition. Hepatology. 2014;60(1):362–398.
OpenUrl CrossRef PubMed
45.
Lucey MR, Terrault N, Ojo L, et al. Long-term management of the successful adult liver transplant: 2012 practice guideline by the American Association for the Study of Liver Diseases and the American Society of Transplantation. Liver Transpl. 2013;19(1):3–26.
OpenUrl CrossRef PubMed
46.
Kelly DA, Bucuvalas JC, Alonso EM, et al. Long-term medical management of the pediatric patient after liver transplantation: 2013 practice guideline by the American Association for the Study of Liver Diseases and the American Society of Transplantation. Liver Transpl. 2013;19(8):798–825.
OpenUrl CrossRef PubMed
47.
Lai JC, Tandon P, Bernal W, et al. Malnutrition, frailty, and sarcopenia in patients with cirrhosis: 2021 practice guidance by the american association for the study of liver diseases. Hepatology. 2021;74(3):1611–1644.
OpenUrl PubMed
48.
Rinella ME, Neuschwander-Tetri BA, Siddiqui MS, et al. AASLD Practice Guidance on the clinical assessment and management of nonalcoholic fatty liver disease. Hepatology. 2023;77(5):1797–1835.
OpenUrl
49.
Rogal SS, Hansen L, Patel A, et al. AASLD Practice Guidance: Palliative care and symptom-based management in decompensated cirrhosis. Hepatology. 2022;76(3):819–853.
OpenUrl
50.
Kaplan DE, Bosch J, Ripoll C, et al. AASLD practice guidance on risk stratification and management of portal hypertension and varices in cirrhosis. Hepatology. October 23, 2023.
51.
Lee EW, Eghtesad B, Garcia-Tsao G, et al. AASLD practice guidance on the use of TIPS, variceal embolization, and retrograde transvenous obliteration in the management of variceal hemorrhage. Hepatology. June 30, 2023.
52.
Lindor KD, Bowlus CL, Boyer J, Levy C, Mayo M. Primary Biliary Cholangitis: 2018 Practice Guidance from the American Association for the Study of Liver Diseases. Hepatology. 2019;69(1):394–419.
OpenUrl CrossRef PubMed
53.
Lindor KD, Bowlus CL, Boyer J, Levy C, Mayo M. Primary biliary cholangitis: 2021 practice guidance update from the American Association for the Study of Liver Diseases. Hepatology. 2022;75(4):1012–1013.
OpenUrl PubMed
54.
Bowlus CL, Arrivé L, Bergquist A, et al. AASLD practice guidance on primary sclerosing cholangitis and cholangiocarcinoma. Hepatology. 2023;77(2):659–702.
OpenUrl
55.
Sarkar M, Brady CW, Fleckenstein J, et al. Reproductive health and liver disease: practice guidance by the american association for the study of liver diseases. Hepatology. 2021;73(1):318–365.
OpenUrl PubMed
56.
Northup PG, Garcia-Pagan JC, Garcia-Tsao G, et al. Vascular liver disorders, portal vein thrombosis, and procedural bleeding in patients with liver disease: 2020 practice guidance by the american association for the study of liver diseases. Hepatology. 2021;73(1):366–413.
OpenUrl CrossRef PubMed
57.
Schilsky ML, Roberts EA, Bronstein JM, et al. A multidisciplinary approach to the diagnosis and management of Wilson disease: 2022 Practice Guidance on Wilson disease from the American Association for the Study of Liver Diseases. Hepatology. December 7, 2022.
58.
Kanwal F, Tapper EB, Ho C, et al. Development of quality measures in cirrhosis by the practice metrics committee of the american association for the study of liver diseases. Hepatology. 2019;69(4):1787–1797.
OpenUrl
59.
Asrani SK, Ghabril MS, Kuo A, et al. Quality measures in HCC care by the Practice Metrics Committee of the American Association for the Study of Liver Diseases. Hepatology. 2022;75(5):1289–1299.
OpenUrl
60.
Karvellas CJ, Bajaj JS, Kamath PS, et al. AASLD Practice guidance on Acute-on-chronic liver failure and the management of critically Ill patients with cirrhosis. Hepatology. November 9, 2023.

View the discussion thread.

Posted November 11, 2023.

Download PDF

Data/Code

Citation Tools

Subject Area

Gastroenterology

Subject Areas

All Articles

Addiction Medicine (412)
Allergy and Immunology (726)
Anesthesia (214)
Cardiovascular Medicine (3107)
Dentistry and Oral Medicine (349)
Dermatology (263)
Emergency Medicine (463)
Endocrinology (including Diabetes Mellitus and Metabolic Disease) (1100)
Epidemiology (13046)
Forensic Medicine (13)
Gastroenterology (862)
Genetic and Genomic Medicine (4866)
Geriatric Medicine (449)
Health Economics (751)
Health Informatics (3068)
Health Policy (1108)
Health Systems and Quality Improvement (1135)
Hematology (410)
HIV/AIDS (962)
Infectious Diseases (except HIV/AIDS) (14350)
Intensive Care and Critical Care Medicine (885)
Medical Education (453)
Medical Ethics (120)
Nephrology (502)
Neurology (4631)
Nursing (247)
Nutrition (689)
Obstetrics and Gynecology (847)
Occupational and Environmental Health (764)
Oncology (2393)
Ophthalmology (677)
Orthopedics (270)
Otolaryngology (333)
Pain Medicine (306)
Palliative Medicine (88)
Pathology (516)
Pediatrics (1243)
Pharmacology and Therapeutics (521)
Primary Care Research (522)
Psychiatry and Clinical Psychology (3976)
Public and Global Health (7201)
Radiology and Imaging (1606)
Rehabilitation Medicine and Physical Therapy (958)
Respiratory Medicine (944)
Rheumatology (460)
Sexual and Reproductive Health (478)
Sports Medicine (403)
Surgery (514)
Toxicology (65)
Transplantation (222)
Urology (190)

[1] 1.↵
Ge J, Li M, Delk MB, Lai JC. A comparison of large language model versus manual chart review for extraction of data elements from the electronic health record. medRxiv. September 1, 2023.

[2] 2.↵
Rahman M, Terano HJR, Rahman N, Salamzadeh A, Rahaman S. Chatgpt and academic research: A review and recommendations based on practical examples. J Educ, Mngt, and Dev Studies. 2023;3(1):1–12.
OpenUrl

[3] 3.↵
Nayak A, Alkaitis MS, Nayak K, Nikolov M, Weinfurt KP, Schulman K. Comparison of history of present illness summaries generated by a chatbot and senior internal medicine residents. JAMA Intern Med. 2023;183(9):1026–1027.
OpenUrl

[4] 4.↵
Han C, Kim DW, Kim S, et al. Evaluation Of GPT-4 for 10-Year Cardiovascular Risk Prediction: Insights from the UK Biobank and KoGES Data. 2023.

[5] 5.↵
ChatGPT: Optimizing Language Models for Dialogue. Accessed December 17, 2022. https://openai.com/blog/chatgpt/

[6] 6.↵
Ge J, Lai JC. Artificial intelligence-based text generators in hepatology: ChatGPT is just the beginning. Hepatol Commun. 2023;7(4).

[7] 7.↵
Ji Z, Lee N, Frieske R, et al. Survey of hallucination in natural language generation. ACM Comput Surv. November 17, 2022.

[8] 8.↵
Densen P. Challenges and opportunities facing medical education. Trans Am Clin Climatol Assoc. 2011;122:48–58.
OpenUrl PubMed

[9] 9.↵
Landhuis E. Scientific literature: Information overload. Nature. 2016;535(7612):457-458.
OpenUrl CrossRef

[10] 10.↵
Practice Guidelines | AASLD. Accessed November 8, 2023. https://www.aasld.org/practice-guidelines

[11] 11.↵
GPT-3.5 Turbo fine-tuning and API updates. Accessed November 8, 2023. https://openai.com/blog/gpt-3-5-turbo-fine-tuning-and-api-updates

[12] 12.↵
Jiang LY, Liu XC, Nejatian NP, et al. Health system-scale language models are all-purpose prediction engines. Nature. 2023;619(7969):357–362.
OpenUrl

[13] 13.↵
Kojima T, Gu SS, Reid M, Matsuo Y, Iwasawa Y. Large Language Models are Zero-Shot Reasoners. arXiv. 2022.

[14] 14.
Brown TB, Mann B, Ryder N, et al. Language models are few-shot learners. arXiv. 2020.

[15] 15.↵
Parnami A, Lee M. Learning from Few Examples: A Summary of Approaches to Few-Shot Learning. arXiv. 2022.

[16] 16.↵
RAG and generative AI - Azure Cognitive Search | Microsoft Learn. Accessed November 8, 2023. https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview

[17] 17.↵
Wang Y, Ma X, Chen W. Augmenting Black-box LLMs with Medical Textbooks for Clinical Question Answering. arXiv. 2023.

[18] 18.↵
Lozano A, Fleming SL, Chiang C-C, Shah N. Clinfo.ai: An Open-Source Retrieval-Augmented Large Language Model System for Answering Medical Questions using Scientific Literature. arXiv. 2023.

[19] 19.↵
Khene Z-E, Bigot P, Mathieu R, Rouprêt M, Bensalah K, French Committee of Urologic Oncology. Development of a personalized chat model based on the european association of urology oncology guidelines: harnessing the power of generative artificial intelligence in clinical practice. Eur Urol Oncol. July 18, 2023.

[20] 20.↵
Ferber D, Kather JN. Large Language Models in Uro-oncology. Eur Urol Oncol. October 13, 2023.

[21] 21.↵
Embeddings - OpenAI API. Accessed October 27, 2023. https://platform.openai.com/docs/guides/embeddings

[22] 22.↵
New and improved embedding model. Accessed October 27, 2023. https://openai.com/blog/new-and-improved-embedding-model

[23] 23.↵
Mahfouz M, Nguyen H, Tu J, et al. Knowledge and perceptions of hepatitis B and hepatocellular carcinoma screening guidelines among trainees: A tale of three centers. Dig Dis Sci. 2020;65(9):2551–2561.
OpenUrl

[24] 24.↵
Yeo YH, Samaan JS, Ng WH, et al. Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma. Clin Mol Hepatol. 2023;29(3):721–732.
OpenUrl

[25] 25.↵
Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. arXiv. 2017.

[26] 26.↵
Marrero JA, Kulik LM, Sirlin CB, et al. Diagnosis, staging, and management of hepatocellular carcinoma: 2018 practice guidance by the american association for the study of liver diseases. Hepatology. 2018;68(2):723–750.
OpenUrl CrossRef PubMed

[27] 27.↵
Singal AG, Llovet JM, Yarchoan M, et al. AASLD Practice Guidance on prevention, diagnosis, and treatment of hepatocellular carcinoma. Hepatology. May 22, 2023.

[28] 28.↵
Fang X, Che S, Mao M, Zhang H, Zhao M, Zhao X. [2309.09825] Bias of AI-Generated Content: An Examination of News Produced by Large Language Models. arXiv. September 18, 2023.

[29] 29.
Zack T, Lehman E, Suzgun M, et al. Coding Inequity: Assessing GPT-4’s Potential for Perpetuating Racial and Gender Biases in Healthcare. medRxiv. July 16, 2023.

[30] 30.↵
DeCamp M, Lindvall C. Latent bias and the implementation of artificial intelligence in medicine. J Am Med Inform Assoc. 2020;27(12):2020–2023.
OpenUrl CrossRef PubMed

[31] 31.↵
Braunstein ML. Pre-FHIR Interoperability and Clinical Decision Support Standards. In: Health Informatics on FHIR: How Hl7’s New API Is Transforming Healthcare. Springer International Publishing; 2018:151–177.

[32] 32.
Lee WM, Larson AM, Stravitz RT. AASLD position paper: the management of acute liver failure: update 2011. Hepatology. 2011;55(3):965–967.
OpenUrl

[33] 33.
Crabb DW, Im GY, Szabo G, Mellinger JL, Lucey MR. Diagnosis and Treatment of Alcohol-Associated Liver Diseases: 2019 Practice Guidance From the American Association for the Study of Liver Diseases. Hepatology. 2020;71(1):306–333.
OpenUrl

[34] 34.
Biggins SW, Angeli P, Garcia-Tsao G, et al. Diagnosis, evaluation, and management of ascites, spontaneous bacterial peritonitis and hepatorenal syndrome: 2021 practice guidance by the american association for the study of liver diseases. Hepatology. 2021;74(2):1014–1048.
OpenUrl CrossRef PubMed

[35] 35.
Mack CL, Adams D, Assis DN, et al. Diagnosis and management of autoimmune hepatitis in adults and children: 2019 practice guidance and guidelines from the american association for the study of liver diseases. Hepatology. 2020;72(2):671–722.
OpenUrl CrossRef PubMed

[36] 36.
Fontana RJ, Liou I, Reuben A, et al. AASLD practice guidance on drug, herbal, and dietary supplement-induced liver injury. Hepatology. 2023;77(3):1036–1065.
OpenUrl

[37] 37.
Bacon BR, Adams PC, Kowdley KV, Powell LW, Tavill AS, American Association for the Study of Liver Diseases. Diagnosis and management of hemochromatosis: 2011 practice guideline by the American Association for the Study of Liver Diseases. Hepatology. 2011;54(1):328–343.
OpenUrl CrossRef PubMed

[38] 38.
Vilstrup H, Amodio P, Bajaj J, et al. Hepatic encephalopathy in chronic liver disease: 2014 Practice Guideline by the American Association for the Study of Liver Diseases and the European Association for the Study of the Liver. Hepatology. 2014;60(2):715–735.
OpenUrl CrossRef PubMed Web of Science

[39] 39.
Terrault NA, Bzowej NH, Chang K-M, et al. AASLD guidelines for treatment of chronic hepatitis B. Hepatology. 2016;63(1):261–283.
OpenUrl CrossRef PubMed

[40] 40.
Terrault NA, Lok ASF, McMahon BJ, et al. Update on prevention, diagnosis, and treatment of chronic hepatitis B: AASLD 2018 hepatitis B guidance. Hepatology. 2018;67(4):1560–1599.
OpenUrl CrossRef PubMed

[41] 41.
Bhattacharya D, Aronsohn A, Price J, Lo Re V, AASLD-IDSA HCV Guidance Panel. Hepatitis C Guidance 2023 Update: AASLD-IDSA Recommendations for Testing, Managing, and Treating Hepatitis C Virus Infection. Clin Infect Dis. May 25, 2023.

[42] 42.
Rockey DC, Caldwell SH, Goodman ZD, Nelson RC, Smith AD, American Association for the Study of Liver Diseases. Liver biopsy. Hepatology. 2009;49(3):1017–1044.
OpenUrl CrossRef PubMed Web of Science

[43] 43.
Martin P, DiMartini A, Feng S, Brown R, Fallon M. Evaluation for liver transplantation in adults: 2013 practice guideline by the American Association for the Study of Liver Diseases and the American Society of Transplantation. Hepatology. 2014;59(3):1144–1165.
OpenUrl CrossRef PubMed

[44] 44.
Squires RH, Ng V, Romero R, et al. Evaluation of the pediatric patient for liver transplantation: 2014 practice guideline by the American Association for the Study of Liver Diseases, American Society of Transplantation and the North American Society for Pediatric Gastroenterology, Hepatology and Nutrition. Hepatology. 2014;60(1):362–398.
OpenUrl CrossRef PubMed

[45] 45.
Lucey MR, Terrault N, Ojo L, et al. Long-term management of the successful adult liver transplant: 2012 practice guideline by the American Association for the Study of Liver Diseases and the American Society of Transplantation. Liver Transpl. 2013;19(1):3–26.
OpenUrl CrossRef PubMed

[46] 46.
Kelly DA, Bucuvalas JC, Alonso EM, et al. Long-term medical management of the pediatric patient after liver transplantation: 2013 practice guideline by the American Association for the Study of Liver Diseases and the American Society of Transplantation. Liver Transpl. 2013;19(8):798–825.
OpenUrl CrossRef PubMed

[47] 47.
Lai JC, Tandon P, Bernal W, et al. Malnutrition, frailty, and sarcopenia in patients with cirrhosis: 2021 practice guidance by the american association for the study of liver diseases. Hepatology. 2021;74(3):1611–1644.
OpenUrl PubMed

[48] 48.
Rinella ME, Neuschwander-Tetri BA, Siddiqui MS, et al. AASLD Practice Guidance on the clinical assessment and management of nonalcoholic fatty liver disease. Hepatology. 2023;77(5):1797–1835.
OpenUrl

[49] 49.
Rogal SS, Hansen L, Patel A, et al. AASLD Practice Guidance: Palliative care and symptom-based management in decompensated cirrhosis. Hepatology. 2022;76(3):819–853.
OpenUrl

[50] 50.
Kaplan DE, Bosch J, Ripoll C, et al. AASLD practice guidance on risk stratification and management of portal hypertension and varices in cirrhosis. Hepatology. October 23, 2023.

[51] 51.
Lee EW, Eghtesad B, Garcia-Tsao G, et al. AASLD practice guidance on the use of TIPS, variceal embolization, and retrograde transvenous obliteration in the management of variceal hemorrhage. Hepatology. June 30, 2023.

[52] 52.
Lindor KD, Bowlus CL, Boyer J, Levy C, Mayo M. Primary Biliary Cholangitis: 2018 Practice Guidance from the American Association for the Study of Liver Diseases. Hepatology. 2019;69(1):394–419.
OpenUrl CrossRef PubMed

[53] 53.
Lindor KD, Bowlus CL, Boyer J, Levy C, Mayo M. Primary biliary cholangitis: 2021 practice guidance update from the American Association for the Study of Liver Diseases. Hepatology. 2022;75(4):1012–1013.
OpenUrl PubMed

[54] 54.
Bowlus CL, Arrivé L, Bergquist A, et al. AASLD practice guidance on primary sclerosing cholangitis and cholangiocarcinoma. Hepatology. 2023;77(2):659–702.
OpenUrl

[55] 55.
Sarkar M, Brady CW, Fleckenstein J, et al. Reproductive health and liver disease: practice guidance by the american association for the study of liver diseases. Hepatology. 2021;73(1):318–365.
OpenUrl PubMed

[56] 56.
Northup PG, Garcia-Pagan JC, Garcia-Tsao G, et al. Vascular liver disorders, portal vein thrombosis, and procedural bleeding in patients with liver disease: 2020 practice guidance by the american association for the study of liver diseases. Hepatology. 2021;73(1):366–413.
OpenUrl CrossRef PubMed

[57] 57.
Schilsky ML, Roberts EA, Bronstein JM, et al. A multidisciplinary approach to the diagnosis and management of Wilson disease: 2022 Practice Guidance on Wilson disease from the American Association for the Study of Liver Diseases. Hepatology. December 7, 2022.

[58] 58.
Kanwal F, Tapper EB, Ho C, et al. Development of quality measures in cirrhosis by the practice metrics committee of the american association for the study of liver diseases. Hepatology. 2019;69(4):1787–1797.
OpenUrl

[59] 59.
Asrani SK, Ghabril MS, Kuo A, et al. Quality measures in HCC care by the Practice Metrics Committee of the American Association for the Study of Liver Diseases. Hepatology. 2022;75(5):1289–1299.
OpenUrl

[60] 60.
Karvellas CJ, Bajaj JS, Kamath PS, et al. AASLD Practice guidance on Acute-on-chronic liver failure and the management of critically Ill patients with cirrhosis. Hepatology. November 9, 2023.

Development of a Liver Disease-Specific Large Language Model Chat Interface using Retrieval Augmented Generation

Abstract

Introduction

Methods

Results

Discussion

Data Availability

Financial/Grant Support

Disclosures

Writing Assistance

Author Contributions

Acknowledgements

Abbreviations

References

Citation Manager Formats

Subject Area