Utilizing Large Language Models to Simplify Radiology Reports: a comparative analysis of ChatGPT3.5, ChatGPT4.0, Google Bard, and Microsoft Bing
================================================================================================================================================

* Rushabh Doshi
* Kanhai Amin
* Pavan Khosla
* Simar Bajaj
* Sophie Chheang
* Howard P. Forman

## Abstract

This paper investigates the application of Large Language Models (LLMs), specifically OpenAI’s ChatGPT3.5, ChatGPT4.0, Google Bard, and Microsoft Bing, in simplifying radiology reports, thus potentially enhancing patient understanding. We examined 254 anonymized radiology reports from diverse examination types and used three different prompts to guide the LLMs’ simplification processes. The resulting simplified reports were evaluated using four established readability indices. All LLMs significantly simplified the reports, but performance varied based on the prompt used and the specific model. The ChatGPT models performed best when additional context was provided (i.e., specifying user as a patient or requesting simplification at the 7th grade level). Our findings suggest that LLMs can effectively simplify radiology reports, although improvements are needed to ensure accurate clinical representation and optimal readability. These models have the potential to improve patient health literacy, patient-provider communication, and ultimately, health outcomes.

## Introduction

Imaging reports are a cornerstone of medical decision-making, providing information for diagnosis, treatment planning, and monitoring disease progression. Historically, only the radiologist and referring provider accessed these reports, but the rise of telemedicine and patient portals, as well as regulatory changes, most recently the 21st Century Cures Act, have increased access to electronic health records and transformed patients’ relationship with their medical information.1–4

Digital health literacy, defined as the degree to which a patient can obtain, process, and understand electronic information,5 is critical for patients to fully benefit from this transformation.6 Radiology reports, however, are filled with technical jargon, making them relatively uninterpretable to individuals without a clinical background.7 Expanded access to these reports could thus exacerbate patient anxiety, misunderstanding, and emotional distress, particularly with abnormal findings.8–10 Improving radiological literacy could help address these concerns, with other spillover benefits to safety and transparency,11 shared decision-making,12 treatment compliance,13 and reducing health disparities.14

Fifteen years ago, The Joint Commission mandated that health care organizations “encourage patients’ active involvement in their own care as a patient safety strategy,”11 and a linchpin of that requirement is data transparency and accessibility. Launched in 2010, the OpenNotes program, which allowed patients to access their electronic medical records, demonstrated that 99% of patients wanted the program to continue and 85% reported that access would inform their future provider and health system choices.15 In radiology, approaches such as leaving a summary statement at the end of the report,16 structured templates with standardized lexicon,17,18 and video reports19 have all been used to improve digital health literacy. Largely underexplored are emerging artificial intelligence (AI) tools to support patient understanding.

Using deep learning techniques, large language models (LLMs), such as OpenAI’s ChatGPT, Google Bard, and Microsoft Bing, have emerged as promising tools for the simplification of complex medical information.20,21 More specifically, these models leverage natural language processing (NLP) technologies to generate human-like text in response to a user’s prompts. To date, a comparative analysis of these LLMs in radiology has not been fully explored.

In this study, we compared the performance of several popular LLMs in producing simplified reports. Our objective was to evaluate the effectiveness of LLMs and provide insights into their potential for enhancing patient health literacy and promoting better patient–provider communication.

## Methods

To investigate the efficacy of four Large Language Models (LLMs) in simplifying radiology reports, we designed a comparative study focusing on OpenAI’s ChatGPT3.5 and ChatGPT4.0, Google Bard, and Microsoft Bing. Given that Bing has three conversational styles, we elected to use the precise setting over the creative or balanced settings. Our primary outcome was readability score, using an existing open-source dataset of reports.

### Dataset Selection and Modification

We used the MIMIC-III database, which is a comprehensive public database from the Beth Israel Deaconess Medical Center.22,23 A random selection of 254 anonymized reports was made to ensure representation of various examination types (MRI, CT, US (ultrasound), X-ray, Mammogram), anatomical regions, and lengths. This dataset allowed us to evaluate LLM performance across diverse clinical situations.

The reports in the dataset contained redacted information, so we altered the reports to state “Dr. Smith” where a physician name was redacted. Further, we changed redacted dates to “prior,” as many reports compared findings to previous studies.
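The substitution step above can be illustrated with a minimal sketch. It assumes that redactions appear as bracketed tokens of the form `[** ... **]` and uses hypothetical, simplified rules to tell dates from names; the exact procedure used in the study is not described, so the patterns and the function name `fill_redactions` are illustrative only.

```python
import re

# Assumption: every redacted span is wrapped as "[** ... **]".
REDACTION = re.compile(r"\[\*\*(.*?)\*\*\]")
# Assumption: a span is a date if it contains digits joined by "-" or "/",
# or one of the words "date", "month", "year".
DATE_HINT = re.compile(r"\d{1,4}[-/]\d{1,2}|\b(?:date|month|year)\b", re.IGNORECASE)
# Assumption: a span is a person's name if it mentions "name".
NAME_HINT = re.compile(r"name", re.IGNORECASE)


def fill_redactions(report: str) -> str:
    """Replace redacted dates with 'prior' and redacted names with 'Dr. Smith'."""

    def substitute(match: re.Match) -> str:
        inner = match.group(1)
        if DATE_HINT.search(inner):
            return "prior"
        if NAME_HINT.search(inner):
            return "Dr. Smith"
        return match.group(0)  # leave any other redaction untouched

    text = REDACTION.sub(substitute, report)
    # Avoid "Dr. Dr. Smith" when the title preceded the redacted name.
    return re.sub(r"(?:Dr\.\s*)+Dr\. Smith", "Dr. Smith", text)


sample = "Compared to [**2101-3-4**], no change. Read by Dr. [**Last Name (STitle) 42**]."
print(fill_redactions(sample))
```

Redactions that match neither rule are left in place in this sketch, so they can be reviewed by hand rather than silently rewritten.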

### Prompt Selection

We first tested the prompt “Simplify this radiology report:” (Prompt 1). We then tested the prompt “I am a patient. Simplify this radiology report:” (Prompt 2).24 Lastly, we tested the prompt, “Simplify this radiology report at the 7th grade level” (Prompt 3). Each prompt was followed with the radiology reports from the MIMIC-III database.
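As a small illustration, the three inputs for a single report could be assembled as below. The prompt strings are taken from the text above; the blank-line separator between prompt and report and the helper names are assumptions, since the study does not state how the text was entered into each interface.

```python
# Prompt wording as given in this section.
PROMPTS = {
    1: "Simplify this radiology report:",
    2: "I am a patient. Simplify this radiology report:",
    3: "Simplify this radiology report at the 7th grade level",
}


def build_prompt(prompt_id: int, report: str) -> str:
    """Return the prompt followed by the report text (separator is an assumption)."""
    return f"{PROMPTS[prompt_id]}\n\n{report.strip()}"


def all_inputs(report: str) -> list[str]:
    """Return one input per prompt, in prompt order, for a single report."""
    return [build_prompt(prompt_id, report) for prompt_id in sorted(PROMPTS)]
```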

### Processing Radiology Reports and Readability Assessment

Each of the 254 radiology reports was processed individually by the 4 LLMs (accessed on May 1st, 2023: ChatGPT3.5 Legacy, ChatGPT4.0, Microsoft Bing, Google Bard), generating simplified versions of the original reports for each of the three prompts. To standardize the outputs and ensure equal comparison, we removed all formatting, including bullet points and numbered lists, as is consistent with previous readability studies.25,26 Ancillary information, such as “Sure I understand you would like a simplified version of your radiology report” and “please note I am not a medical professional,” was also removed to focus the analysis on the clinical content.
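The cleaning described above might look like the following sketch. The list-marker pattern and the two ancillary-sentence patterns are assumptions modeled on the examples quoted in the text; the study does not say whether cleaning was manual or automated, and this version drops a whole line when it contains an ancillary phrase.

```python
import re

# Leading list markers such as "-", "*", "•", "1." or "2)".
LIST_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")

# Hypothetical patterns for non-clinical lines, based on the quoted examples.
ANCILLARY = [
    re.compile(r"^sure\b", re.IGNORECASE),
    re.compile(r"not a medical professional", re.IGNORECASE),
]


def clean_output(text: str) -> str:
    """Strip list formatting and ancillary lines, returning one plain paragraph."""
    kept = []
    for line in text.splitlines():
        line = LIST_MARKER.sub("", line).strip()
        if not line:
            continue  # skip blank lines
        if any(pattern.search(line) for pattern in ANCILLARY):
            continue  # skip greetings and disclaimers
        kept.append(line)
    return " ".join(kept)


raw = (
    "Sure, I understand you would like a simplified version of your radiology report.\n"
    "\n"
    "1. The lungs are clear.\n"
    "- The heart is normal in size.\n"
    "\n"
    "Please note I am not a medical professional."
)
print(clean_output(raw))
```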

We assessed the LLMs’ ability to simplify complex radiology reports by employing four established readability indices: Gunning Fog (GF), Flesch-Kincaid Grade Level (FK), Automated Readability Index (ARI), and Coleman-Liau (CL) indices.27 Each index outputs a score which corresponds to a reading grade level (RGL). RGL relates directly to educational attainment: an RGL of 6 corresponds to a sixth-grade level, an RGL of 12 corresponds to a high school senior level, and an RGL of 17 corresponds to a four-year college graduate level.28–31
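For reference, the standard published formulas for the four indices can be written as the short functions below. This is a sketch that takes pre-computed counts as input; how words, sentences, and syllables are counted differs between tools, and the software used in the study is not specified, so the example counts are hypothetical.

```python
def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog; complex_words are words with three or more syllables."""
    return 0.4 * (words / sentences + 100 * complex_words / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def automated_readability_index(words: int, sentences: int, characters: int) -> float:
    """Automated Readability Index; characters are letters and digits."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


def coleman_liau(words: int, sentences: int, letters: int) -> float:
    """Coleman-Liau, based on letters and sentences per 100 words."""
    letters_per_100 = 100 * letters / words
    sentences_per_100 = 100 * sentences / words
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


# Hypothetical counts for a 100-word passage with 5 sentences.
counts = {"words": 100, "sentences": 5}
scores = {
    "GF": gunning_fog(complex_words=10, **counts),
    "FK": flesch_kincaid_grade(syllables=150, **counts),
    "ARI": automated_readability_index(characters=500, **counts),
    "CL": coleman_liau(letters=500, **counts),
}
print(scores)
```

GF, FK, and ARI all grow linearly with average sentence length, so a single very long sentence can push a score past any real school grade.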

As previously described,25 we averaged the GF, FK, ARI, and CL readability scores for each output to calculate an averaged reading grade level score (aRGL). We applied the non-parametric Wilcoxon signed-rank and rank-sum tests to compare RGLs and aRGLs.
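The averaging and the paired comparison can be sketched in plain Python as follows. This is an illustrative implementation, not the authors' code: it covers only the signed-rank test, uses the normal approximation without tie or continuity corrections (which is rough for small samples), and the example scores are made up. In practice a statistics package such as SciPy (`scipy.stats.wilcoxon`, `scipy.stats.ranksums`) would likely be used instead.

```python
import math
from statistics import mean


def argl(gf: float, fk: float, ari: float, cl: float) -> float:
    """Averaged reading grade level: mean of the four index scores."""
    return mean([gf, fk, ari, cl])


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(before: list[float], after: list[float]):
    """Return (W+, W-, two-sided p) for paired samples, dropping zero differences."""
    diffs = [b - a for b, a in zip(before, after) if b != a]
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return w_plus, w_minus, p_value


# Made-up aRGLs for six reports before and after simplification.
original = [17.0, 18.5, 16.2, 19.1, 15.4, 18.0]
simplified = [9.0, 10.1, 8.8, 9.9, 8.5, 9.2]
w_plus, w_minus, p_value = wilcoxon_signed_rank(original, simplified)
print(w_plus, w_minus, round(p_value, 4))
```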

## Results

We tested the LLMs with the 3 distinct prompts across 5 imaging modalities: X-ray (N=45), US (N=11), MRI (N=47), CT (N=107), and mammogram (N=33). Original radiologist reports had a median aRGL of 17.2 overall, with X-rays at 13.7, ultrasounds at 14.6, MRIs at 16.5, CTs at 18.4, and mammograms at 18.8 (Table 1). When comparing original radiologist reports, X-ray reports were significantly more readable than CT, mammogram, and MRI reports (p<0.001), and ultrasound reports were significantly more readable than CT and mammogram reports (p<0.001, Suppl. Fig. 2). Despite these relative differences, original X-ray and ultrasound reports still had RGLs at approximately the college level.

[Table 1:](http://medrxiv.org/content/early/2023/06/07/2023.06.04.23290786/T1)

Table 1: Median of the aRGL for each LLM and prompt, by examination type.

All four LLMs significantly simplified original radiology reports from baseline complexity across all three prompts for MRI, CT, and mammogram (Figures 1-3, Suppl. Fig. 3). For X-ray and ultrasound, ChatGPT3.5, ChatGPT4.0, and Bing similarly achieved statistically significant simplification across all prompts, but Bard only simplified ultrasounds with Prompt 1 and X-rays with Prompt 2 and 3.

![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/06/07/2023.06.04.23290786/F1.medium.gif)

[Figure 1.](http://medrxiv.org/content/early/2023/06/07/2023.06.04.23290786/F1)

Figure 1. Readability scores of radiologist reports and LLMs using Prompt 1 – “Simplify this radiology finding:” \*, \*\*, \*\*\*, and \*\*\*\* correspond to p<0.05, p<0.01, p<0.001, and p<0.0001, respectively.

![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/06/07/2023.06.04.23290786/F2.medium.gif)

[Figure 2.](http://medrxiv.org/content/early/2023/06/07/2023.06.04.23290786/F2)

Figure 2. Readability scores of radiologist reports and LLMs using Prompt 2 – “I am a patient. Simplify this radiology finding:” \*, \*\*, \*\*\*, and \*\*\*\* correspond to p<0.05, p<0.01, p<0.001, and p<0.0001, respectively.

![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/06/07/2023.06.04.23290786/F3.medium.gif)

[Figure 3.](http://medrxiv.org/content/early/2023/06/07/2023.06.04.23290786/F3)

Figure 3. Readability scores of radiologist reports and LLMs using Prompt 3 – “Simplify this radiology finding at the 7th grade level:” \*, \*\*, \*\*\*, and \*\*\*\* correspond to p<0.05, p<0.01, p<0.001, and p<0.0001, respectively.

### Prompt 1: “Simplify this radiology finding:”

Using Prompt 1, Bing and Bard achieved significantly lower combined median aRGLs (9.4 and 8.1) than ChatGPT3.5 and ChatGPT4.0 (10.5 and 10.5, p<0.0001, Figure 1). Bard and Bing otherwise performed similarly, with Bard having the lowest combined median aRGLs for MRI (8.6, p<0.001), mammogram (9.3), and overall (9.1) reports and Bing for CT (8.1) and ultrasound (6.6). With Prompt 1, ChatGPT3.5 and ChatGPT4.0 performed similarly to each other, with typically higher aRGLs than Bing and Bard. The only exception was X-rays, where ChatGPT3.5 had the lowest median aRGL (10.4), significantly lower than Bard and Bing.

### Prompt 2: “I am a patient. Simplify this radiology finding:”

With the added context of Prompt 2, ChatGPT3.5 and ChatGPT4.0 produced outputs with significantly lower aRGLs overall compared to Bard and Bing (p<0.0001) and for all imaging modalities tested (p<0.05, Figure 2). While there were no significant differences between ChatGPT3.5 and ChatGPT4.0, ChatGPT3.5 had the lowest median aRGL outputs for all imaging modalities (overall 7.6, CT 7.8, X-ray 8.5, MRI 7.2, US 6.5, and mammogram 7.0, Table 1).

### Prompt 3: “Simplify this radiology finding at the 7th grade level:”

Prompt 3 produced outcomes similar to those of Prompt 2. The ChatGPT models significantly outperformed Bard and Bing overall and across all modalities (at least p<0.01, Figure 3), except for X-rays, where no difference was found between Bard and ChatGPT4.0. Although the two versions performed similarly, ChatGPT3.5 again produced the lowest aRGL outputs across our analysis (overall 6.7, CT 6.9, X-ray 8.0, MRI 6.6, ultrasound 4.6, and mammogram 5.5; Table 1).

### Prompt 1 vs Prompt 2 vs Prompt 3

Finally, we analyzed performance for each LLM across the three prompts (Fig. 4). The ChatGPT models performed better at reducing aRGL with Prompt 2 and Prompt 3 than with Prompt 1 (p<0.0001); Prompt 3 also outperformed Prompt 2 (p<0.01). On the other hand, Bard and Bing performed better with Prompt 1 than with Prompt 2 or Prompt 3 (p<0.0001). Prompt 3 also outperformed Prompt 2 in producing lower aRGL outputs for Bard and Bing (p<0.0001).

![Figure 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/06/07/2023.06.04.23290786/F4.medium.gif)

[Figure 4.](http://medrxiv.org/content/early/2023/06/07/2023.06.04.23290786/F4)

Figure 4. Comparison of each prompt within LLM. \*, \*\*, \*\*\*, and \*\*\*\* correspond to p<0.05, p<0.01, p<0.001, and p<0.0001, respectively.

## Discussion

In this study, we showed that the baseline readability of radiology reports across CT, X-ray, MRI, ultrasound, and mammograms is above the college graduate level but OpenAI’s ChatGPT3.5 and ChatGPT4.0, Google Bard, and Microsoft Bing can all successfully simplify these reports. The success of each of the LLMs varied, however, according to the specific prompt wording. Microsoft Bing and Google Bard performed best with a straightforward request to simplify a radiology report (Prompt 1), while the ChatGPT models performed best when provided with added context, such as the user specifying they were a patient (Prompt 2) or requesting simplification at the 7th grade level (Prompt 3).

Out of countless potential prompts that could have been tested, we focused our analysis on these three to determine how different types of context impacted readability. Prompt 1 was the simplest, specifying only that the inputted text will be a radiology report and that the LLM is tasked with simplifying it. The other two prompts offered additional context. For Prompt 3, we specified the 7th grade level because the American Medical Association and National Institutes of Health recommend that patient education materials be written between the third- and seventh-grade levels, given that the average American reads at the eighth-grade level.18,32,33 As expected, Prompt 3 outperformed Prompt 2 across all LLMs tested, although we recognize that requesting simplification at a specific grade level is less accessible for most users than specifying that “I am a patient.” Unexpectedly, however, Prompt 1 obtained the lowest aRGLs for two of the four LLMs tested, Microsoft Bing and Google Bard, suggesting that richer context does not always equate to improved readability for every LLM.

Several explanations may underlie the observed differences in readability scores across the LLMs. For one, variations in training data and preprocessing techniques could impact the different LLMs’ ability to handle the jargon, abbreviations, and numerical information found in radiology reports.34 Furthermore, there may simply be fundamental differences in LLM architectures and algorithms that make certain models more amenable to simplifying medical information.35 We nonetheless found the differences between Microsoft Bing and OpenAI’s ChatGPT models remarkable because Bing is powered by OpenAI. The finding that ChatGPT3.5 produced similar outputs to ChatGPT4.0 was also notable because it suggests that updated software does not automatically equate to improved performance, at least with regard to readability.

With patients already using these LLMs to simplify medical information,36 providers cannot ignore how the information-sharing landscape has changed and should adapt accordingly. For instance, radiologists may consider using LLMs proactively to create a patient-friendly report, entering it into the electronic medical record alongside their original report to help alleviate patient anxiety, misunderstanding, and emotional distress.37 Epic, Cerner, and other electronic health record companies may soon integrate LLMs into their software such that radiologists would not need to leave the interface to rely on third-party tools.38

While LLMs demonstrate promise in helping patients better understand their radiology reports, the ultimate goal should be to strike a balance between readability and preserving clinical fidelity.39 Indeed, excessive simplification could contribute to clinical inaccuracies and actually cause patients greater anxiety, so the role of healthcare providers in facilitating communication and understanding should not be overlooked. We believe LLMs could eventually be used as supplementary tools to aid patient-provider communication rather than a replacement for personal interaction and discussion; however, it is essential to study the accuracy and fidelity of these outputs before recommending their usage on a wider scale.40

This study has limitations. For one, radiologists or medical professionals did not assess simplified outputs, so we cannot speak to the accuracy, fidelity, and clinical utility of these reports. The readability metrics used in this study are similarly limited because they are language- and structure-focused, so these measures do not necessarily capture relevance or comprehensibility from a medical perspective. Furthermore, due to the formulaic nature of these metrics, outputted RGLs were sometimes above a meaningful grade level (e.g., a score of 30) and thus held little interpretability on their own. In this study, we were interested in assessing the readability of reports after LLM simplification and evaluating relative differences from baseline. Finally, we extracted radiology reports from the MIMIC-III dataset, which is derived from a single hospital, and employed a cross-sectional design, which may not be ideal for capturing continuous changes in LLMs’ performance. A longitudinal study design, as well as a larger, more diverse dataset, might have improved these results’ validity and generalizability.

## Conclusion

Our study highlights how radiology reports are complex medical documents that use language and style above the college graduate reading level, but LLMs are powerful tools for simplifying these reports. Our findings should not be viewed as an endorsement of any particular LLM; rather, they demonstrate that each LLM tested has the ability to simplify radiology reports across modalities. Careful fine-tuning and customization for each LLM may ensure optimal simplification while maintaining the clinical integrity of the reports.

## Data Availability

All data produced in the present study are available upon reasonable request to the authors.

![eFigure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/06/07/2023.06.04.23290786/F5.medium.gif)

[eFigure 1:](http://medrxiv.org/content/early/2023/06/07/2023.06.04.23290786/F5)

eFigure 1: GF, FK, ARI, and CL readability scores using Prompt 1. \*, \*\*, \*\*\*, and \*\*\*\* correspond to p<0.05, p<0.01, p<0.001, and p<0.0001, respectively.

![eFigure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/06/07/2023.06.04.23290786/F6.medium.gif)

[eFigure 2:](http://medrxiv.org/content/early/2023/06/07/2023.06.04.23290786/F6)

eFigure 2: GF, FK, ARI, and CL readability scores using Prompt 2. \*, \*\*, \*\*\*, and \*\*\*\* correspond to p<0.05, p<0.01, p<0.001, and p<0.0001, respectively.

![eFigure 3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/06/07/2023.06.04.23290786/F7.medium.gif)

[eFigure 3:](http://medrxiv.org/content/early/2023/06/07/2023.06.04.23290786/F7)

eFigure 3: GF, FK, ARI, and CL readability scores using Prompt 3. \*, \*\*, \*\*\*, and \*\*\*\* correspond to p<0.05, p<0.01, p<0.001, and p<0.0001, respectively.

[eTable 1:](http://medrxiv.org/content/early/2023/06/07/2023.06.04.23290786/T2)

eTable 1: Median scores across LLM, prompt, modality, and readability index.
\*, \*\*, \*\*\*, and \*\*\*\* correspond to p<0.05, p<0.01, p<0.001, and p<0.0001, respectively.

[eTable 2:](http://medrxiv.org/content/early/2023/06/07/2023.06.04.23290786/T3)

eTable 2: Comparison of each modality within each prompt and LLM.

[eTable 3:](http://medrxiv.org/content/early/2023/06/07/2023.06.04.23290786/T4)

eTable 3: Comparison of each prompt and LLM combination within modality.
Description: P1: Prompt 1; P2: Prompt 2; P3: Prompt 3. Combination listed in matrix has lower median; significance levels are shown. \*, \*\*, \*\*\*, and \*\*\*\* correspond to p<0.05, p<0.01, p<0.001, and p<0.0001, respectively.

## Footnotes

*   Formatting, no text has been edited.

*   Received June 4, 2023.
*   Revision received June 6, 2023.
*   Accepted June 7, 2023.


*   © 2023, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## REFERENCES

1.  Dworkowitz A. Provider Obligations For Patient Portals Under The 21st Century Cures Act. Health Aff Forefr. Published online May 16, 2022. doi:10.1377/forefront.20220513.923426

2.  Jain B, Bajaj SS, Stanford FC. All Infrastructure Is Health Infrastructure. Am J Public Health. 2022;112(1):24–26. doi:10.2105/AJPH.2021.306595

3.  Lo B, Charow R, Laberge S, Bakas V, Williams L, Wiljer D. Why are Patient Portals Important in the Age of COVID-19? Reflecting on Patient and Team Experiences From a Toronto Hospital Network. J Patient Exp. 2022;9:23743735221112216. doi:10.1177/23743735221112216

4.  Yee V, Bajaj SS, Stanford FC. Paradox of telemedicine: building or neglecting trust and equity. Lancet Digit Health. 2022;4(7):e480–e481. doi:10.1016/S2589-7500(22)00100-5

5.  Dunn P, Hazzard E. Technology approaches to digital health literacy. Int J Cardiol. 2019;293:294–296. doi:10.1016/j.ijcard.2019.06.039

6.  Rodriguez JA, Clark CR, Bates DW. Digital Health Equity as a Necessity in the 21st Century Cures Act Era. JAMA. 2020;323(23):2381–2382. doi:10.1001/jama.2020.7858

7.  Bruno MA, Petscavage-Thomas JM, Mohr MJ, Bell SK, Brown SD. The “Open Letter”: Radiologists’ Reports in the Era of Patient Web Portals. J Am Coll Radiol. 2014;11(9):863–867. doi:10.1016/j.jacr.2014.03.014

8.  Kim E, Table B, Ring D, Fatehi A, Crijns TJ. Linguistic tones in MRI reports correlate with severity of pathology for rotator cuff tendinopathy. Arch Orthop Trauma Surg. Published online August 23, 2022. doi:10.1007/s00402-022-04543-w

9.  Bruno B, Steele S, Carbone J, Schneider K, Posk L, Rose SL. Informed or anxious: patient preferences for release of test results of increasing sensitivity on electronic patient portals. Health Technol. 2022;12(1):59–67. doi:10.1007/s12553-021-00628-5

10. Mehan WA, Gee MS, Egan N, Jones PE, Brink JA, Hirsch JA. Immediate Radiology Report Access: A Burden to the Ordering Provider. Curr Probl Diagn Radiol. 2022;51(5):712–716. doi:10.1067/j.cpradiol.2022.01.012

11. Agency for Healthcare Research and Quality. Patient Engagement and Safety. Patient Safety Network. Published September 7, 2019. Accessed May 22, 2023. [https://psnet.ahrq.gov/primer/patient-engagement-and-safety](https://psnet.ahrq.gov/primer/patient-engagement-and-safety)

12. Waseem N, Kircher S, Feliciano JL. Information Blocking and Oncology: Implications of the 21st Century Cures Act and Open Notes. JAMA Oncol. 2021;7(11):1609–1610. doi:10.1001/jamaoncol.2021.3520

13. Assiri G. The Impact of Patient Access to Their Electronic Health Record on Medication Management Safety: A Narrative Review. Saudi Pharm J SPJ. 2022;30(3):185–194. doi:10.1016/j.jsps.2022.01.001

14. Berkman ND, Sheridan SL, Donahue KE, et al. Health literacy interventions and outcomes: an updated systematic review. Evid Report/Technology Assess. 2011;(199):1–941.

15. Esch T, Mejilla R, Anselmo M, Podtschaske B, Delbanco T, Walker J. Engaging patients through open notes: an evaluation using mixed methods. BMJ Open. 2016;6(1):e010034. doi:10.1136/bmjopen-2015-010034

16. Kadom N, Tamasi S, Vey BL, et al. Info-RADS: Adding a Message for Patients in Radiology Reports. J Am Coll Radiol. 2021;18(1):128–132. doi:10.1016/j.jacr.2020.09.049

17. Panicek DM, Hricak H. How Sure Are You, Doctor? A Standardized Lexicon to Describe the Radiologist’s Level of Certainty. AJR Am J Roentgenol. 2016;207(1):2–3. doi:10.2214/AJR.15.15895

18. Vincoff NS, Barish MA, Grimaldi G. The patient-friendly radiology report: history, evolution, challenges and opportunities. Clin Imaging. 2022;89:128–135. doi:10.1016/j.clinimag.2022.06.018

19. Recht MP, Westerhoff M, Doshi AM, et al. Video Radiology Reports: A Valuable Tool to Improve Patient-Centered Radiology. Am J Roentgenol. 2022;219(3):509–519. doi:10.2214/AJR.22.27512

20. Lyu Q, Tan J, Zapadka ME, et al. Translating Radiology Reports into Plain Language using ChatGPT and GPT-4 with Prompt Learning: Promising Results, Limitations, and Potential. Published online March 28, 2023. doi:10.48550/arXiv.2303.09038

21. Ali SR, Dobbs TD, Hutchings HA, Whitaker IS. Using ChatGPT to write patient clinic letters. Lancet Digit Health. 2023;5(4):e179–e181. doi:10.1016/S2589-7500(23)00048-1

22. Johnson A, Pollard T, Mark R. MIMIC-III Clinical Database. Published online September 4, 2016. doi:10.13026/C2XW26

23. Johnson AEW, Pollard TJ, Shen L, et al. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3(1):160035. doi:10.1038/sdata.2016.35

24. Ahn S. The impending impacts of large language models on medical education. Korean J Med Educ. 2023;35(1):103–107. doi:10.3946/kjme.2023.253

25. Pearson K, Ngo S, Ekpo E, et al. Online Patient Education Materials Related to Lipoprotein(a): Readability Assessment. J Med Internet Res. 2022;24(1):e31284. doi:10.2196/31284

26. Rodriguez F, Ngo S, Baird G, Balla S, Miles R, Garg M. Readability of Online Patient Educational Materials for Coronary Artery Calcium Scans and Implications for Health Disparities. J Am Heart Assoc. 2020;9(18):e017372. doi:10.1161/JAHA.120.017372

27. Chen W, Durkin C, Huang Y, Adler B, Rust S, Lin S. Simplified Readability Metric Drives Improvement of Radiology Reports: an Experiment on Ultrasound Reports at a Pediatric Hospital. J Digit Imaging. 2017;30(6):710–717. doi:10.1007/s10278-017-9972-7

28. Habeeb A. How readable and reliable is online patient information on chronic rhinosinusitis? J Laryngol Otol. 2021;135(7):644–647. doi:10.1017/S0022215121001559

29. Kincaid JP, Fishburne RP Jr, Rogers RL, Chissom BS. Derivation of New Readability Formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy Enlisted Personnel. Defense Technical Information Center; 1975. doi:10.21236/ADA006655

30. Coleman M, Liau TL. A computer readability formula designed for machine scoring. J Appl Psychol. 1975;60:283–284. doi:10.1037/h0076540

31. Sare A, Patel A, Kothari P, Kumar A, Patel N, Shukla PA. Readability Assessment of Internet-based Patient Education Materials Related to Treatment Options for Benign Prostatic Hyperplasia. Acad Radiol. 2020;27(11):1549–1554. doi:10.1016/j.acra.2019.11.020

32. Weiss BD. Health Literacy and Patient Safety: Help Patients Understand: Manual for Clinicians. AMA Foundation; 2007.

33. Hansberry DR, Agarwal N, Baker SR. Health literacy and online educational resources: an opportunity to educate patients. AJR Am J Roentgenol. 2015;204(1):111–116. doi:10.2214/AJR.14.13086

34. Zhao WX, Zhou K, Li J, et al. A Survey of Large Language Models. Published online May 7, 2023. Accessed May 22, 2023. [http://arxiv.org/abs/2303.18223](http://arxiv.org/abs/2303.18223)

35. Fan L, Li L, Ma Z, Lee S, Yu H, Hemphill L. A Bibliometric Review of Large Language Models Research from 2017 to 2023. Published online April 3, 2023. Accessed May 22, 2023. [http://arxiv.org/abs/2304.02020](http://arxiv.org/abs/2304.02020)

36. Lee TC, Staller K, Botoman V, Pathipati MP, Varma S, Kuo B. ChatGPT Answers Common Patient Questions About Colonoscopy. Gastroenterology. Published online May 5, 2023. doi:10.1053/j.gastro.2023.04.033

37. Mezrich JL, Jin G, Lye C, Yousman L, Forman HP. Patient Electronic Access to Final Radiology Reports: What Is the Current Standard of Practice, and Is an Embargo Period Appropriate? Radiology. 2021;300(1):187–189. doi:10.1148/radiol.2021204382

38. Landi H. HIMSS23: Epic taps Microsoft to integrate generative AI into EHRs with Stanford, UC San Diego as early adopters. Fierce Healthcare. Published April 17, 2023. Accessed May 22, 2023. [https://www.fiercehealthcare.com/health-tech/himss23-epic-taps-microsoft-integrate-generative-ai-ehrs-stanford-uc-san-diego-early](https://www.fiercehealthcare.com/health-tech/himss23-epic-taps-microsoft-integrate-generative-ai-ehrs-stanford-uc-san-diego-early)

39. Jeblick K, Schachtner B, Dexl J, et al. ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports. Published online December 30, 2022. Accessed May 22, 2023. [http://arxiv.org/abs/2212.14882](http://arxiv.org/abs/2212.14882)

40. Doshi RH, Bajaj SS, Krumholz HM. ChatGPT: Temptations of Progress. Am J Bioeth. 2023;23(4):6–8. doi:10.1080/15265161.2023.2180110