PT  - JOURNAL ARTICLE
AU  - Atsukawa, Natsuko
AU  - Tatekawa, Hiroyuki
AU  - Oura, Tatsushi
AU  - Matsushita, Shu
AU  - Horiuchi, Daisuke
AU  - Takita, Hirotaka
AU  - Mitsuyama, Yasuhito
AU  - Omori, Ayako
AU  - Shimono, Taro
AU  - Miki, Yukio
AU  - Ueda, Daiju
TI  - Evaluation of Radiology Residents’ Reporting Skills Using Large Language Models: An Observational Study
AID  - 10.1101/2024.11.06.24316838
DP  - 2024 Nov 06
TA  - medRxiv
PG  - 2024.11.06.24316838
4099  - http://medrxiv.org/content/early/2024/11/06/2024.11.06.24316838.short
4100  - http://medrxiv.org/content/early/2024/11/06/2024.11.06.24316838.full
AB  - Background: Large language models (LLMs) have the potential to objectively evaluate radiology resident reports; however, research on their use for feedback in radiology training and for assessing resident skill development remains limited. Purpose: This study aimed to assess the effectiveness of LLMs in revising radiology reports by comparing them with reports verified by board-certified radiologists and to analyze the progression of residents’ reporting skills over time. Materials and Methods: To identify the LLM that best aligned with human radiologists, 100 reports were randomly selected from a total of 7376 reports authored by nine first-year radiology residents. The reports were evaluated based on six criteria: (1) Addition of missing positive findings, (2) Deletion of findings, (3) Addition of negative findings, (4) Correction of the expression of findings, (5) Correction of the diagnosis, and (6) Proposal of additional examinations or treatments. Reports were segmented into four time-based terms, and 900 reports (450 CT and 450 MRI) were randomly chosen from the initial and final terms of the residents’ first year. The revised rates for each criterion were compared between the first and last terms using the Wilcoxon signed-rank test. Results: Among the LLMs tested, GPT-4o demonstrated the highest level of agreement with board-certified radiologists. Significant improvements were noted in Criteria 1–3 when comparing reports from the first and last terms (all P < 0.023) using GPT-4o. In contrast, no significant changes were observed for Criteria 4–6. Despite this, all criteria except for Criterion 6 showed progressive enhancement over time. Conclusion: LLMs can effectively provide feedback on commonly corrected areas in radiology reports, enabling residents to objectively identify and improve their weaknesses and monitor their progress.
Additionally, LLMs may help reduce the workload of radiologists’ mentors. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This study was supported by Guerbet and Iida Group Holdings. Author Declarations: The IRB of Osaka Metropolitan University gave ethical approval for this work. Data supporting the findings of this study are available upon request from the corresponding author.