Abstract
Background Large language models (LLMs) have the potential to objectively evaluate radiology resident reports; however, research on their use for feedback in radiology training and assessment of resident skill development remains limited.
Purpose This study aimed to assess the effectiveness of LLMs in revising radiology reports by comparing them with reports verified by board-certified radiologists and to analyze the progression of resident’s reporting skills over time.
Materials and methods To identify the LLM that best aligned with human radiologists, 100 reports were randomly selected from a total of 7376 reports authored by nine first-year radiology residents. The reports were evaluated based on six criteria: (1) Addition of missing positive findings, (2) Deletion of findings, (3) Addition of negative findings, (4) Correction of the expression of findings, (5) Correction of the diagnosis, and (6) Proposal of additional examinations or treatments. Reports were segmented into four time-based terms, and 900 reports (450 CT and 450 MRI) were randomly chosen from the initial and final terms of the residents’ first year. The revised rates for each criterion were compared between the first and last terms using the Wilcoxon Signed-Rank test.
Results Among the LLMs tested, GPT-4o demonstrated the highest level of agreement with board-certified radiologists. Significant improvements were noted in Criteria 1–3 when comparing reports from the first and last terms (all P < 0.023) using GPT-4o. In contrast, no significant changes were observed for Criteria 4–6. Despite this, all criteria except for Criterion 6 showed progressive enhancement over time.
Conclusion LLMs can effectively provide feedback on commonly corrected areas in radiology reports, enabling residents to objectively identify and improve their weaknesses and monitor their progress. Additionally, LLMs may help reduce the workload of radiologists’ mentors.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study was supported by Guerbet and Iida Group Holdings.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
IRB of Osaka metropolitan university gave ethical approval for this work
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Grant support This study was supported by Guerbet and Iida Group Holdings.
Conflict of interest The authors declare no conflict of interest.
Data Availability
Data supporting the findings of this study are available upon request from the corresponding author.