Abstract
Natural language processing using language models has yielded promising results in various fields. The use of language models may help improve the workflow of radiologists. This retrospective study aimed to construct and evaluate language models for the automatic summarization of radiology reports. Two datasets of radiology reports were used: MIMIC-CXR and the Japan Medical Image Database (JMID). MIMIC-CXR is an open dataset comprising chest radiograph reports. JMID is a large dataset of CT and MRI reports comprising reports from 10 academic medical centers in Japan. A total of 128,032 and 1,101,271 reports from the MIMIC-CXR and JMID, respectively, were included in this study. Four Text-to-Text Transfer Transformer (T5) models were constructed. Recall-Oriented Understudy for Gisting Evaluation (ROUGE), a quantitative metric, was used to evaluate the quality of text summarized from 19,205 and 58,043 test sets from MIMIC-CXR and JMID, respectively. The Wilcoxon signed-rank test was utilized to evaluate the differences among the ROUGE values of the four T5 models. In addition, subsets of automatically summarized text in the test sets were manually evaluated by two radiologists. Based on the Wilcoxon signed-rank test, the best T5 models were selected for the automatic summarization. The quantitative metrics of the best T5 models were as follows: ROUGE-1 = 57.75 ± 30.99, ROUGE-2 = 49.96 ± 35.36, and ROUGE-L = 54.07 ± 32.48 in MIMIC-CXR; ROUGE-1 = 50.00 ± 29.24, ROUGE-2 = 39.66 ± 30.21, and ROUGE-L = 47.87 ± 29.44 in JMID. The radiologists’ evaluations revealed that 86% (86/100) and 85% (85/100) of the texts automatically summarized from MIMIC-CXR and JMID, respectively, were clinically useful. The T5 models constructed in this study were capable of automatic summarization of radiology reports. The radiologists’ evaluations revealed that most of the automatically summarized texts were clinically valuable.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported by JSPS KAKENHI (Grant Numbers: 22K07665, 23K07154, 23K17229, and 23KK0148).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
IRB of the Japan Medical Image Database (JMID) project and Kobe University Hospital ethical approval for this work. The requirement for informed consent was waived.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Funding information This work was supported by JSPS KAKENHI (Grant Numbers: 22K07665, 23K07154, 23K17229, and 23KK0148).
Data Availability
All data produced in the present study are available upon reasonable request to the corresponding author.
Abbreviations
- CXR
- Chest x-ray
- JMID
- Japan Medical Image Database
- NLP
- Natural language processing
- ROUGE
- Recall-Oriented Understudy for Gisting Evaluation
- T5
- Text-to-Text Transfer Transformer