PT  - JOURNAL ARTICLE
AU  - Nishio, Mizuho
AU  - Matsunaga, Takaaki
AU  - Matsuo, Hidetoshi
AU  - Nogami, Munenobu
AU  - Kurata, Yasuhisa
AU  - Fujimoto, Koji
AU  - Sugiyama, Osamu
AU  - Akashi, Toshiaki
AU  - Aoki, Shigeki
AU  - Murakami, Takamichi
TI  - Fully automatic summarization of radiology reports using natural language processing with language models
AID  - 10.1101/2023.12.01.23299267
DP  - 2023 Jan 01
TA  - medRxiv
PG  - 2023.12.01.23299267
4099  - http://medrxiv.org/content/early/2023/12/02/2023.12.01.23299267.short
4100  - http://medrxiv.org/content/early/2023/12/02/2023.12.01.23299267.full
AB  - Natural language processing using language models has yielded promising results in various fields, and language models may help improve the workflow of radiologists. This retrospective study aimed to construct and evaluate language models for the automatic summarization of radiology reports. Two datasets of radiology reports were used: MIMIC-CXR and the Japan Medical Image Database (JMID). MIMIC-CXR is an open dataset of chest radiograph reports. JMID is a large dataset of CT and MRI reports from 10 academic medical centers in Japan. A total of 128,032 and 1,101,271 reports from MIMIC-CXR and JMID, respectively, were included in this study. Four Text-to-Text Transfer Transformer (T5) models were constructed. Recall-Oriented Understudy for Gisting Evaluation (ROUGE), a quantitative metric, was used to evaluate the quality of the summaries generated for the 19,205 and 58,043 test-set reports from MIMIC-CXR and JMID, respectively. The Wilcoxon signed-rank test was used to evaluate the differences among the ROUGE values of the four T5 models. In addition, subsets of the automatically summarized texts in the test sets were manually evaluated by two radiologists. Based on the Wilcoxon signed-rank test, the best T5 models were selected for automatic summarization. The quantitative metrics of the best T5 models were as follows: ROUGE-1 = 57.75 ± 30.99, ROUGE-2 = 49.96 ± 35.36, and ROUGE-L = 54.07 ± 32.48 in MIMIC-CXR; ROUGE-1 = 50.00 ± 29.24, ROUGE-2 = 39.66 ± 30.21, and ROUGE-L = 47.87 ± 29.44 in JMID. The radiologists’ evaluations revealed that 86% (86/100) and 85% (85/100) of the texts automatically summarized from MIMIC-CXR and JMID, respectively, were clinically useful. The T5 models constructed in this study were capable of automatic summarization of radiology reports.
The radiologists’ evaluations revealed that most of the automatically summarized texts were clinically valuable.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This work was supported by JSPS KAKENHI (Grant Numbers: 22K07665, 23K07154, 23K17229, and 23KK0148).
Ethics Statement: The IRB of the Japan Medical Image Database (JMID) project and Kobe University Hospital gave ethical approval for this work. The requirement for informed consent was waived.
Data Availability: All data produced in the present study are available upon reasonable request to the corresponding author.
Abbreviations: CXR, chest x-ray; JMID, Japan Medical Image Database; NLP, natural language processing; ROUGE, Recall-Oriented Understudy for Gisting Evaluation; T5, Text-to-Text Transfer Transformer.
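
Note on the ROUGE metric: the abstract reports ROUGE-1, ROUGE-2, and ROUGE-L but does not state which variant (recall, precision, or F-measure) or which tokenizer was used. The following is a minimal sketch, assuming the F-measure variant and simple lowercase whitespace tokenization, of how the three scores can be computed for one reference/candidate pair. The function names and the example sentences are hypothetical and are not taken from the study; whitespace tokenization would also not be suitable for Japanese reports such as those in JMID, which need a dedicated tokenizer.

```python
from collections import Counter


def _f1(overlap, ref_total, cand_total):
    """F-measure from an overlap count and the two totals (0.0 if undefined)."""
    if overlap == 0 or ref_total == 0 or cand_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def _ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n):
    """ROUGE-N F-measure: clipped n-gram overlap between reference and candidate."""
    ref = _ngrams(reference.lower().split(), n)
    cand = _ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # Counter intersection clips counts
    return _f1(overlap, sum(ref.values()), sum(cand.values()))


def _lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L F-measure based on the longest common subsequence."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    return _f1(_lcs_length(ref, cand), len(ref), len(cand))


# Hypothetical example pair (not from MIMIC-CXR or JMID).
reference = "no acute cardiopulmonary process"
candidate = "no acute cardiopulmonary abnormality"
scores = {
    "ROUGE-1": 100 * rouge_n(reference, candidate, 1),
    "ROUGE-2": 100 * rouge_n(reference, candidate, 2),
    "ROUGE-L": 100 * rouge_l(reference, candidate),
}
print(scores)
```

Scores are multiplied by 100 only to match the percentage scale used in the abstract. Averaging such per-report scores over a test set, and reporting their standard deviation, would give figures in the same form as "ROUGE-1 = 57.75 ± 30.99"; a published package should be preferred over this sketch for any real evaluation.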