RT Journal Article
SR Electronic
T1 Comparison of the Diagnostic Performance from Patient’s Medical History and Imaging Findings between GPT-4 based ChatGPT and Radiologists in Challenging Neuroradiology Cases
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2023.08.28.23294607
DO 10.1101/2023.08.28.23294607
A1 Horiuchi, Daisuke
A1 Tatekawa, Hiroyuki
A1 Oura, Tatsushi
A1 Oue, Satoshi
A1 Walston, Shannon L
A1 Takita, Hirotaka
A1 Matsushita, Shu
A1 Mitsuyama, Yasuhito
A1 Shimono, Taro
A1 Miki, Yukio
A1 Ueda, Daiju
YR 2023
UL http://medrxiv.org/content/early/2023/08/29/2023.08.28.23294607.abstract
AB Purpose To compare the diagnostic performance between Chat Generative Pre-trained Transformer (ChatGPT), based on the GPT-4 architecture, and radiologists from patient’s medical history and imaging findings in challenging neuroradiology cases.Methods We collected 30 consecutive “Freiburg Neuropathology Case Conference” cases from the journal Clinical Neuroradiology between March 2016 and June 2023. GPT-4 based ChatGPT generated diagnoses from the patient’s provided medical history and imaging findings for each case, and the diagnostic accuracy rate was determined based on the published ground truth. Three radiologists with different levels of experience (2, 4, and 7 years of experience, respectively) independently reviewed all the cases based on the patient’s provided medical history and imaging findings, and the diagnostic accuracy rates were evaluated. The Chi-square tests were performed to compare the diagnostic accuracy rates between ChatGPT and each radiologist.Results ChatGPT achieved an accuracy rate of 23% (7/30 cases). Radiologists achieved the following accuracy rates: a junior radiology resident had 27% (8/30) accuracy, a senior radiology resident had 30% (9/30) accuracy, and a board-certified radiologist had 47% (14/30) accuracy. ChatGPT’s diagnostic accuracy rate was lower than that of each radiologist, although the difference was not significant (p = 0.99, 0.77, and 0.10, respectively).Conclusion The diagnostic performance of GPT-4 based ChatGPT did not reach the performance level of either junior/senior radiology residents or board-certified radiologists in challenging neuroradiology cases. While ChatGPT holds great promise in the field of neuroradiology, radiologists should be aware of its current performance and limitations for optimal utilization.Competing Interest StatementThe authors have declared no competing interest.Funding StatementThis study did not receive any fundingAuthor DeclarationsI confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.YesThe details of the IRB/oversight body that provided approval or exemption for the research described are given below:N/AI confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.YesI understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).YesI have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.YesAll data produced in the present work are contained in the manuscript.