RT Journal Article
SR Electronic
T1 Evaluating AI Proficiency in Nuclear Cardiology: Large Language Models take on the Board Preparation Exam
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2024.07.16.24310297
DO 10.1101/2024.07.16.24310297
A1 Builoff, Valerie
A1 Shanbhag, Aakash
A1 Miller, Robert JH
A1 Dey, Damini
A1 Liang, Joanna X.
A1 Flood, Kathleen
A1 Bourque, Jamieson M.
A1 Chareonthaitawee, Panithaya
A1 Phillips, Lawrence M.
A1 Slomka, Piotr J
YR 2024
UL http://medrxiv.org/content/early/2024/07/16/2024.07.16.24310297.abstract
AB Background: Previous studies evaluated the ability of large language models (LLMs) in medical disciplines; however, few have focused on image analysis, and none specifically on cardiovascular imaging or nuclear cardiology. Objectives: This study assesses four LLMs - GPT-4, GPT-4 Turbo, GPT-4omni (GPT-4o) (OpenAI), and Gemini (Google Inc.) - in responding to questions from the 2023 American Society of Nuclear Cardiology Board Preparation Exam, reflecting the scope of the Certification Board of Nuclear Cardiology (CBNC) examination. Methods: We used 168 questions: 141 text-only and 27 image-based, categorized into four sections mirroring the CBNC exam. Each LLM was presented with the same standardized prompt and applied to each section 30 times to account for stochasticity. Performance over six weeks was assessed for all models except GPT-4o. McNemar’s test compared correct response proportions. Results: GPT-4, Gemini, GPT-4 Turbo, and GPT-4o correctly answered a median of 56.8% (95% confidence interval 55.4% - 58.0%), 40.5% (39.9% - 42.9%), 60.7% (59.9% - 61.3%), and 63.1% (62.5% - 64.3%) of questions, respectively. GPT-4o significantly outperformed the other models (p=0.007 vs. GPT-4 Turbo, p<0.001 vs. GPT-4 and Gemini). GPT-4o excelled on text-only questions compared to GPT-4, Gemini, and GPT-4 Turbo (p<0.001, p<0.001, and p=0.001), while Gemini performed worse on image-based questions (p<0.001 for all). Conclusion: GPT-4o demonstrated superior performance among the four LLMs, achieving scores likely within or just outside the range required to pass a test akin to the CBNC examination. Although improvements in medical image interpretation are needed, GPT-4o shows potential to support physicians in answering text-based clinical questions. Competing Interest Statement: RJHM received consulting fees from BMS and Pfizer and research support from Pfizer. KF serves as CEO of ASNC. JMB is a consultant for GE Healthcare.
PC is a consultant for Clairo and GE Healthcare, has received speaking/lecture fees from Ionetix, has received royalties from UpToDate, and is the President-Elect of ASNC. LMP has served as a consultant for Novo Nordisk and is the President of ASNC. PS participates in software royalties for QPS software at Cedars-Sinai Medical Center, has received research grant support from Siemens Medical Systems, and has received consulting fees from Synektik, SA. The remaining authors declare no competing interests. Funding Statement: This research was supported in part by grant R35HL161195 from the National Heart, Lung, and Blood Institute/National Institutes of Health (NHLBI/NIH) (PI: Piotr Slomka). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes. All data produced in the present study are available upon reasonable request to the authors. Abbreviations: GPT, Generative Pre-trained Transformer; LLM, Large Language Model; CBNC, Certification Board of Nuclear Cardiology; ASNC, American Society of Nuclear Cardiology; SPECT, Single Photon Emission Computed Tomography; PET, Positron Emission Tomography.