PT - JOURNAL ARTICLE
AU - Newton, Philip M.
AU - Summers, Christopher J.
AU - Zaheer, Uzman
AU - Xiromeriti, Maira
AU - Stokes, Jemima R.
AU - Bhangu, Jaskaran Singh
AU - Roome, Elis G.
AU - Roberts-Phillips, Alanna
AU - Mazaheri-Asadi, Darius
AU - Jones, Cameron D.
AU - Hughes, Stuart
AU - Gilbert, Dominic
AU - Jones, Ewan
AU - Essex, Keioni
AU - Ellis, Emily C.
AU - Davey, Ross
AU - Cox, Adrienne A.
AU - Bassett, Jessica A.
TI - Can ChatGPT-4o really pass medical science exams? A pragmatic analysis using novel questions
AID - 10.1101/2024.06.29.24309595
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.06.29.24309595
4099 - http://medrxiv.org/content/early/2024/06/30/2024.06.29.24309595.short
4100 - http://medrxiv.org/content/early/2024/06/30/2024.06.29.24309595.full
AB - ChatGPT apparently shows excellent performance on high-level professional exams, such as those used in medical assessment and licensing. This has raised concerns that ChatGPT could be used for academic misconduct, especially in unproctored online exams. However, ChatGPT has also shown weaker performance on questions that include pictures, and there have been concerns that its performance may be artificially inflated because the sample questions tested are publicly available and so likely formed part of ChatGPT's training material. This led to suggestions that cheating could be mitigated by using novel questions for every sitting of an exam and by making extensive use of picture-based questions. These approaches remain untested. Here we tested the performance of ChatGPT-4o on existing medical licensing exams in the UK and USA, and on novel questions based on those exams. ChatGPT-4o scored 94% on the United Kingdom Medical Licensing Exam Applied Knowledge Test and 89.9% on the United States Medical Licensing Exam Step 1. Performance was not diminished when the questions were rewritten into novel versions, nor on entirely novel questions that were not based on any existing question.
ChatGPT did show slightly reduced performance on questions containing images, particularly when the answer options were added to the image as text labels. These data demonstrate that the performance of ChatGPT continues to improve and that unproctored online exams are an invalid form of assessment of the foundational knowledge needed for higher-order learning.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This study did not receive any funding.
Author Declarations:
I confirm that all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. (Yes)
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. (Yes)
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). (Yes)
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. (Yes)
All data produced in the present work are contained in the manuscript.