PT - JOURNAL ARTICLE AU - Wrightson, J.G. AU - Blazey, P. AU - Moher, D. AU - Khan, K.M. AU - Ardern, C.L. TI - GPT for RCTs?: Using AI to determine adherence to reporting guidelines AID - 10.1101/2023.12.14.23299971 DP - 2024 Jan 01 TA - medRxiv PG - 2023.12.14.23299971 4099 - http://medrxiv.org/content/early/2024/05/14/2023.12.14.23299971.short 4100 - http://medrxiv.org/content/early/2024/05/14/2023.12.14.23299971.full AB - Background Adherence to established reporting guidelines can improve clinical trial reporting standards, but attempts to improve adherence have produced mixed results. This exploratory study aimed to determine how accurate a Large Language Model generative AI system (AI-LLM) was for determining reporting guideline compliance in a sample of sports medicine clinical trial reports.Design and Methods This study was an exploratory retrospective data analysis. The OpenAI GPT-4 and Meta LLama2 AI-LLMa were evaluated for their ability to determine reporting guideline adherence in a sample of 113 published sports medicine and exercise science clinical trial reports. For each paper, the GPT-4-Turbo and Llama 2 70B models were prompted to answer a series of nine reporting guideline questions about the text of the article. The GPT-4-Vision model was prompted to answer two additional reporting guideline questions about the participant flow diagram in a subset of articles. The dataset was randomly split (80/20) into a TRAIN and TEST dataset. Hyperparameter and fine-tuning were performed using the TRAIN dataset. The Llama2 model was fine-tuned using the data from the GPT-4-Turbo analysis of the TRAIN dataset. Primary outcome measure: Model performance (F1-score, classification accuracy) was assessed using the TEST dataset.Results Across all questions about the article text, the GPT-4-Turbo AI-LLM demonstrated acceptable performance (F1-score = 0.89, accuracy[95% CI] = 90%[85-94%]). Accuracy for all reporting guidelines was > 80%. The Llama2 model accuracy was initially poor (F1-score = 0.63, accuracy[95%CI] = 64%[57-71%]), and improved with fine-tuning (F1-score = 0.84, accuracy[95%CI] = 83%[77-88%]). The GPT-4-Vision model accurately identified all participant flow diagrams (accuracy[95% CI] = 100%[89-100%]) but was less accurate at identifying when details were missing from the flow diagram (accuracy[95% CI] = 57%[39-73%]).Conclusions Both the GPT-4 and fine-tuned Llama2 AI-LLMs showed promise as tools for assessing reporting guideline compliance. Next steps should include developing an efficent, open-source AI-LLM and exploring methods to improve model accuracy.Competing Interest StatementThe authors have declared no competing interest.Funding StatementThis work was supported by a CIHR Research Operating Grant (Scientific Directors) held by Karim Khan. The funder had no role in the design and conduct of the study.Author DeclarationsI confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.YesI confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.YesI understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).YesI have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.YesCode notebooks will be available on the Open Science Framework (https://tinyurl.com/gpt4rctdata) upon publication. Copyright issues prevent us from sharing the text extracted from the papers used in this analysis; however, details of the steps needed to reproduce the extracted text from open and closed papers can be found within these notebooks. https://tinyurl.com/gpt4rctdata