PT - JOURNAL ARTICLE
AU - Khan, Muhammad Ali
AU - Ayub, Umair
AU - Naqvi, Syed Arsalan Ahmed
AU - Khakwani, Kaneez Zahra Rubab
AU - Sipra, Zaryab bin Riaz
AU - Raina, Ammad
AU - Zou, Sihan
AU - He, Huan
AU - Hossein, Seyyed Amir
AU - Hasan, Bashar
AU - Rumble, R. Bryan
AU - Bitterman, Danielle S.
AU - Warner, Jeremy L.
AU - Zou, Jia
AU - Tevaarwerk, Amye J.
AU - Leventakos, Konstantinos
AU - Kehl, Kenneth L.
AU - Palmer, Jeanne M.
AU - Murad, M. Hassan
AU - Baral, Chitta
AU - Riaz, Irbaz bin
TI - Collaborative Large Language Models for Automated Data Extraction in Living Systematic Reviews
AID - 10.1101/2024.09.20.24314108
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.09.20.24314108
4099 - http://medrxiv.org/content/early/2024/09/23/2024.09.20.24314108.short
4100 - http://medrxiv.org/content/early/2024/09/23/2024.09.20.24314108.full
AB - Objective: Data extraction from the published literature is the most laborious step in conducting living systematic reviews (LSRs). We aim to build a generalizable, automated data extraction workflow leveraging large language models (LLMs) that mimics the real-world two-reviewer process. Materials and Methods: A dataset of 10 clinical trials (22 publications) from a published LSR was used, focusing on 23 variables related to trial, population, and outcomes data. The dataset was split into prompt development (n=5) and held-out test sets (n=17). GPT-4-turbo and Claude-3-Opus were used for data extraction. Responses from the two LLMs were compared for concordance. In instances with discordance, original responses from each LLM were provided to the other LLM for cross-critique. Evaluation metrics, including accuracy, were used to assess performance against the manually curated gold standard. Results: In the prompt development set, 110 (96%) responses were concordant, achieving an accuracy of 0.99 against the gold standard. In the test set, 342 (87%) responses were concordant. The accuracy of the concordant responses was 0.94.
The accuracy of the discordant responses was 0.41 for GPT-4-turbo and 0.50 for Claude-3-Opus. Of the 49 discordant responses, 25 (51%) became concordant after cross-critique, with an increase in accuracy to 0.76. Discussion: Concordant responses by the LLMs are likely to be accurate. In instances of discordant responses, cross-critique can further increase the accuracy. Conclusion: Large language models, when simulated in a collaborative, two-reviewer workflow, can extract data with reasonable performance, enabling truly ‘living’ systematic reviews.
Competing Interest Statement: Irbaz bin Riaz, Muhammad Ali Khan, Umair Ayub, Syed Arsalan Ahmed Naqvi, Kaneez Zahra Rubab Khakwani, Zaryab bin Riaz Sipra, Ammad Raina, Sihan Zou, Huan He, Seyyed Amir Hossein, Hasan Bashar, R. Bryan Rumble, Jia Zou, Kenneth L. Kehl, Jeanne Palmer, M. Hassan Murad and Chitta Baral do not have any relevant competing interests to disclose.
Danielle S. Bitterman (DSB): Editorial, unrelated to the submitted work: Associate Editor of Radiation Oncology, HemOnc.org (no financial compensation); advisory and consulting, unrelated to the submitted work: MercurialAI.
Jeremy L. Warner (JLW): Reports funding from AACR, NIH, Brown Physicians Incorporated, unrelated to the submitted work; consulting with Wested and The Lewin Group, unrelated to the submitted work; ownership in HemOnc.org LLC, unrelated to the submitted work.
Amye J. Tevaarwerk (AJT): Family member at Epic Systems, unrelated to the submitted work.
Konstantinos Leventakos (KL): Reports consulting activities (honoraria to institution) with Amgen, AstraZeneca Interdisciplinary Corporation, Boehringer Ingelheim Pharmaceuticals, Janssen Biotech, Novartis, unrelated to the submitted work; advisory boards (honoraria to institution) with AstraZeneca, Janssen, Jazz Pharmaceuticals, Mirati Therapeutics, Regeneron, Takeda, and Targeted Oncology, unrelated to the submitted work; CME activities (honoraria to institution) with OncLive and MJH Life Sciences, MD Outlook and Targeted Oncology, unrelated to the submitted work; research support (to institution) from AstraZeneca and Mirati Therapeutics, unrelated to the submitted work.
Funding Statement: This work was supported by NIH U24 CA265879.
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes
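
Illustrative sketch (not part of the record): the two-reviewer workflow summarized in the abstract, in which two models extract each variable independently, agreement is checked, and discordant answers are sent to cross-critique, might be outlined roughly as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `Extractor`, `Critic`, `two_reviewer_extract`, `normalize`, and `accuracy` are hypothetical, the model calls are passed in as plain callables rather than real API clients, and comparing answers by normalized exact match is an assumption, since the abstract does not say how concordance was judged.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical signatures (assumptions, not from the source):
# an extractor maps (document, variable) to an answer string;
# a critic maps (document, variable, own_answer, other_answer) to a revised answer.
Extractor = Callable[[str, str], str]
Critic = Callable[[str, str, str, str], str]


@dataclass
class Extraction:
    variable: str
    answer_a: str
    answer_b: str
    concordant: bool  # do the two final answers agree?
    critiqued: bool   # was a cross-critique round needed?


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(answer.lower().split())


def two_reviewer_extract(
    document: str,
    variables: List[str],
    extract_a: Extractor,
    extract_b: Extractor,
    critique_a: Critic,
    critique_b: Critic,
) -> List[Extraction]:
    results = []
    for variable in variables:
        a = extract_a(document, variable)
        b = extract_b(document, variable)
        critiqued = normalize(a) != normalize(b)
        if critiqued:
            # Each model revises its answer after seeing the other's ORIGINAL response;
            # the right-hand side is fully evaluated before a and b are reassigned.
            a, b = (
                critique_a(document, variable, a, b),
                critique_b(document, variable, b, a),
            )
        results.append(
            Extraction(variable, a, b, normalize(a) == normalize(b), critiqued)
        )
    return results


def accuracy(results: List[Extraction], gold: Dict[str, str]) -> float:
    """Share of concordant answers matching the gold standard (0.0 if none are concordant)."""
    concordant = [r for r in results if r.concordant]
    if not concordant:
        return 0.0
    hits = sum(normalize(r.answer_a) == normalize(gold[r.variable]) for r in concordant)
    return hits / len(concordant)
```

In practice the four callables would wrap calls to the two models, and responses that remain discordant after critique would presumably go to a human reviewer; that routing is left out of the sketch.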