Abstract
Background Artificial intelligence (AI) can greatly enhance efficiency in systematic literature reviews and meta-analyses, but its accuracy in screening titles/abstracts and full-text articles is uncertain.
Objectives This study evaluated the performance metrics (sensitivity, specificity) of a GPT-4 AI program, Review Copilot, against human decisions (gold standard) in screening titles/abstracts and full-text articles from four published systematic reviews/meta-analyses.
Research Design Participant data from four already-published systematic literature reviews were used for this validation study. This was a study comparing Review Copilot to human decision-making (gold standard) in screening titles/abstracts and full-text articles for systematic reviews/meta-analyses. The four studies that were used in this study included observational studies and randomized control trials. Review Copilot operates on the OpenAI, GPT-4 server. We examined the performance metrics of Review Copilot to include and exclude titles/abstracts and full-text articles as compared to human decisions in four systematic reviews/meta-analyses. Sensitivity, specificity, and balanced accuracy of title/abstract and full-text screening were compared between Review Copilot and human decisions.
Results Review Copilot’s sensitivity and specificity for title/abstract screening were 99.2% and 83.6%, respectively, and 97.6% and 47.4% for full-text screening. The average agreement between two runs was 95.4%, with a kappa statistic of 0.83. Review Copilot screened in one-quarter of the time compared to humans.
Conclusions AI use in systematic reviews and meta-analyses is inevitable. Health researchers must understand these technologies’ strengths and limitations to ethically leverage them for research efficiency and evidence-based decision-making in health.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study received no funding outside the authors' regular salaries.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
The Chan Zuckerberg Initiative, Cold Spring Harbor Laboratory, the Sergey Brin Family Foundation, California Institute of Technology, Centre National de la Recherche Scientifique, Fred Hutchinson Cancer Center, Imperial College London, Massachusetts Institute of Technology, Stanford University, University of Washington, and Vrije Universiteit Amsterdam.