Abstract
Purpose Chain-of-thought prompting is a method to make a Large Language Model (LLM) generate intermediate reasoning steps when solving a complex problem to increase its performance. OpenAI’s o1 preview is an LLM that has been trained with reinforcement learning to create such a chain-of-thought internally, prior to giving a response and has been claimed to surpass various benchmarks requiring complex reasoning. The purpose of this study was to evaluate its performance for text mining in oncology.
Methods Six hundred trials from high-impact medical journals were classified depending on whether they allowed for the inclusion of patients with localized and/or metastatic disease.
GPT–4o and o1 preview were instructed to do the same classification based on the publications’ abstracts.
Results For predicting whether patients with localized disease were enrolled, GPT-4o and o1 preview achieved F1 scores of 0.80 (0.76 - 0.83) and 0.91 (0.89 - 0.94), respectively. For predicting whether patients with metastatic disease were enrolled, GPT-4o and o1 preview achieved F1 scores of 0.97 (0.95 - 0.98) and 0.99 (0.99 - 1.00), respectively.
Conclusion o1 preview outperformed GPT-4o for extracting if people with localized and or metastatic disease were eligible for a trial from its abstract. o1 previews’s performance was close to human annotation but could still be improved when dealing with cancer screening and prevention trials as well as by adhering to the desired output format. While research on additional tasks is necessary, it is likely that reasoning models could become the new state of the art for text mining in oncology and various other tasks in medicine.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
No funding was received for this project.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Ethics approval and consent to participate: Not applicable
Availability of data and materials: All data and code used to obtain this study’s results have been uploaded to https://github.com/windisch-paul/o1. The data has also been uploaded to Dryad (reviewer link: http://datadryad.org/stash/share/2xRq2jz0e8dbp_EU2KsZVeLFARb-j0C0YZI60wBD7qk).
Competing interests: The authors declare no conflict of interest.
Funding: No funding was received for this project.
All authors read and approved the final manuscript.
Data Availability
All data and code used to obtain this study's results have been uploaded to https://github.com/windisch-paul/o1. The data has also been uploaded to Dryad but is private while the paper is undergoing review.