Abstract
Background Large Language Models (LLMs) show promise in medical diagnosis, but their performance varies with prompting. Recent studies suggest that modifying prompts may enhance diagnostic capabilities.
Objective This study aimed to test whether a prompting approach aligned with general clinical reasoning methodology can enhance LLMs’ medical diagnostic capabilities. Specifically, the approach separates summarizing clinical information from making diagnoses based on that summary, rather than processing both in one step.
Methods We used 322 quiz questions from Radiology’s Diagnosis Please cases (1998-2023). We employed Claude 3.5 Sonnet, a state-of-the-art LLM, to compare three approaches: 1) a conventional zero-shot chain-of-thought prompt, as the baseline; 2) a two-step approach, in which the LLM first organizes the patient history and imaging findings and then provides diagnoses; and 3) a summary-only approach, in which diagnoses are made using only the LLM-generated summary.
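The three approaches can be sketched as follows. This is a minimal illustration, not the authors' implementation: the prompt wording is invented for the example, `LLM` is a hypothetical callable standing in for whatever client sends a prompt to the model, and the assumption that the two-step approach passes both the original case text and the summary to the second call is an interpretation of the description above.

```python
from typing import Callable

# Hypothetical interface: takes a prompt string, returns the model's text reply.
LLM = Callable[[str], str]

# Illustrative instruction text only; the study's actual prompts are not given here.
DIAGNOSE = "Think step by step, then list the three most likely diagnoses."


def summarize(case_text: str, llm: LLM) -> str:
    """Step 1: have the model organize the case into history and imaging findings."""
    prompt = (
        "Organize the following case into patient history and imaging findings.\n\n"
        + case_text
    )
    return llm(prompt)


def baseline(case_text: str, llm: LLM) -> str:
    """Approach 1: single zero-shot chain-of-thought call on the raw case text."""
    return llm(f"Case:\n{case_text}\n\n{DIAGNOSE}")


def two_step(case_text: str, llm: LLM) -> str:
    """Approach 2: summarize first, then diagnose with the case text and the summary."""
    summary = summarize(case_text, llm)
    return llm(f"Case:\n{case_text}\n\nSummary:\n{summary}\n\n{DIAGNOSE}")


def summary_only(case_text: str, llm: LLM) -> str:
    """Approach 3: summarize first, then diagnose from the summary alone."""
    summary = summarize(case_text, llm)
    return llm(f"Summary:\n{summary}\n\n{DIAGNOSE}")
```

Passing the model as a plain callable keeps the sketch independent of any particular client library, so the three approaches differ only in what text reaches the diagnosis call.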
Results The two-step approach significantly outperformed both the baseline and summary-only methods in diagnostic accuracy, as determined by McNemar tests. Primary diagnosis accuracy was 60.6% for the two-step approach, compared to 56.5% for baseline (p=0.042) and 56.3% for summary-only (p=0.035). For the top three diagnoses, accuracy was 70.5%, 66.5%, and 65.5% for the two-step, baseline, and summary-only approaches, respectively (p=0.005 versus baseline, p=0.008 versus summary-only). No significant differences were observed between the baseline and summary-only approaches.
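Because each question is answered under every approach, the comparisons are paired, and McNemar's test uses only the questions on which two approaches disagree. The sketch below shows the exact (binomial) two-sided form using the Python standard library. It rests on assumptions: the abstract does not state which variant of the test was used, the function name is hypothetical, and the per-question outcomes needed to reproduce the reported p-values are not given here.

```python
from math import comb
from typing import Sequence, Tuple


def mcnemar_exact(
    correct_a: Sequence[bool], correct_b: Sequence[bool]
) -> Tuple[int, int, float]:
    """Exact two-sided McNemar test on paired per-question correctness.

    Returns (b, c, p) where b counts questions only approach A answered
    correctly and c counts questions only approach B answered correctly.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("paired outcomes must have the same length")

    # Discordant pairs: the two approaches disagree on these questions.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no disagreement, so no evidence of a difference

    # Under the null hypothesis, each discordant pair favors A with probability 0.5.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return b, c, min(1.0, 2 * tail)
```

Called with two equal-length lists of booleans (one entry per question, `True` where the diagnosis was correct), it returns the two discordant counts and the p-value; questions both approaches got right or both got wrong do not affect the result.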
Conclusion Our results indicate that a structured clinical reasoning approach enhances LLMs’ diagnostic accuracy. This method shows potential as a tool for deriving diagnoses from free-text clinical information. Because the approach aligns with established clinical reasoning processes, it may be applicable in real-world clinical settings.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes