Abstract
Background The integration of artificial intelligence (AI) in dermatology presents a promising frontier for enhancing diagnostic accuracy and treatment planning. However, general purpose AI models require rigorous evaluation before being applied to real-world medical cases.
Objective This project specifically evaluates GPT-4V’s performance in accurately diagnosing and generating treatment plans for common dermatological conditions, comparing its assessment of textual versus image data and its performance with multimodal inputs. Beyond the immediate scope, this study contributes to the broader trajectory of integrating AI in healthcare, highlighting the limitations of these technologies, as well as their potential to enhance efficiency, and education within medical training and practice.
Methods A dataset of 102 images representing nine common dermatological conditions was compiled from open-access websites. Fifty-four images were ultimately selected by two board-certified dermatologists as being representative and typical of the common conditions. Additionally, nine clinical scenarios corresponding to these conditions were developed. GPT-4V’s diagnostic capabilities were assessed in three setups: Image Prompt (image-based), Scenario Prompt (text-based), and Image and Scenario Prompt (combining both modalities). The model’s performance was evaluated based on diagnostic accuracy, differential diagnosis, and treatment recommendations.
Results In the Image Prompt setup, GPT-4V correctly identified the primary diagnosis for 29 of 54 images. The Scenario Prompt setup showed a higher accuracy rate of 89% in identifying the primary diagnosis. The multimodal Image and Scenario Prompt setup also achieved an 89% accuracy rate. However, a notable bias towards textual data over visual data was observed. Treatment recommendations were evaluated by the same two dermatologists, using a modified Entrustment Scale, showing competent but not expert-level performance.
Conclusion GPT-4V demonstrates promising capabilities in dermatological diagnosis and treatment recommendations, particularly in text-based scenarios. However, its performance in image-based diagnosis and integration of multimodal data highlights areas for improvement. The study underscores the potential of AI in augmenting dermatological practice, emphasizing the need for further development, and fine-tuning of such models to ensure their efficacy and reliability in clinical settings.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study did not receive any funding
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The images used in this study can be accessed on dermnet.nz and dermatlas.org.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The images used in this study can be accessed on dermnet.nz and dermatlas.org. Additional data collected for this study is included in the supplementary materials. This includes the case scenarios created for each dermatological condition, the differential diagnosis breakdown for the image-only analysis, and the top 3 treatment recommendations per scenario in the combined image and scenario analysis. Responses from GPT-4 can be provided upon request on a case-by-case basis.
Abbreviations
- AI
- Artificial Intelligence
- LLM
- Large Language Model
- GPT
- Generative Pre-trained Transformer