AI Article Synopsis

  • Lasting scars such as keloids and hypertrophic scars impair patients' quality of life but are frequently underdiagnosed because the current diagnostic criteria and classification systems are complex.
  • This study assessed the diagnostic accuracy of five AI chatbots, including ChatGPT-4, in interpreting clinical images of different scar types.
  • GPT-4 was more accurate than Bing Chat (36.0% vs. 22.0%) and showed higher specificity for keloids and hypertrophic scars; the authors conclude that current AI technology still requires improvement before it can be used reliably in clinical settings.

Article Abstract

Background: Lasting scars such as keloids and hypertrophic scars adversely affect a patient's quality of life. However, these scars are frequently underdiagnosed because of the complexity of the current diagnostic criteria and classification systems. This study aimed to explore the application of Large Language Models (LLMs) such as ChatGPT in diagnosing scar conditions and to propose a more accessible and straightforward diagnostic approach.

Methods: In this study, five artificial intelligence (AI) chatbots, including ChatGPT-4 (GPT-4), Bing Chat (Precise, Balanced, and Creative modes), and Bard, were evaluated for their ability to interpret clinical scar images using a standardized set of prompts. Thirty mock images of various scar types were analyzed, and each chatbot was queried five times to assess the diagnostic accuracy.

Results: GPT-4 diagnosed scars significantly more accurately than Bing Chat. The overall accuracy rates of GPT-4 and Bing Chat were 36.0% and 22.0%, respectively (P = 0.027), and GPT-4 showed higher specificity than Bing Chat for keloids (0.6 vs. 0.006) and hypertrophic scars (0.72 vs. 0.0).
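
For readers less familiar with the metrics reported above, the following is a minimal sketch of how accuracy and specificity are conventionally computed. The abstract does not give the underlying counts, so the function names and every number in the example are hypothetical and serve only to illustrate the standard definitions.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of all responses that gave the correct diagnosis."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of images WITHOUT the condition that were correctly not
    labeled with it: TN / (TN + FP)."""
    denominator = true_negatives + false_positives
    if denominator <= 0:
        raise ValueError("need at least one image without the condition")
    return true_negatives / denominator


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the study.
    # Suppose 25 responses concern images that are not keloids, and the
    # chatbot wrongly calls 7 of them keloids.
    print("specificity:", specificity(true_negatives=18, false_positives=7))
    # Suppose 9 of 25 responses name the correct scar type.
    print("accuracy:", accuracy(correct=9, total=25))
```

Under these definitions, a specificity near zero means that the model assigned the label to almost every image that did not actually show that scar type.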

Conclusions: Although currently available LLMs show potential for use in scar diagnostics, the technology is still under development and does not yet meet the standards required for clinical application, highlighting the need for further advances in AI to achieve more accurate medical diagnostics.

Level of Evidence IV: This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online instructions to authors at www.springer.com/00266.

Source
http://dx.doi.org/10.1007/s00266-024-04380-9
