Performance of ChatGPT in Answering Clinical Questions on the Practical Guideline of Blepharoptosis.

Aesthetic Plast Surg

Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan.

Published: July 2024

AI Article Synopsis

  • ChatGPT, an AI language model by OpenAI, was evaluated for its ability to accurately answer clinical questions regarding the management of blepharoptosis, based on guidelines from the American Society of Plastic Surgeons.
  • The study involved analyzing responses from ChatGPT to 11 questions posed in both English and Japanese, revealing a higher accuracy rate for English responses (76.4%) compared to Japanese (46.4%).
  • While ChatGPT shows promise as a helpful tool in medical practice, it has significant limitations and should primarily support, rather than replace, the expertise of healthcare professionals.

Article Abstract

Background: ChatGPT is a free artificial intelligence (AI) language model developed and released by OpenAI in late 2022. This study aimed to evaluate ChatGPT's ability to accurately answer clinical questions (CQs) on the Guideline for the Management of Blepharoptosis published by the American Society of Plastic Surgeons (ASPS) in 2022.

Methods: CQs from the guideline were posed to ChatGPT in both English and Japanese. For each CQ, ChatGPT's responses were evaluated for answer accuracy, evidence quality, recommendation strength, reference match, and word count, and performance on each component was compared between the English and Japanese queries.

Results: A total of 11 questions were included in the final analysis, and ChatGPT answered 61.3% of them correctly. Accuracy was higher for English CQ answers than for Japanese ones (76.4% versus 46.4%; p = 0.004), as was word count (123 versus 35.9 words; p = 0.004). No statistically significant differences were noted for evidence quality, recommendation strength, or reference match. Of the 697 references ChatGPT proposed, only 216 (31.0%) actually existed.
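As a quick sanity check, the reference-existence rate reported in the Results can be reproduced from the two counts given there (a minimal sketch; the counts are taken directly from the abstract):

```python
# Sanity-check the rate of real references among those ChatGPT proposed.
proposed = 697   # total references suggested by ChatGPT (from Results)
existing = 216   # references that actually existed

rate = 100 * existing / proposed
print(f"{rate:.1f}%")  # → 31.0%
```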

Conclusions: ChatGPT demonstrates potential as an adjunctive tool in the management of blepharoptosis. However, it is crucial to recognize that the existing AI model has distinct limitations, and its primary role should be to complement the expertise of medical professionals.

Level Of Evidence V: Observational study under respected authorities. This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.


Source: http://dx.doi.org/10.1007/s00266-024-04005-1


Similar Publications

The integration of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT-4, is transforming healthcare. ChatGPT's potential to assist in decision-making for complex cases, such as spinal metastasis treatment, is promising but largely untested. Especially in cancer patients who develop spinal metastases, precise and personalized treatment is essential.


The use of artificial intelligence (AI) chatbots for obtaining healthcare advice has greatly increased in the general population. This study assessed the performance of general-purpose AI chatbots in giving nutritional advice to patients with obesity, with or without multiple comorbidities. The case of a 35-year-old male with obesity and no comorbidities (Case 1) and that of a 65-year-old female with obesity, type 2 diabetes mellitus, sarcopenia, and chronic kidney disease (Case 2) were submitted to 10 different AI chatbots on three consecutive days.


Recent advancements in large language models (LLMs) like ChatGPT and LLaMA have shown significant potential in medical applications, but their effectiveness is limited by a lack of specialized medical knowledge due to general-domain training. In this study, we developed Me-LLaMA, a new family of open-source medical LLMs that uniquely integrate extensive domain-specific knowledge with robust instruction-following capabilities. Me-LLaMA comprises foundation models (Me-LLaMA 13B and 70B) and their chat-enhanced versions, developed through comprehensive continual pretraining and instruction tuning of LLaMA2 models using both biomedical literature and clinical notes.


Objective: Evaluate the accuracy and reliability of various generative artificial intelligence (AI) models (ChatGPT-3.5, ChatGPT-4.0, T5, Llama-2, Mistral-Large, and Claude-3 Opus) in predicting Emergency Severity Index (ESI) levels for pediatric emergency department patients and assess the impact of medically oriented fine-tuning.


Background: Providing ongoing support to the increasing number of caregivers as their needs change in the long-term course of dementia is a severe challenge to any health care system. Conversational artificial intelligence (AI) operating 24/7 may help to tackle this problem.

Objective: This study describes the development of a generative AI chatbot-the PDC30 Chatbot-and evaluates its acceptability in a mixed methods study.

