AI Article Synopsis

  • Emergency physicians require diverse skills to handle various medical emergencies, and the effectiveness of large language models (LLMs) like ChatGPT in this field is still being explored.
  • A study tested ChatGPT on five years (2018-2022) of board certification exam questions from the Japanese Association of Acute Medicine. It achieved an overall correct response rate of 62.3%, with higher accuracy on scenario-based questions than on stand-alone ones.
  • Although this performance was satisfactory, factual errors accounted for most incorrect answers (82%), which highlights the need for oversight by qualified physicians when AI tools are used in emergency medicine.

Article Abstract

Background: Emergency physicians need a broad range of knowledge and skills to address critical medical, traumatic, and environmental conditions. Artificial intelligence (AI), including large language models (LLMs), has potential applications in healthcare settings; however, the performance of LLMs in emergency medicine remains unclear.

Methods: To evaluate the reliability of information provided by an LLM, ChatGPT was given the questions from the Japanese Association of Acute Medicine's board certification examinations over a 5-year period (2018-2022) and prompted to answer each question twice. Agreement between the two responses was assessed statistically.

Results: The LLM provided answers to 465 of the 475 text-based questions, achieving an overall correct response rate of 62.3%. For questions without images, the rate of correct answers was 65.9%. For questions with images that were not explained to the LLM, the rate of correct answers was only 52.0%. The annual rates of correct answers to questions without images ranged from 56.3% to 78.8%. Accuracy was higher for scenario-based questions (69.1%) than for stand-alone questions (62.1%). Agreement between the two responses was substantial (kappa = 0.70). Factual errors accounted for 82% of the incorrectly answered questions.

Conclusion: An LLM performed satisfactorily on Japanese-language emergency medicine board certification examination questions that did not include images. However, the factual errors in its responses highlight the need for physician oversight when using LLMs.
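The agreement figure in the Results (kappa = 0.70) compares the two answer runs while correcting for the agreement expected by chance. The abstract does not say which kappa variant was used. The minimal sketch below assumes Cohen's kappa on the answer option chosen in each run. The function name and the example answer sequences are hypothetical illustrations, not the study's data or code.

```python
from collections import Counter


def cohens_kappa(first, second):
    """Cohen's kappa between two equal-length sequences of categorical answers.

    Assumption: each element is the answer option chosen for one question in
    one run (e.g. "a".."e"); this is an illustration, not the study's method.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(first)
    # Observed agreement: share of questions where both runs picked the same option.
    p_o = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: product of each run's marginal option frequencies.
    c1, c2 = Counter(first), Counter(second)
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    if p_e == 1:
        # Both runs always chose one identical option; kappa is undefined,
        # so this sketch treats it as perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical answers from two runs over ten questions.
run_1 = list("abcdeabcde")
run_2 = list("abcdeabcaa")
print(round(cohens_kappa(run_1, run_2), 2))  # 0.75
```

In this made-up example the runs agree on 8 of 10 questions (observed agreement 0.8), and chance agreement is 0.2, giving kappa = 0.6 / 0.8 = 0.75. Raw percent agreement alone would overstate consistency when some options are chosen far more often than others.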

Source
http://dx.doi.org/10.1272/jnms.JNMS.2024_91-205
