AI Article Synopsis

  • The study evaluated the performance of multimodal large language models (LLMs), specifically GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro, on questions from the Japanese Nuclear Medicine Board Examination (JNMBE) to see how visual information affects decision-making.
  • The study used 92 image-based questions from the 2019-2023 exams. Each model answered every question under two conditions, text with images and text only, and accuracy and agreement rates were assessed statistically.
  • No model showed a significant accuracy difference between the two conditions, and GPT-4o and Claude 3 Opus each reached only 54.3% accuracy with images. This suggests that the models make limited use of image information and need further improvement before they can be effective in nuclear medicine.

Article Abstract

Objectives: This study aimed to assess the performance of state-of-the-art multimodal large language models (LLMs), specifically GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro, on Japanese Nuclear Medicine Board Examination (JNMBE) questions and to evaluate the influence of visual information on the decision-making process.

Methods: This study utilized 92 questions with images from the JNMBE (2019-2023). The LLMs' responses were assessed under two conditions: providing both text and images and providing only text. Each model answered all questions thrice, and the most frequent answer choice was considered the final answer. The accuracy and agreement rates among the model answers were evaluated using statistical tests.
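
The answer-aggregation step described in the Methods can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names, the sample data, and the tie-breaking rule (fall back to the first run when all three answers differ) are assumptions, since the abstract does not say how a three-way split was resolved.

```python
from collections import Counter


def final_answer(runs):
    """Return the most frequent choice across repeated runs of one question.

    Assumption: when no choice repeats (a three-way split), the first run's
    answer is returned. The abstract does not state a tie-breaking rule.
    """
    # Counter.most_common orders equal counts by first appearance.
    return Counter(runs).most_common(1)[0][0]


def accuracy(finals, key):
    """Fraction of final answers that match the answer key."""
    return sum(f == k for f, k in zip(finals, key)) / len(key)


def same_choice_rate(finals_a, finals_b):
    """Fraction of questions on which two answer lists pick the same option."""
    return sum(a == b for a, b in zip(finals_a, finals_b)) / len(finals_a)


# Hypothetical data: three runs for each of three questions.
runs_per_question = [["a", "a", "c"], ["b", "d", "b"], ["e", "c", "d"]]
answer_key = ["a", "d", "e"]

finals = [final_answer(runs) for runs in runs_per_question]
print(finals, accuracy(finals, answer_key))
```

The same `same_choice_rate` helper could compare one model's final answers in the text-and-image condition against its text-only answers, or two different models' answers in one condition.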

Results: GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro exhibited no significant differences in accuracy between the text-and-image and text-only conditions. GPT-4o and Claude 3 Opus each achieved an accuracy of 54.3% (95% CI: 44.2%-64.1%) when provided with both text and images; however, they selected the same options as in the text-only condition for 71.7% of the questions. Gemini 1.5 Pro performed significantly worse than GPT-4o in the text-and-image condition. The agreement rates among the model answers ranged from weak to moderate.
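
As a worked check on the reported interval, the sketch below computes a Wilson score interval for a binomial proportion. Two things here are assumptions rather than statements from the abstract: the count of 50 correct answers is inferred from 54.3% of 92 questions, and the abstract does not name the interval method. The Wilson interval is shown because it closely matches the reported 44.2%-64.1% range.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    )
    return center - half_width, center + half_width


# 50 of 92 is inferred from the reported 54.3% accuracy (assumption).
low, high = wilson_interval(50, 92)
print(f"{50 / 92:.1%} (95% CI: {low:.1%}-{high:.1%})")
```

With 92 questions the interval spans roughly 20 percentage points, which is one reason modest accuracy differences between conditions may fail to reach significance.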

Conclusion: The influence of images on decision-making in nuclear medicine is limited even in the latest multimodal LLMs, and their diagnostic ability in this highly specialized field remains insufficient. Improving the utilization of image information and enhancing answer reproducibility are crucial for the effective application of LLMs in nuclear medicine education and practice. Further advancements in these areas are necessary to harness the potential of LLMs as assistants in nuclear medicine diagnosis.

Source
http://dx.doi.org/10.1007/s12149-024-01992-8

