Rationale And Objectives: The expansion of large language models to process images offers new avenues for application in radiology. This study aims to assess the multimodal capabilities of contemporary large language models, which allow analysis of image inputs in addition to textual data, on radiology board-style examination questions with images.
Materials And Methods: A total of 280 questions were retrospectively selected from the AuntMinnie public test bank. The test questions were converted into three prompt formats: (1) multimodal, (2) image-only, and (3) text-only input. Three models, GPT-4V, Gemini 1.5 Pro, and Claude 3.5 Sonnet, were evaluated using these prompts. The Cochran Q test and pairwise McNemar test were used to compare performance between prompt formats and models.
Results: The percentage of correct answers did not differ significantly between the text, image, and multimodal prompt formats for GPT-4V (54%, 52%, and 57%, respectively; p = .31) or Gemini 1.5 Pro (53%, 54%, and 57%, respectively; p = .53). For Claude 3.5 Sonnet, the image input (48%) significantly underperformed the text input (63%, p < .001) and the multimodal input (66%, p < .001), but no significant difference was found between the text and multimodal inputs (p = .29). Claude significantly outperformed GPT and Gemini in the text and multimodal formats (p < .01).
Conclusion: Vision-capable large language models cannot effectively use images to increase performance on radiology board-style examination questions. When using textual data alone, Claude 3.5 Sonnet outperforms GPT-4V and Gemini 1.5 Pro, highlighting the advancements in the field and its potential for use in further research.
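The comparison described in the methods (a Cochran Q test across the three prompt formats, followed by pairwise McNemar tests) can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names `cochran_q`, `chi2_sf_df2`, and `mcnemar_exact` are hypothetical, the per-question outcomes are synthetic toy data rather than study results, and the exact binomial form of the McNemar test is an assumption, since the abstract does not say which variant was used.

```python
import math

def cochran_q(rows):
    """Cochran's Q for paired binary outcomes.

    rows: one tuple per question, one 0/1 entry per condition.
    Returns (Q, degrees of freedom).
    """
    k = len(rows[0])
    col_totals = [sum(r[j] for r in rows) for j in range(k)]
    row_totals = [sum(r) for r in rows]
    n = sum(row_totals)
    denom = k * n - sum(t * t for t in row_totals)
    if denom == 0:
        raise ValueError("every question has identical outcomes across conditions")
    q = (k - 1) * (k * sum(c * c for c in col_totals) - n * n) / denom
    return q, k - 1

def chi2_sf_df2(x):
    """Chi-square upper-tail probability; this closed form holds only for df = 2."""
    return math.exp(-x / 2)

def mcnemar_exact(a, b):
    """Two-sided exact (binomial) McNemar p-value for two paired 0/1 sequences."""
    only_a = sum(1 for x, y in zip(a, b) if x and not y)
    only_b = sum(1 for x, y in zip(a, b) if y and not x)
    n = only_a + only_b  # discordant pairs
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(only_a, only_b) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Synthetic toy data (NOT from the study): 1 = correct, 0 = incorrect,
# columns are (text-only, image-only, multimodal) for one model.
answers = [
    (1, 1, 1), (1, 0, 1), (1, 0, 1), (0, 0, 1),
    (1, 0, 0), (0, 0, 0), (1, 1, 1), (1, 0, 1),
]
text, image, multimodal = (list(col) for col in zip(*answers))

q, df = cochran_q(answers)
p_overall = chi2_sf_df2(q)  # valid here because three formats give df = 2
p_text_vs_image = mcnemar_exact(text, image)
p_text_vs_multimodal = mcnemar_exact(text, multimodal)
print(q, df, p_overall, p_text_vs_image, p_text_vs_multimodal)
```

Both tests work on paired outcomes because every question is answered under every prompt format. With more than three conditions, the chi-square tail probability would need a general implementation rather than the df = 2 shortcut used here.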
DOI: http://dx.doi.org/10.1016/j.acra.2024.11.028
JMIR Med Inform
January 2025
Sungkyunkwan University, Seoul, Republic of Korea.
Background: Mental health chatbots have emerged as a promising tool for providing accessible and convenient support to individuals in need. Building on our previous research on digital interventions for loneliness and depression among Korean college students, this study addresses the limitations identified and explores more advanced artificial intelligence-driven solutions.
Objective: This study aimed to develop and evaluate the performance of HoMemeTown Dr.
Rheumatol Int
January 2025
Department of Pediatric Rheumatology, Istanbul Medeniyet University, Istanbul, Turkey.
Chronic non-bacterial osteomyelitis (CNO) is an inflammatory bone disease, usually diagnosed in childhood. It is characterized by the presence of multifocal or unifocal osteolytic lesions that can cause bone pain and soft tissue swelling. CNO is known to have soft tissue involvement.
Behav Res Methods
January 2025
Max Planck Institute for Software Systems, Saarbrücken, Germany.
Humans perceive discrete events such as "restaurant visits" and "train rides" in their continuous experience. One important prerequisite for studying human event perception is the ability of researchers to quantify when one event ends and another begins. Typically, this information is derived by aggregating behavioral annotations from several observers.
Brief Bioinform
November 2024
Department of Computer Science, Hunan University, Changsha 410008, China.
Recently, the impressive performance of large language models (LLMs) on a wide range of tasks has attracted an increasing number of attempts to apply LLMs in drug discovery. However, molecule optimization, a critical task in the drug discovery pipeline, is currently an area that has seen little involvement from LLMs. Most existing approaches focus solely on capturing the underlying patterns in chemical structures provided by the data, without taking advantage of expert feedback.
Alzheimers Dement
December 2024
Vanderbilt Memory & Alzheimer's Center, Vanderbilt University Medical Center, Nashville, TN, USA.
Background: "SuperAgers" are older adults (ages 80+) whose cognitive performance resembles that of adults in their 50s to mid-60s. Factors underlying their exemplary aging are underexplored in large, racially diverse cohorts. Using eight cohorts, we investigated the frequency of APOE genotypes in SuperAgers compared to middle-aged and older adults.