Objective: Recent studies have investigated the potential of large language models (LLMs) for clinical decision making and for answering exam questions from text input. More recent developments have extended these models with vision capabilities; such image-processing LLMs are called vision-language models (VLMs). However, the applicability of VLMs and their ability to answer exam questions with image content remain largely unexamined. The aim of this study was therefore to examine the performance of publicly accessible LLMs on 2 different surgical question sets consisting of text and image questions.
Design: Original text and image exam questions from 2 different surgical question subsets of the German Medical Licensing Examination (GMLE) and the United States Medical Licensing Examination (USMLE) were collected and answered by publicly available LLMs (GPT-4, Claude-3 Sonnet, Gemini-1.5). LLM outputs were benchmarked for accuracy on text and image questions. Additionally, LLM performance was compared with students' average historical performance (AHP) on these exams. Moreover, variations in LLM performance were analyzed in relation to question difficulty and image type.
Results: Overall, all LLMs achieved passing scores (≥60%) on surgical text questions across both datasets. On image-based questions, only GPT-4 exceeded the passing threshold, significantly outperforming Claude-3 and Gemini-1.5 (GPT-4: 78% vs. Claude-3: 58% vs. Gemini-1.5: 57.3%; p < 0.001). Additionally, GPT-4 outperformed students on both text (GPT-4: 83.7% vs. AHP students: 67.8%; p < 0.001) and image questions (GPT-4: 78% vs. AHP students: 67.4%; p < 0.001).
Conclusion: GPT-4 demonstrated substantial capabilities in answering surgical text and image exam questions. It therefore holds considerable potential for use in surgical decision making and in the education of students and trainee surgeons.
DOI: http://dx.doi.org/10.1016/j.jsurg.2025.103442
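The benchmarking protocol this abstract describes reduces to a simple loop: present each exam item (text or image) to the model, parse the chosen option letter, and compute the fraction answered correctly against the official key, with ≥60% counting as a pass. Below is a minimal sketch of that loop for image questions, assuming the OpenAI Python SDK and a vision-capable model; the model name, question schema, and helper names are illustrative assumptions, not the study's actual code.

```python
# Illustrative sketch only: the study does not publish its evaluation code.
# Model name, file paths, and the question dict layout are assumptions.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask_image_question(image_path: str, stem: str, options: dict[str, str]) -> str:
    """Send one multiple-choice image question; return the model's letter choice."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    option_text = "\n".join(f"{k}) {v}" for k, v in options.items())
    prompt = f"{stem}\n{option_text}\nAnswer with the single letter of the best option."
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any vision-capable chat model
        messages=[{"role": "user", "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ]}],
    )
    # Take the first character of the reply as the chosen option letter.
    return response.choices[0].message.content.strip()[0].upper()

def accuracy(questions: list[dict]) -> float:
    """Fraction of questions answered correctly; >= 0.60 would count as a pass."""
    correct = sum(
        ask_image_question(q["image"], q["stem"], q["options"]) == q["answer"]
        for q in questions
    )
    return correct / len(questions)
```

The reported between-model differences (e.g., GPT-4 vs. Claude-3 on image questions) would then presumably be tested on the resulting correct/incorrect counts, for instance with a chi-squared test on the 2×2 contingency table, consistent with the p-values given in the abstract.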
Pol Merkur Lekarski
March 2025
Department of Clinical Pharmacology, Department of Pharmacology and Toxicology, Medical University of Lodz, Lodz, Poland.
Objective: To assess the attitudes of medical students towards people with intellectual disabilities and the impact of psychiatry teaching on changing these attitudes.
Patients and Methods: The study involved 106 students of medical faculties who had not yet taken a course in psychiatry and 104 who had completed the course and passed the exam.
PeerJ Comput Sci
February 2025
Educational Technology and Computer Department, Faculty of Specific Education, Kafrelshiekh University, Kafrelshiekh, Egypt.
University examination papers play a crucial role in institutional quality and directly affect accreditation status. In this context, ensuring the quality of examination papers is paramount. In practice, however, manual assessment is laborious, time-consuming, and often inconsistent.
Ulus Travma Acil Cerrahi Derg
March 2025
Department of Orthopedics and Traumatology, Nişantaşı University, İstanbul-Türkiye.
Background: Artificial intelligence has been shown to achieve successful outcomes in various orthopedic qualification examinations worldwide. This study aims to assess the performance of ChatGPT in the written section of the Turkish Orthopedics and Traumatology Board Examination, compare its results with those of candidates who took the exam, and determine whether ChatGPT is sufficient to achieve a passing score.
Methods: This retrospective observational study evaluated whether ChatGPT achieved a passing grade on 400 publicly available questions from the past four years of the Turkish orthopedics qualification exam.
Cureus
March 2025
Department of Anatomical Sciences, St. George's University School of Medicine, St. George, GRD.
Artificial intelligence (AI) models, like Chat Generative Pre-Trained Transformer (OpenAI, San Francisco, CA), have recently gained significant popularity due to their ability to make autonomous decisions and engage in complex interactions. To fully harness the potential of these learning machines, users must understand their strengths and limitations. As AI tools become increasingly prevalent in our daily lives, it is essential to explore how this technology has been used so far in healthcare and medical education, as well as the areas of medicine where it can be applied.
J Dent Educ
March 2025
Department of Restorative Dentistry & Prosthodontics, University of Texas School of Dentistry at Houston, Houston, Texas, USA.
Purpose: This study examined whether peer instruction enhanced the retention of content for learners who participated in a review session utilizing an Audience Response System (ARS).
Methods: Review sessions for two groups of students taking the same course were conducted. Both groups utilized ARS to answer questions presented in the session, while only one group also utilized an educational method known as peer instruction; otherwise, the sessions were identical.