Objective: To evaluate the performance of large language models (LLMs), specifically Microsoft Copilot, GPT-4 (GPT-4o and GPT-4o mini), and Google Gemini (Gemini and Gemini Advanced), in answering ophthalmological questions and assessing the impact of prompting techniques on their accuracy.
Design: Prospective qualitative study.
Participants: Microsoft Copilot, GPT-4 (GPT-4o and GPT-4o mini), and Google Gemini (Gemini and Gemini Advanced).
Methods: A total of 300 ophthalmological questions from StatPearls were tested, covering a range of subspecialties and image-based tasks. Each question was evaluated using 2 prompting techniques: zero-shot forced prompting (prompt 1) and combined role-based and zero-shot plan-and-solve+ prompting (prompt 2).
Results: With zero-shot forced prompting, GPT-4o demonstrated significantly superior overall performance, correctly answering 72.3% of questions and outperforming all other models, including Copilot (53.7%), GPT-4o mini (62.0%), Gemini (54.3%), and Gemini Advanced (62.0%) (p < 0.0001). Both Copilot and GPT-4o showed notable improvements with prompt 2 over prompt 1, elevating Copilot's accuracy from the lowest (53.7%) to the second highest (72.3%) among the evaluated LLMs.
Conclusions: While newer iterations of LLMs, such as GPT-4o and Gemini Advanced, outperformed their less advanced counterparts (GPT-4o mini and Gemini), this study emphasizes the need for caution in clinical applications of these models. The choice of prompting technique significantly influences performance, highlighting the necessity for further research to refine LLMs' capabilities, particularly in visual data interpretation, to ensure their safe integration into medical practice.
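The two prompting strategies compared in the Methods can be sketched as simple prompt templates. The wording below is illustrative only; the study's exact prompts are not given in the abstract, and the sample question and options are hypothetical:

```python
# Illustrative templates for the two prompting strategies described in the
# abstract; the exact wording used by the authors is not reproduced here.

def zero_shot_forced_prompt(question: str, options: list[str]) -> str:
    """Prompt 1: zero-shot forced prompting -- the model must commit to one option."""
    joined = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
    return (
        f"{question}\n{joined}\n"
        "Answer with the single letter of the correct option only."
    )

def role_plan_and_solve_prompt(question: str, options: list[str]) -> str:
    """Prompt 2: role-based prompting combined with zero-shot plan-and-solve+."""
    joined = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
    return (
        "You are an experienced ophthalmologist.\n"
        f"{question}\n{joined}\n"
        "First, understand the question and extract the relevant clinical "
        "variables. Then devise a step-by-step plan, carry out the plan, and "
        "finally state the single letter of the correct option."
    )

# Hypothetical multiple-choice question in the StatPearls format.
q = "Which layer of the cornea regenerates after a superficial abrasion?"
opts = ["Stroma", "Epithelium", "Descemet membrane", "Endothelium"]
p1 = zero_shot_forced_prompt(q, opts)
p2 = role_plan_and_solve_prompt(q, opts)
```

Prompt 1 forces a bare answer, while prompt 2 layers a clinical persona and an explicit plan-then-solve instruction on top of the same question, which is the kind of difference the study credits for Copilot's accuracy gain.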
DOI: http://dx.doi.org/10.1016/j.jcjo.2025.01.001
Can J Ophthalmol
January 2025
Faculty of Medicine, University of Montreal, Montreal, QC, Canada; Department of Ophthalmology, Centre Hospitalier de l'Université de Montréal, Montreal, QC, Canada.
Cureus
December 2024
Department of Radiation Oncology, Cantonal Hospital Winterthur, Winterthur, CHE.
Introduction: The application of natural language processing (NLP) for extracting data from biomedical research has gained momentum with the advent of large language models (LLMs). However, the effect of different LLM parameters, such as temperature settings, on biomedical text mining remains underexplored, and a consensus on what settings can be considered "safe" is missing. This study evaluates the impact of temperature settings on LLM performance for a named entity recognition task and a classification task in clinical trial publications.
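The temperature parameter this study varies controls sampling randomness in an LLM's output. A minimal sketch of how such a setting is typically attached to a request payload follows; the payload shape and model name are generic illustrations, not the study's actual configuration:

```python
# Illustrative only: attaching a temperature setting to an LLM request.
# The model name and payload shape mimic common chat-completion APIs and
# do not reflect the exact setup evaluated in the study.

def build_request(prompt: str, temperature: float = 0.0) -> dict:
    """Build a chat-style request payload.

    A temperature near 0 yields near-deterministic output, which is often
    preferred for extraction tasks such as named entity recognition; higher
    values increase sampling diversity.
    """
    if not 0.0 <= temperature <= 2.0:
        # Many APIs restrict temperature to roughly this range.
        raise ValueError("temperature is typically restricted to [0, 2]")
    return {
        "model": "example-llm",  # hypothetical model identifier
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

req = build_request("List all trial endpoints mentioned in the abstract.", 0.0)
```

Sweeping `temperature` across its allowed range while holding the prompt fixed is the kind of controlled comparison the study describes.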
Radiology
January 2025
From the Department of Diagnostic and Interventional Radiology, University Hospital Bonn, Venusberg-Campus 1, 53127 Bonn, Germany.
Background: Large-scale secondary use of clinical databases requires automated tools for retrospective extraction of structured content from free-text radiology reports. Purpose: To share data and insights on the application of privacy-preserving open-weights large language models (LLMs) for reporting content extraction, with comparison to standard rule-based systems and the closed-weights LLMs from OpenAI. Materials and Methods: In this retrospective exploratory study conducted between May 2024 and September 2024, zero-shot prompting of 17 open-weights LLMs was performed.
Turk J Ophthalmol
December 2024
Mustafa Kemal University, Tayfur Sökmen Faculty of Medicine, Department of Ophthalmology, Hatay, Türkiye.