Introduction: The latest generation of large language models (LLMs) features multimodal capabilities, allowing them to interpret graphics, images, and videos, which are crucial in medical fields. This study investigates the vision capabilities of the next-generation Generative Pre-trained Transformer 4 (GPT-4) and Google's Gemini.
Methods: To establish a comparative baseline, we used GPT-3.