Introduction: Large language models (LLMs) such as GPT are advanced artificial intelligence (AI) models. Originally developed for natural language processing, they have been adapted for multi-modal tasks with vision-language input. One clinically relevant task is scoring the Boston Bowel Preparation Scale (BBPS). While traditional AI techniques require large amounts of data for training, we hypothesise that a vision-language LLM can perform this task with far fewer examples.
Methods: We used GPT-4V, a vision-language LLM developed by OpenAI, via the OpenAI application programming interface (API). A standardised prompt instructed the model to grade the BBPS, with contextual references extracted from the original paper describing the scale (Lai et al., GIE 2009). Performance was tested on HyperKvasir, an open dataset for automated BBPS grading.
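The abstract does not reproduce the study's standardised prompt or request code. As a minimal sketch of the approach, the snippet below assembles an OpenAI-style chat payload pairing a grading instruction with one base64-encoded endoscopy image, and parses the model's reply into a BBPS grade. The prompt text, function names, and reply format are assumptions for illustration, not the study's actual materials; the API call itself is omitted.

```python
import base64
import re

# Hypothetical instruction; the study's standardised prompt additionally
# included contextual reference descriptions from Lai et al. (GIE 2009).
SYSTEM_PROMPT = (
    "You are grading colonoscopy images with the Boston Bowel Preparation "
    "Scale (BBPS). Reply with a single digit: 0, 1, 2 or 3."
)

def build_messages(image_bytes: bytes) -> list:
    """Assemble a chat payload pairing the grading prompt with one image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": [
            {"type": "text", "text": "Grade this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ]},
    ]

def parse_bbps(reply: str):
    """Extract a BBPS grade (0-3) from the reply; None if invalid.

    Replies without a recognisable digit would be counted as invalid,
    matching the 22/1794 invalid results reported in the abstract.
    """
    m = re.search(r"\b([0-3])\b", reply)
    return int(m.group(1)) if m else None
```

The messages list would then be sent via the OpenAI chat-completions API with a vision-capable model, once per image.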
Results: Of 1794 images, GPT-4V returned valid results for 1772 (98%). It achieved an accuracy of 0.84 for two-class classification (BBPS 0-1 vs 2-3) and 0.74 for four-class classification (BBPS 0, 1, 2, 3). Macro-averaged F1 scores were 0.81 and 0.63, respectively. Qualitatively, most errors arose from misclassification of BBPS 1 as BBPS 2. These results compare favourably with current methods trained on large amounts of data, which achieve accuracies in the range of 0.8-0.9.
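The two evaluation settings differ only in how labels are grouped: the two-class task collapses BBPS 0-1 (inadequate) versus 2-3 (adequate) before scoring. The sketch below computes accuracy and macro-averaged F1 for both settings; the example labels are illustrative, not the study's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions matching the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, labels):
    """Macro-averaged F1: unweighted mean of per-class F1 scores."""
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

def to_binary(scores):
    """Collapse BBPS grades: 0-1 (inadequate) -> 0, 2-3 (adequate) -> 1."""
    return [0 if s <= 1 else 1 for s in scores]

# Illustrative labels only (not the study's data).
truth = [0, 1, 1, 2, 2, 3, 3, 1]
pred  = [0, 2, 1, 2, 2, 3, 2, 1]

acc4 = accuracy(truth, pred)                          # four-class accuracy
acc2 = accuracy(to_binary(truth), to_binary(pred))    # two-class accuracy
f1_4 = macro_f1(truth, pred, labels=[0, 1, 2, 3])
```

Macro-averaging weights each class equally regardless of prevalence, which is why the four-class macro-F1 (0.63) can sit well below the four-class accuracy (0.74) when one class, such as BBPS 1, is frequently confused with its neighbour.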
Conclusion: This study provides proof-of-concept that a vision-language LLM can perform BBPS classification accurately without a large training dataset. This represents a paradigm shift for AI classification in medicine, where many diseases lack sufficient data to train traditional AI models; an LLM supplied with appropriate examples may be used in such cases.
Full text: PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11881179 | DOI: http://dx.doi.org/10.1136/bmjgast-2024-001496
BMJ Open Gastroenterol, March 2025
Dept of Gastroenterology and Hepatology, Singapore General Hospital, Singapore.
IEEE Trans Pattern Anal Mach Intell, April 2025
3D representation learning, pivotal in computer vision, autonomous driving, and robotics, is rising in importance. However, the prevailing trend of straightforwardly transferring 2D alignment strategies to the 3D domain encounters three distinct challenges: (1) Information degradation: this arises from aligning 3D data with mere single-view 2D images and generic texts, neglecting the need for multi-view images and detailed subcategory texts. (2) Insufficient synergy: these strategies align 3D representations to image and text features individually, hampering the overall optimization of 3D models.
J Surg Educ, April 2025
Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Medical Oncology, National Center for Tumor Diseases, University Hospital Heidelberg, Heidelberg, Germany; Department of Medicine I, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany.
Objective: Recent studies have investigated the potential of large language models (LLMs) for clinical decision-making and for answering exam questions based on text input. Recent developments have extended these models with vision capabilities; such image-processing LLMs are called vision-language models (VLMs).
Neural Netw, April 2025
Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China.
Recently, the field of multimodal large language models (MLLMs) has grown rapidly, with many Large Vision-Language Models (LVLMs) relying on sequential visual representations. In these models, images are broken down into numerous tokens before being fed into the Large Language Model (LLM) alongside text prompts. However, the opaque nature of these models poses significant challenges to their interpretability, particularly when dealing with complex reasoning tasks.
Cancers (Basel), December 2024
Department of Radiation Oncology and Molecular Radiation Sciences, Johns Hopkins University, Baltimore, MD 21287, USA.
Background/objectives: Lung cancer is a devastating disease with the highest mortality rate among cancer types. Over 60% of non-small cell lung cancer (NSCLC) patients, who account for 87% of lung cancer diagnoses, require radiation therapy. Rapid treatment initiation significantly improves patient survival.