Comparing Large Language Model and Human Reader Accuracy with Image Challenge Case Image Inputs.

Radiology

From the Department of Radiology, Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Republic of Korea (P.S.S.); Department of Radiology and Research Institute of Radiology (W.H.S., C.H.S., K.J.P., P.H.K., S.J.C., Y.A., S.P., H.Y.P., N.E.O.), Department of Medical Science, Asan Medical Institute of Convergence Science and Technology (W.H.S., H.H.), and Department of Internal Medicine (C.Y.W.), Asan Medical Center, University of Ulsan College of Medicine, Olympic-ro 33, Songpa-gu, 05505 Seoul, Republic of Korea; University of Ulsan College of Medicine, Seoul, Republic of Korea (M.W.H.); Department of Orthopaedic Surgery, Seoul Seonam Hospital, Republic of Korea (S.T.C.); and Department of Pulmonary and Critical Care Medicine, Gumdan Top Hospital, Incheon, Republic of Korea (H.P.).

Published: December 2024

Background: The use of multimodal large language models (LLMs) with both textual and visual capabilities has been increasing steadily, but their ability to interpret radiologic images remains uncertain.

Purpose: To evaluate the accuracy of LLMs, compare it with that of human readers with varying levels of experience, and assess the factors affecting LLM accuracy in answering Image Challenge cases.

Materials and Methods: Radiologic images of cases from October 13, 2005, to April 18, 2024, were retrospectively reviewed. Using text and image inputs, four LLMs (OpenAI's GPT-4 Turbo with Vision [GPT-4V] and GPT-4 Omni [GPT-4o], Google DeepMind's Gemini 1.5 Pro, and Anthropic's Claude 3) provided answers. Human readers (seven junior faculty radiologists, two clinicians, one in-training radiologist, and one medical student), blinded to the published answers, also answered the cases. In subgroup analyses, LLM accuracy was evaluated with versus without image inputs and with short (cases from 2005 to 2015) versus long (cases from 2016 to 2024) text inputs to determine the effect of these factors. Factors affecting LLM accuracy were assessed using multivariable logistic regression. Accuracy was compared using generalized estimating equations, with Bonferroni correction for multiple comparisons.

Results: A total of 272 cases were included. GPT-4o achieved the highest overall accuracy among the LLMs (59.6%; 162 of 272), outperforming the medical student (47.1%; 128 of 272; P < .001) but underperforming the junior faculty (80.9%; 220 of 272; P < .001) and the in-training radiologist (70.2%; 191 of 272; P = .003). GPT-4o exhibited similar accuracy regardless of image inputs (without images vs with images, 54.0% [147 of 272] vs 59.6% [162 of 272], respectively; P = .59). Human reader accuracy was unaffected by text length, whereas LLMs demonstrated higher accuracy with long text inputs (all P < .001). Text input length affected LLM accuracy (odds ratio range, 3.2 [95% CI: 1.9, 5.5] to 6.6 [95% CI: 3.7, 12.0]).

Conclusion: LLMs demonstrated substantial accuracy with text and image inputs, outperforming the medical student. However, their accuracy decreased with shorter text inputs, regardless of image input. © RSNA, 2024
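The accuracy figures in the Results are simple proportions of the 272 cases, and the Bonferroni correction mentioned in the Methods scales each raw P value by the number of comparisons. The sketch below illustrates only that arithmetic; it is not the authors' analysis code and does not reproduce their generalized estimating equation models. The function names are hypothetical, and because the abstract does not state how many comparisons were corrected for or give unadjusted P values, the raw P values in the example are hypothetical placeholders.

```python
"""Illustrative arithmetic for the reported accuracies and a Bonferroni adjustment.

A sketch only: the study compared accuracy with generalized estimating
equations, which this snippet does not reproduce.
"""


def accuracy_percent(correct: int, total: int) -> float:
    """Return the percentage of cases answered correctly (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def bonferroni_adjust(p_values: list[float]) -> list[float]:
    """Bonferroni-adjust raw P values: multiply each by the number of
    comparisons m and cap the result at 1.0. Equivalently, a raw P value
    is significant if it falls below alpha / m."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Correct answers out of 272 cases, as reported in the abstract's Results.
TOTAL_CASES = 272
REPORTED_CORRECT = {
    "GPT-4o, with images": 162,
    "GPT-4o, without images": 147,
    "Junior faculty": 220,
    "In-training radiologist": 191,
    "Medical student": 128,
}

if __name__ == "__main__":
    for reader, correct in REPORTED_CORRECT.items():
        pct = accuracy_percent(correct, TOTAL_CASES)
        print(f"{reader}: {pct:.1f}% ({correct} of {TOTAL_CASES})")

    # Hypothetical raw P values for three comparisons; the abstract does not
    # report the number of comparisons or the unadjusted P values.
    raw = [0.01, 0.02, 0.5]
    print("Bonferroni-adjusted:", bonferroni_adjust(raw))
```

Rounding each percentage to one decimal place reproduces the values in the abstract (for example, 162 of 272 gives 59.6%).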

DOI: http://dx.doi.org/10.1148/radiol.241668

Similar Publications

The cortex and cerebellum are densely connected through reciprocal input/output projections that form segregated circuits. These circuits are shown to differentially connect anterior lobules of the cerebellum to sensorimotor regions, and lobules Crus I and II to prefrontal regions. This differential connectivity pattern leads to the hypothesis that individual differences in structure should be related, especially for connected regions.

This study evaluates the efficacy of deep learning models in identifying infarct tissue on computed tomography perfusion (CTP) scans from patients with acute ischemic stroke due to large vessel occlusion, specifically addressing the potential influence of the different noise reduction techniques implemented by different vendors. We analyzed CTP scans from 60 patients who underwent mechanical thrombectomy achieving a modified thrombolysis in cerebral infarction (mTICI) score of 2c or 3, ensuring minimal changes in the infarct core between the initial CTP and follow-up MR imaging. Hemodynamic parameter maps were created using noise reduction techniques, including principal component analysis (PCA), wavelet, and non-local means (NLM), as well as with no denoising.

The Internet of Things (IoT) has recently attracted substantial interest because of its diverse applications. In the agriculture sector, automated methods for detecting plant diseases offer numerous advantages over traditional methods. In the current study, a new model is developed to categorize plant diseases within an IoT network.

FAST: Fast, free, consistent, and unsupervised oligodendrocyte segmentation and tracking system.

eNeuro

January 2025

Penn Statistics in Imaging and Visualization Center (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, 19104, USA.

To develop reparative therapies for neurological disorders like multiple sclerosis (MS), we need to better understand the physiology of loss and replacement of oligodendrocytes, the cells that make myelin and are the target of damage in MS. In vivo two-photon fluorescence microscopy allows direct visualization of oligodendrocytes in the intact brain of transgenic mouse models, promising a deeper understanding of the longitudinal dynamics of replacing oligodendrocytes after damage. However, the task of tracking the fate of individual oligodendrocytes requires extensive effort for manual annotation and is especially challenging in three-dimensional images.

Structure-switchable branched inhibitors regulate the activity of CRISPR-Cas12a for nucleic acid diagnostics.

Anal Chim Acta

January 2025

Department of Laboratory Medicine, Zhongnan Hospital of Wuhan University, Wuhan University, Wuhan, People's Republic of China; Wuhan Research Center for Infectious Diseases and Cancer, Chinese Academy of Medical Sciences, Wuhan, People's Republic of China; Hubei Engineering Center for Infectious Disease Prevention, Control and Treatment, Wuhan, People's Republic of China. Electronic address:

Background: In recent years, CRISPR (clustered regularly interspaced short palindromic repeats)-based strategies have emerged as the most promising molecular tools in gene editing, intracellular imaging, transcriptional regulation, and biosensing. However, current CRISPR-based diagnostic technologies still require additional amplification strategies (such as polymerase chain reaction) to improve the cis/trans cleavage activity of Cas12a, which complicates the detection workflow, and a uniform, compatible system that responds to the target in one pot is still lacking.

Results: To make fuller use of CRISPR/Cas12a, we report a novel technique for straightforward nucleic acid detection that incorporates enzyme-responsive, steric hindrance-based branched inhibitors into the CRISPR/AsCas12a methodology.