Enhancing Large Language Models with Retrieval-augmented Generation: A Radiology-specific Approach.

Radiol Artif Intell

Department of Radiology & Biomedical Imaging, University of California, San Francisco (UCSF), San Francisco, Calif.

Published: March 2025

Retrieval-augmented generation (RAG) is a strategy for improving the performance of large language models (LLMs) by giving the LLM an updated corpus of knowledge that it can draw on when generating answers in real time. RAG may improve LLM performance and clinical applicability in radiology by providing citable, up-to-date information without requiring model fine-tuning. In this retrospective study, a radiology-specific RAG system was developed using a vector database of 3,689 articles published from January 1999 to December 2023. The performance of five LLMs with and without RAG was compared on a 192-question radiology examination. RAG significantly improved examination scores for GPT-4 (81.2% versus 75.5%, P = .04) and Command R+ (70.3% versus 62.0%, P = .02), but not for Claude Opus, Mixtral, or Gemini 1.5 Pro. The RAG-System performed significantly better than pure LLMs on a 24-question subset directly sourced from (85% versus 76%, P = .03). The RAG-System retrieved 21 of 24 (87.5%, P < .001) relevant references cited in the examination's answer explanations and successfully cited them in 18 of 21 (85.7%, P < .001) outputs. The results suggest that RAG is a promising approach for enhancing LLM capabilities on radiology knowledge tasks, providing transparent, domain-specific information retrieval. ©RSNA, 2025.
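The abstract describes RAG only at a high level: relevant articles are retrieved from a vector database and supplied to the LLM at answer time. Below is a minimal Python sketch of that retrieve-then-prompt loop. It assumes a toy term-frequency "embedding" and an in-memory store in place of the study's actual embedding model and vector database, which are not specified here; `Article`, `VectorStore`, `embed`, `build_prompt`, and the sample passages are all hypothetical. It is not the authors' implementation.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Article:
    doc_id: str  # hypothetical identifier used when citing the source
    text: str


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase term counts, standing in for a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """Hypothetical in-memory stand-in for a vector database: one vector per article."""

    def __init__(self, articles):
        self.entries = [(a, embed(a.text)) for a in articles]

    def search(self, query: str, k: int = 2):
        """Return the k articles most similar to the query (exact, brute-force search)."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [a for a, _ in ranked[:k]]


def build_prompt(question: str, retrieved) -> str:
    """Prepend numbered retrieved passages so the LLM can cite them in its answer."""
    context = "\n".join(f"[{i}] ({a.doc_id}) {a.text}" for i, a in enumerate(retrieved, 1))
    return (
        "Answer using the sources below and cite them by number.\n"
        f"{context}\n\nQuestion: {question}"
    )


# Usage with made-up passages (illustrative only).
store = VectorStore([
    Article("doc-A", "Pulmonary embolism appears as a filling defect on CT pulmonary angiography."),
    Article("doc-B", "Osteoid osteoma shows a radiolucent nidus with surrounding sclerosis."),
    Article("doc-C", "Acute appendicitis: dilated appendix with periappendiceal fat stranding on CT."),
])
hits = store.search("What CT finding suggests pulmonary embolism?", k=1)
print(hits[0].doc_id)  # doc-A
print(build_prompt("What CT finding suggests pulmonary embolism?", hits))
```

Numbering the retrieved passages in the prompt is one simple way to let the model cite its sources. That citation behavior is what the study measured when it checked whether outputs cited the relevant references. A production system would typically use dense embeddings and an approximate nearest-neighbor index rather than this brute-force scan.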


Source: http://dx.doi.org/10.1148/ryai.240313


Similar Publications

Purpose: The goal of this study was to examine potential mediators of the relationship between developmental language disorder (DLD) status and executive function performance.

Method: Participants included preschoolers, of whom 80 met the diagnostic criteria for DLD and 103 were categorized as having typical language abilities. Participants' nonverbal IQ and receptive vocabulary were assessed via standardized tests, and their executive function was tested using the Dimensional Change Card Sort.


GPT-3.5 Turbo and GPT-4 Turbo in Title and Abstract Screening for Systematic Reviews.

JMIR Med Inform

March 2025

Department of Emergency and Critical Care Medicine, Chiba University Graduate School of Medicine, 1-8-1 Inohana, Chuo, Chiba, 260-8677, Japan, 81 432262372.

This study demonstrated that while GPT-4 Turbo had superior specificity when compared to GPT-3.5 Turbo (0.98 vs 0.


Background: Conversational artificial intelligence (AI) allows for engaging interactions; however, its acceptability, and the barriers and enablers to using it to support patients with atrial fibrillation (AF), are unknown.

Objective: This work stems from the Coordinating Health care with AI-supported Technology for patients with AF (CHAT-AF) trial and aims to explore patient perspectives on receiving support from a conversational AI support program.

Methods: Patients with AF recruited for a randomized controlled trial who received the intervention were approached for semistructured interviews using purposive sampling.


Background: Large language models (LLMs) are advanced tools capable of understanding and generating human-like text. This study evaluated the accuracy of several commercial LLMs in addressing clinical questions related to the diagnosis and management of acute cholecystitis, as outlined in the Tokyo Guidelines 2018 (TG18). We assessed their congruence with the expert panel discussions presented in the guidelines.


Cancer gene identification through integrating causal prompting large language model with omics data-driven causal inference.

Brief Bioinform

March 2025

School of Artificial Intelligence, Jilin University, 3003 Qianjin Street, Changchun 130012, Jilin Province, China.

Identifying genes causally linked to cancer from a multi-omics perspective is essential for understanding the mechanisms of cancer and improving therapeutic strategies. Traditional statistical and machine-learning methods that rely on generalized correlation approaches to identify cancer genes often produce redundant, biased predictions with limited interpretability, largely due to overlooking confounding factors, selection biases, and the nonlinear activation function in neural networks. In this study, we introduce a novel framework for identifying cancer genes across multiple omics domains, named ICGI (Integrative Causal Gene Identification), which leverages a large language model (LLM) prompted with causality contextual cues and prompts, in conjunction with data-driven causal feature selection.

