Using GPT-4 for LI-RADS feature extraction and categorization with multilingual free-text reports.

Kyowon Gu, Jeong Hyun Lee, Jaeseung Shin, Jeong Ah Hwang, Ji Hye Min, Woo Kyoung Jeong, Min Woo Lee, Kyoung Doo Song, Sung Hwan Bae

Liver Int

Department of Radiology, Soonchunhyang University College of Medicine, Seoul Hospital, Seoul, Republic of Korea.

Published: July 2024

The study evaluates how well GPT-4 can extract liver cancer-related information (LI-RADS features) from MRI reports written in both Korean and English.
Researchers wrote 160 fictitious reports and collected 72 genuine reports to test how accurately the model extracts the relevant data.
Accuracy was high for individual features (up to 99% for some), but accuracy for the overall LI-RADS category was lower (85% on the genuine reports), and the model needs further refinement before it can be relied on in real-world practice.

Background And Aims: The Liver Imaging Reporting and Data System (LI-RADS) offers a standardized approach for imaging hepatocellular carcinoma. However, the diverse styles and structures of radiology reports complicate automatic data extraction. Large language models hold the potential for structured data extraction from free-text reports. Our objective was to evaluate the performance of Generative Pre-trained Transformer (GPT)-4 in extracting LI-RADS features and categories from free-text liver magnetic resonance imaging (MRI) reports.

Methods: Three radiologists generated 160 fictitious free-text liver MRI reports written in Korean and English, simulating real-world practice. Of these, 20 were used for prompt engineering, and 140 formed the internal test cohort. Seventy-two genuine reports, authored by 17 radiologists, were collected and de-identified for the external test cohort. LI-RADS features were extracted using GPT-4, and a Python script then calculated the categories from the extracted features. Accuracies in each test cohort were compared.
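
The abstract does not reproduce the category-calculation script. As a minimal sketch of what such a step could look like, the Python below maps already-extracted major features to LR-3, LR-4, or LR-5 using the CT/MRI LI-RADS v2018 diagnostic table. The class name, field names, and function name are hypothetical, and the sketch deliberately ignores LR-1, LR-2, LR-M, LR-TIV, ancillary features, and tie-breaking rules, which a complete implementation would need to handle.

```python
from dataclasses import dataclass


@dataclass
class MajorFeatures:
    """Major features for one observation (hypothetical field names)."""
    size_mm: float
    nonrim_aphe: bool
    washout: bool           # nonperipheral "washout"
    capsule: bool           # enhancing "capsule"
    threshold_growth: bool


def lirads_category(f: MajorFeatures) -> str:
    """Return LR-3, LR-4, or LR-5 from major features only (v2018 table)."""
    # Count the additional major features besides APHE and size.
    n_additional = sum([f.washout, f.capsule, f.threshold_growth])

    if not f.nonrim_aphe:
        # Without nonrim APHE, LR-5 cannot be reached.
        if f.size_mm < 20:
            return "LR-4" if n_additional >= 2 else "LR-3"
        return "LR-4" if n_additional >= 1 else "LR-3"

    # Nonrim APHE is present from here on.
    if f.size_mm < 10:
        return "LR-4" if n_additional >= 1 else "LR-3"
    if f.size_mm < 20:
        if n_additional == 0:
            return "LR-3"
        if n_additional == 1:
            # "Capsule" alone gives LR-4; washout or threshold growth
            # alone gives LR-5 in the 10-19 mm row.
            return "LR-4" if f.capsule else "LR-5"
        return "LR-5"
    # 20 mm or larger with nonrim APHE.
    return "LR-5" if n_additional >= 1 else "LR-4"
```

Keeping the category logic in deterministic code, as the Methods describe, means the language model only has to report features, and any error in the final category can be traced back to a specific extracted feature.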

Results: On the external test, the accuracy for the extraction of major LI-RADS features, which encompass size, nonrim arterial phase hyperenhancement, nonperipheral 'washout', enhancing 'capsule' and threshold growth, ranged from .92 to .99. For the rest of the LI-RADS features, the accuracy ranged from .86 to .97. For the LI-RADS category, the model showed an accuracy of .85 (95% CI: .76, .93).
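
For readers who want to reproduce this kind of figure on their own data, the sketch below computes an accuracy with a 95% confidence interval. It assumes a normal-approximation (Wald) interval; the abstract does not state which interval method the authors used, and the function name is hypothetical.

```python
import math


def accuracy_with_ci(n_correct: int, n_total: int, z: float = 1.96):
    """Return (accuracy, lower, upper) using a Wald interval.

    The Wald method is an assumption here, not the method confirmed
    by the source. z = 1.96 corresponds to a 95% interval.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_correct <= n_total:
        raise ValueError("n_correct must be between 0 and n_total")
    p = n_correct / n_total
    half_width = z * math.sqrt(p * (1 - p) / n_total)
    # Clamp to [0, 1] because the Wald interval can overshoot.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Illustrative counts only; the abstract does not report raw counts.
acc, lower, upper = accuracy_with_ci(90, 100)
print(f"accuracy={acc:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

With small cohorts or accuracies close to 1, a Wilson or exact interval is usually a safer choice than the Wald interval.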

Conclusions: GPT-4 shows promise in extracting LI-RADS features, yet further refinement of its prompting strategy and advancements in its neural network architecture are crucial for reliable use in processing complex real-world MRI reports.

DOI: http://dx.doi.org/10.1111/liv.15891
