Background: The large language model ChatGPT can now accept image input with the GPT4-vision (GPT4V) version. We aimed to compare the performance of GPT4V to pretrained U-Net and vision transformer (ViT) models for the identification of the progression of multiple sclerosis (MS) on magnetic resonance imaging (MRI).

Methods: Paired coregistered MR images with and without progression were provided as input to ChatGPT4V in a zero-shot experiment to identify radiologic progression. Its performance was compared to pretrained U-Net and ViT models. Accuracy was the primary evaluation metric, and 95% confidence intervals (CIs) were calculated by bootstrapping. We included 170 patients with MS (50 males, 120 females), aged 21-74 years (mean 42.3), imaged at a single institution from 2019 to 2021, each with 2-5 MRI studies (496 in total).
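The bootstrapped CI for accuracy mentioned in the Methods can be illustrated with a minimal sketch. The abstract does not say which bootstrap variant or how many resamples were used, so the percentile method, the 2000 resamples, the function name `bootstrap_accuracy_ci`, and the example labels below are all assumptions made for illustration only.

```python
import random


def bootstrap_accuracy_ci(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for accuracy (assumed variant, not the authors' code)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    rng = random.Random(seed)
    n = len(y_true)
    # 1 where the prediction matches the reference label, 0 otherwise.
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    point = sum(correct) / n
    # Resample the per-case outcomes with replacement and recompute accuracy.
    stats = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples))
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return point, lower, upper


# Hypothetical data: 100 image pairs, 85 of which are classified correctly.
y_true = [1] * 50 + [0] * 50
y_pred = [1] * 43 + [0] * 7 + [0] * 42 + [1] * 8
point, lower, upper = bootstrap_accuracy_ci(y_true, y_pred)
print(f"accuracy = {point:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Resampling per-case outcomes rather than patients treats each image pair as independent; a study with several pairs per patient might instead resample at the patient level.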

Results: One hundred seventy patients were included, 110 for training, 30 for tuning, and 30 for testing; 100 unseen paired images were randomly selected from the test set for evaluation. Both U-Net and ViT had 94% (95% CI: 89-98%) accuracy, while GPT4V had 85% (77-91%). GPT4V gave cautious nonanswers in six cases. GPT4V had precision (specificity), recall (sensitivity), and F1 score of 89% (75-93%), 92% (82-98%), and 91% (82-97%), respectively, compared to 100% (100-100%), 88% (78-96%), and 94% (88-98%) for U-Net and 94% (87-100%), 94% (88-100%), and 94% (89-98%) for ViT.
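As a reminder of how the three secondary metrics relate, the sketch below computes precision, recall, and F1 from confusion-matrix counts using their standard definitions. The abstract does not report the underlying counts or how nonanswers were scored, so the helper `precision_recall_f1` and the counts passed to it are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions; returns 0.0 where a denominator would be zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, not taken from the study.
precision, recall, f1 = precision_recall_f1(tp=40, fp=5, fn=10)
print(f"precision = {precision:.3f}, recall = {recall:.3f}, F1 = {f1:.3f}")
```

Because F1 is a harmonic mean, it always lies between precision and recall and sits closer to the smaller of the two.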

Conclusion: The performance of GPT4V combined with its accessibility suggests that it has the potential to impact AI radiology research. However, misclassified cases and overly cautious nonanswers confirm that it is not yet ready for clinical use.

Relevance Statement: GPT4V can identify the radiologic progression of MS in a simplified experimental setting. However, GPT4V is not a medical device, and its widespread availability highlights the need for caution and education for lay users, especially those with limited access to expert healthcare.

Key Points: Without fine-tuning or the need for prior coding experience, GPT4V can perform a zero-shot radiologic change detection task with reasonable accuracy. However, in absolute terms, in a simplified "spot the difference" medical imaging task, GPT4V was inferior to state-of-the-art computer vision methods. GPT4V's performance metrics were more similar to those of the ViT than to those of the U-Net. This is an exploratory experimental study, and GPT4V is not intended for use as a medical device.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735712
DOI: http://dx.doi.org/10.1186/s41747-024-00547-w
