Artificial intelligence model GPT4 narrowly fails simulated radiological protection exam.

J Radiol Prot

MSKCC, 1275 York Avenue, New York, NY 10065, United States of America.

Published: January 2024

This study assesses the efficacy of the Generative Pre-trained Transformer (GPT) models published by OpenAI in the specialised domains of radiological protection and health physics. Using a set of 1064 surrogate questions designed to mimic a health physics certification exam, we evaluated the models' ability to respond accurately to questions across five knowledge domains. Neither model met the 67% passing threshold: GPT-3.5 achieved a weighted average of 45.3%, and GPT-4 attained 61.7%. GPT-4, despite its substantially larger parameter count and multimodal capabilities, outperformed GPT-3.5 in every category yet still fell short of a passing score. The study's methodology involved a simple, standardised prompting strategy without prompt engineering or in-context learning, techniques known to potentially enhance performance. Notably, GPT-3.5 adhered to the requested answer format more consistently, despite GPT-4's higher overall accuracy. The findings suggest that while GPT-3.5 and GPT-4 show promise in handling domain-specific content, their application in the field of radiological protection should be approached with caution, emphasising the need for human oversight and verification.
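The pass/fail determination described above can be sketched in a few lines: per-domain results are combined into a weighted average (each domain weighted by its share of the 1064 questions) and compared with the 67% threshold. This is a minimal illustration, not the authors' actual code; the per-domain question counts and correct counts below are hypothetical and chosen only so the total matches 1064 questions and a score near GPT-4's reported 61.7%.

```python
# Minimal sketch of weighted-average scoring against a passing threshold.
# Domain counts below are illustrative, NOT the paper's actual breakdown.

PASSING_THRESHOLD = 0.67  # 67% passing score from the abstract

def weighted_average(domain_results):
    """domain_results: list of (num_questions, num_correct) per domain.

    Weighting by question count is equivalent to pooling all answers,
    so the weighted average is total correct / total questions.
    """
    total_questions = sum(n for n, _ in domain_results)
    total_correct = sum(c for _, c in domain_results)
    return total_correct / total_questions

def passes(domain_results):
    return weighted_average(domain_results) >= PASSING_THRESHOLD

# Hypothetical five-domain breakdown summing to 1064 questions:
gpt4_results = [(200, 130), (250, 160), (214, 130), (200, 120), (200, 117)]
score = weighted_average(gpt4_results)
print(f"score={score:.3f}, pass={passes(gpt4_results)}")
```

With these illustrative numbers the pooled score is roughly 0.617, mirroring GPT-4's reported result and falling below the 0.67 threshold.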

DOI: http://dx.doi.org/10.1088/1361-6498/ad1fdf

